There are many ways I can think of to implement this, and each results in a different set of input tokens and tradeoffs. So I figure this feature is highly opinionated in how it's implemented, which is probably why it hasn't been released yet.
How would you represent this concept of "reply-to" as input tokens?
Transformers work by assigning a relationship value (an attention weight) from every token to every other token. In a normal chat interface, you could just reference the message ID and render it differently in the thread (as a link to the original message) to represent a reply.
But passing a message ID/link as input tokens to the model would waste its attention on trying to decipher the message ID.
You could copy-paste the entire paragraph you're referencing as input to the model. But then you're wasting context because that paragraph already exists in the context.
This is why something like "re: subject of paragraph" is efficient. It uses few tokens and needs no additional ID/linking system. But it does require you to point out the subject yourself and, as you say, it is not precise enough.
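To make the tradeoff between the three options concrete, here is a rough sketch that builds each representation of a reply and compares their size. Everything in it is made up for illustration: the message ID `msg_8f3a2c`, the sample paragraph, and the `rough_token_count` helper, which just counts words and punctuation marks. A real tokenizer will give different numbers, and would likely split an opaque ID into several tokens, so treat the output as a relative comparison only.

```python
import re


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: counts words and punctuation marks.

    Assumption for illustration only; real model tokenizers split text differently.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


# Hypothetical paragraph that already exists earlier in the conversation.
paragraph = "Put shared helpers in src/lib and keep feature code under src/features."
# The new thing the user actually wants to say about it.
comment = "This part is good."

# Three ways to encode "this comment replies to that paragraph".
strategies = {
    "message_id": f"[reply-to: msg_8f3a2c] {comment}",  # hypothetical ID scheme
    "full_quote": f"> {paragraph}\n{comment}",          # copy the paragraph again
    "re_subject": f"re: folder structure - {comment}",  # name the subject by hand
}

costs = {name: rough_token_count(text) for name, text in strategies.items()}
# Overhead = tokens spent on the reference itself, beyond the comment.
overhead = {name: cost - rough_token_count(comment) for name, cost in costs.items()}

for name in strategies:
    print(f"{name:>10}: {costs[name]} tokens total, {overhead[name]} spent on the reference")
```

The full quote costs the most because it repeats text that is already in the context, while the "re:" form is the cheapest but leaves it to the model to work out which paragraph is meant.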
The model does forget (hallucinate), because every time you send a new prompt the full conversation gets sent again, and as the context grows, it's not as good at remembering.
But that's not the problem. The problem is that I want to reference specific things: say I want us to talk about the folder structure you proposed for this thing, and I want to say this part is good, this part is not good, this one is also good.
Yes, it is very much a UX thing.
re: <subject of paragraph> - it's not precise enough sometimes; when you reference it free-form, it leaves more room for interpretation by the model.