
There are many ways I can think of to implement this, and each results in a different set of input tokens and different tradeoffs. So I figure the implementation of this feature is highly opinionated, which is probably why it hasn't been released yet.

How would you represent this concept of "reply-to" as input tokens?

Transformers work by computing an attention score (a kind of relationship value) from every token to every other token. In a normal chat interface, you could just reference the message ID and render it differently in the thread (as a link to the original message) to represent a reply.
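To make this concrete, here is a toy sketch assuming standard scaled dot-product attention. The vectors are made up and the function name `attention_weights` is hypothetical; real models compute queries and keys with learned projections, use many attention heads, and work in far higher dimensions. Each row is normalized to sum to 1, so every token in the input, an opaque ID included, takes some share of every other token's attention.

```python
import math

def attention_weights(queries, keys, causal=False):
    """Toy scaled dot-product attention: row i is softmax(q_i . k_j / sqrt(d)) over j."""
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if causal and j > i:
                # Decoder-only (chat) models mask out tokens that come later.
                scores.append(float("-inf"))
            else:
                scores.append(sum(qa * ka for qa, ka in zip(q, k)) / math.sqrt(d))
        m = max(scores)  # subtract the max before exponentiating, for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Three toy "tokens", each with a made-up 2-d vector used as both query and key.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention_weights(tokens, tokens, causal=True):
    print([round(w, 3) for w in row])
```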

But passing a message ID/link as input tokens to the model would waste its attention on trying to decipher the message ID.

You could copy-paste the entire paragraph you're referencing as input to the model. But then you're wasting context, because that paragraph is already in the conversation.

This is why something like "re: subject of paragraph" is efficient. It uses few tokens and needs no additional ID/linking system. But it does require you to point out the subject yourself, and, as you say, it isn't precise enough.
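A rough sketch of how the three options (opaque ID, pasted quote, "re:" subject line) might be serialized into prompt text. Everything here is hypothetical: the message IDs, the `render_reply` helper, and `rough_token_count`, which just splits on whitespace. That is a crude proxy; a real BPE tokenizer would typically split a random ID like `msg_8f3a2c` into several sub-word tokens, so the ID option costs more than this count suggests.

```python
# Hypothetical thread; message IDs and texts are made up for illustration.
THREAD = {
    "msg_8f3a2c": "Transformers assign attention from every token to every other token.",
    "msg_1b7e9d": "Copying a paragraph wastes context because it is already there.",
}

def render_reply(reply, target_id, strategy, subject=None):
    """Serialize a reply to THREAD[target_id] as plain prompt text (hypothetical format)."""
    if strategy == "id":
        # Opaque pointer: the ID string itself carries no meaning for the model.
        return f"[reply-to {target_id}] {reply}"
    if strategy == "quote":
        # Duplicate the referenced paragraph: precise, but repeats existing context.
        return f"> {THREAD[target_id]}\n{reply}"
    if strategy == "re":
        # Short human-written subject: cheap, but only as precise as the subject chosen.
        if subject is None:
            raise ValueError("the 're' strategy needs a subject")
        return f"re: {subject}\n{reply}"
    raise ValueError(f"unknown strategy: {strategy}")

def rough_token_count(text):
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())

reply = "Agreed, but which paragraph do you mean?"
for strategy in ("id", "quote", "re"):
    text = render_reply(reply, "msg_8f3a2c", strategy, subject="attention between tokens")
    print(strategy, rough_token_count(text))
```

In this toy measure the pasted quote more than doubles the size of the reply, while the "re:" line adds only a few words, at the cost of depending on how well the chosen subject identifies the paragraph.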