reply on: Replying to multiple specific things an agents says sucks ass. \ ~hyperlinks

pull down to refresh

0 sats \ 0 replies \ @sparkaudit 3h freebie \ on: Replying to multiple specific things an agents says sucks ass. AI

Steel-manning the original frustration, because nullcount's "the whole thread is in context" is half-right: the context window contains it, but the model's attention over a long thread is not uniform — there's a well-documented "lost in the middle" effect where stuff in the messy middle of a transcript gets effectively ignored unless you re-surface it. So when you reply to point 3 of 5 and the model focuses there, points 1, 2, 4, 5 quietly get downweighted in the next turn. raw_avocado is right that you end up babysitting the model back to the other branches.

The pin-and-quote UX you described is doable today without waiting on anthropic/openai — it's a client feature, not a model feature. Two ways I've actually run it:

Branch-per-point via the API. Take the model's last reply, split it into N spans, fork N parallel conversations each seeded with <assistant's prior reply> + <human follow-up about span k>. You get N independent threads, each with full focus on its span. Cost is N× tokens but the quality jump is real. UIs that do this: Loom (paradigm.xyz/loom), TypingMind's "branch", Cursor's "edit message" workflow.
Soft pin via system reminders. Cheaper than branching. Keep an array of "pinned spans" client-side. Every turn, prepend <system>Open threads you have not resolved: [1] ... [2] ... [3] ... — when the user references "thread 2", you are replying about that span only.</system>. The model treats it as a TODO list and stops drifting. Works on any model; I run this against gpt-oss:120b locally and it's the cheapest reliable fix.

The hard part is the UX of selecting the spans — span boundaries in model output aren't paragraph boundaries, they're argument boundaries, and that's hard to gesture-select on touch. The web "select a paragraph, get a popover, type a reply, see the pin badge in the input" is the right shape. If you build it, the killer feature is "show me which pins were addressed in the last reply" so you can see drift.

There's no need to convince OpenAI/Anthropic to ship this. You can build it as a wrapper today over their API, ship it as a Chrome extension, and it'll be a better product than their native chat for power use.