
Crap, I just watched prompt-poisoning on Claude Opus 4.8 - that hasn't happened with 4.6 and 4.7. More regressions? Or just unlucky?

Welp... nerf'd, but looks like it's done on the harness side. Re-running a task:

This is 4.8 with the new Claude Code from today:

$ forgejo.sh issue comment-get 4788 | jq -r '.body' | wc -w 
1604

This is 4.8 with the old Claude Code from last week:

$ forgejo.sh issue comment-get 4707 | jq -r '.body' | wc -w 
5044

Shouldn’t you be testing with the same inputs? 4788 ≠ 4707


Switching from high to xhigh gives me back the old format and some more output. Apparently high now means next to no effort, lol. This is xhigh:

$ forgejo.sh issue comment-get 4794 | jq -r '.body' | wc -w
2405

But what was high on 4.6/4.7 is now max:

$ forgejo.sh issue comment-get 4798 | jq -r '.body' | wc -w
2823

max takes about as long as high used to and has the same findings. The lower verbosity is acceptable. It respects formatting and does not take (as many) shortcuts. But I still fear they're breaking things.

This explains the regressions. It's like shrinkflation but on LLM plan credits.
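
To put rough numbers on that, here is a quick sketch that turns the word counts quoted above into percentages of last week's run (5044 words). The labels are my shorthand, and it assumes the first run today (comment 4788) was at high, which the thread implies but never states. The `pct` helper is made up for this example. Word count is only a crude proxy for effort.

```shell
# Word counts copied from the comments above; labels are shorthand.
baseline=5044   # 4.8 with last week's Claude Code (comment 4707)

# pct COUNT BASE: print COUNT as a rounded whole-number percentage of BASE.
# Hypothetical helper, not part of forgejo.sh or any other tool.
pct() {
  awk -v c="$1" -v b="$2" 'BEGIN { printf "%.0f\n", 100 * c / b }'
}

# Assumption: 4788 was run at "high"; 4794 is xhigh and 4798 is max.
for entry in "high:1604" "xhigh:2405" "max:2823"; do
  label=${entry%%:*}   # text before the colon
  count=${entry##*:}   # text after the colon
  echo "$label: $count words, $(pct "$count" "$baseline")% of baseline"
done
```

So by this measure today's high is at about 32% of last week's output, xhigh at about 48%, and max at about 56%.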


I’ll just act like I understood everything! ~lol


TLDR: Anthropic nerf'd "high effort" to now mean "no effort", and "max effort" to mean "high effort". They probably also nerf'd max.


Those are the answers to the exact same assignment: the same issue, run twice. The numbers are just the IDs of the resulting comments.
