Welp... nerf'd, but looks like it's done on the harness side. Re-running a task:
this is 4.8 with new claude code from today
$ forgejo.sh issue comment-get 4788 | jq -r '.body' | wc -w
1604
this is 4.8 with old claude code from last week
$ forgejo.sh issue comment-get 4707 | jq -r '.body' | wc -w
5044
Shouldn’t you be testing with the same inputs? 4788 ≠ 4707
Switching from high to xhigh gives me back the old format and some more output. That's because high now apparently means dumbed down, lol. This is xhigh:
$ forgejo.sh issue comment-get 4794 | jq -r '.body' | wc -w
2405
But what was high on 4.6/4.7 is now max:
$ forgejo.sh issue comment-get 4798 | jq -r '.body' | wc -w
2823
max takes about the same time as high used to, and has the same findings. The lower verbosity is acceptable. Respects formatting. Does not take (as many) shortcuts. But I still fear they're breaking things.
This explains the regressions. It's like shrinkflation but on LLM plan credits.
Crap, I just watched prompt-poisoning on Claude Opus 4.8 - that hasn't happened with 4.6 and 4.7. More regressions? Or just unlucky?