I kept Opus as the default (and was toying with the idea of hardcoding 4.7 selection to get rid of some of the 4.8 regressions).

Fable's cost is prohibitive, and it gives me more false positives than Opus 4.7. It feels a bit like the situation a year ago, when 4.x had a ton of regressions vs. 3.7 (though this may not be as steep). Maybe a 5.5 in 4-6 months will bring the real improvement, like 4.5 did over 4.1.

187 sats \ 3 replies \ @k00b 11h

It communicated in better detail than Opus. That's what I miss.

136 sats \ 2 replies \ @optimism 10h

Do you have the replies saved? Opus on xhigh or above is easier to instruct to adjust its output (and it falls back on its default templates less often).

85 sats \ 1 reply \ @k00b 10h

I assume they're in the logs. I'm not at your level of wizardry yet; I'm still mostly using off-the-shelf harnesses/leashes/skills.

104 sats \ 0 replies \ @optimism 10h

You can often just tell it what you want in your prompt. Even a single sentence works 99.99% of the time (except at high effort or below, where it will just suck).

169 sats \ 3 replies \ @sox 20h

I probably got blessed with the limits: I stopped at 15% of my weekly limit, and I've been using it a lot.
Sad that it went away; it could do much more in 5 minutes than Opus.

116 sats \ 2 replies \ @optimism 20h

Are you using it interactively?

85 sats \ 1 reply \ @sox 20h

I had it map the Stacker News codebase with n agents in parallel, run some Telegram bots, and work interactively all day through its VS Code extension.
The curious part was that the VS Code extension didn't count toward the limits at all; they only went up when I used the Claude Code CLI.

116 sats \ 0 replies \ @optimism 20h

Maybe it's because I call claude -p and they penalize me for that? Or because I use xhigh/max? Not sure. Either way, it's a moot point now, lol. I'm happy I didn't tune my framework to it.