I ran it with my framework on your wallet PR.
Cost
I ran it with fable-5-max. It took exactly 35 minutes (vs. 15-20 minutes on Opus) and cost me $76.20, which makes it a little over 4x as expensive as opus-4.8-max rather than the advertised 2x, but whatevs. That's probably because it emits about 2x as many tokens and each token costs 2x as much.
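Quick sanity check on that math, as a small Python sketch. If cost scales with tokens emitted times price per token, the two 2x factors multiply to 4x. The 2x ratios are the rough estimates above, not measured numbers, and the function names are made up for illustration. Working backwards from $76.20 gives an implied Opus cost of $19.05. Since the real ratio was a little over 4x, the actual Opus run must have cost a bit less than that.

```python
# Back-of-the-envelope cost model (hypothetical helper names).
# Total cost ~ (tokens emitted) x (price per token), so the two ratios multiply.
# The 2x inputs are rough estimates, not measured values.


def cost_multiplier(token_ratio: float, price_ratio: float) -> float:
    """Relative cost of one run vs. a baseline run."""
    return token_ratio * price_ratio


def implied_baseline_cost(run_cost: float, multiplier: float) -> float:
    """Baseline cost implied by a run's cost and its cost multiplier."""
    return run_cost / multiplier


multiplier = cost_multiplier(token_ratio=2.0, price_ratio=2.0)
opus_estimate = implied_baseline_cost(76.20, multiplier)
print(f"expected multiplier: {multiplier:.1f}x")
print(f"implied opus cost: ${opus_estimate:.2f}")
```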
Overlap & gap
The overlap with what Opus found is smaller than I expected: only 5 items (and thank the heavens there are no more crap severity indicators), and the one they rank as the priority is the least interesting of them.
It found 4 new things it considers important and 10 it considers trivial. It definitely looks deeper than Opus, but some things I expected to see are missing. Either it didn't see them (that would be bad), it decided they weren't important (also bad, because it is instructed not to drop findings on that basis), it triggered the censorship (which would mean I need to dual-run with Opus), or it limits its results (most likely, given all the other crap I have seen over the past few weeks).
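To keep track of what each run found (and what went missing), here's a minimal Python sketch that buckets findings from two runs with plain set operations. The finding IDs are hypothetical, and it assumes you already have a reliable way to tell that two findings are the same issue, which is the hard part in practice. For a dual run, the findings to validate would be the union of both sets.

```python
# Hypothetical sketch: bucket findings from two review runs.
# IDs are made up; matching "the same" issue across runs is assumed solved.
from typing import NamedTuple


class Comparison(NamedTuple):
    overlap: set        # found by both runs
    only_new: set       # found only by the new model
    only_baseline: set  # found only by the baseline (the "missing" ones)


def compare_runs(baseline: set, new: set) -> Comparison:
    return Comparison(
        overlap=baseline & new,
        only_new=new - baseline,
        only_baseline=baseline - new,
    )


opus_run = {"F1", "F2", "F3", "F4"}
fable_run = {"F2", "F3", "F5", "F6"}
result = compare_runs(opus_run, fable_run)
dual_run = opus_run | fable_run  # everything either model flagged

print(sorted(result.overlap))
print(sorted(result.only_new))
print(sorted(result.only_baseline))
```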
False positives
None of the things it found were known false positives from my Opus runs. I now have to validate each one, except for 2 that I know are real but are nits.
Next up
I'll prioritize verifying the 4 issues it considers important and will comment on the PR for any that turn out to be valid.
When I switched models, it also warned that it consumes the session budget 2x as fast.
Going to rerun my past dynamic workflow to see if it finds anything new, and how much of my budget it consumes.