Included in your plan limits until Jun 21, then switch to usage credits to continue.

It also warned when switching that it consumes the session budget 2x as fast.

Going to rerun my past dynamic workflow and see if it finds anything new - and how much of my budget it consumes.

I let it run with my framework on your wallet PR

Cost

Ran with fable-5-max. It took exactly 35 minutes (vs 15-20 minutes on Opus) and cost me $76.20, which makes it a little over 4x more expensive than opus-4.8-max, not the 2x advertised, but whatevs. This is probably because it spits out 2x more tokens while each token also costs 2x as much.
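To spell out the arithmetic behind that guess, here is a rough sketch. Both 2x ratios are the guesses from the paragraph above, not measured values, and the helper name `cost_multiplier` is made up for illustration. The implied Opus figure is only what a flat 4x would mean; since the real ratio was a little over 4x, the actual Opus run cost somewhat less than that.

```python
def cost_multiplier(token_ratio: float, price_ratio: float) -> float:
    """Relative cost when output volume and per-token price both change."""
    return token_ratio * price_ratio


# Assumption: both ratios are guesses from the post, not measurements.
multiplier = cost_multiplier(token_ratio=2.0, price_ratio=2.0)

fable_cost_usd = 76.20  # reported cost of the fable-5-max run
implied_opus_cost_usd = fable_cost_usd / multiplier  # upper bound on the Opus run

print(f"combined multiplier: {multiplier:.0f}x")
print(f"implied Opus cost at exactly that multiplier: ${implied_opus_cost_usd:.2f}")
```

So the two effects multiply rather than add, which is how a "2x" warning can turn into roughly 4x on the bill.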

Overlap & gap

Overlap with what Opus found is smaller than I expected: only 5 items (but thank heavens for no more crap severity indicators), and the one it ranks as top priority is the least interesting.

It flagged 4 new things as important and 10 as trivial. It definitely does look deeper than Opus, but I'm missing some things. Either it didn't see them (bad), it decided they weren't important (also bad, because it is instructed not to do that), it triggered the censorship (which means I need to dual-run with Opus), or it limits the number of results (most likely, given all the other crap I have seen the past few weeks).

False positives

None of the things it found were known false positives from the runs I did with Opus. I now have to validate each one, except 2 that I know are real but are nits.

Next up

I'll prioritize verifying the 4 issues it thinks are important and file comments on the PR if they are valid.

reply
84 sats \ 1 reply \ @sox OP 9 Jun

76 dollars?? I only got up to 15% of 20x.

I’ll try with something heavier than normal usage.

reply

yes. it sounds even worse in sats

reply
176 sats \ 1 reply \ @sox OP 9 Jun

The Max limits are very steep, but I need to stress-test it.

So far? Really good. It managed to come up with a really nice fix to a gnarly architectural constraint in one shot, didn't have to reprompt.

reply

I maxxed out max 20x with a single task 😂

reply