Fable 5 is state-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, scientific research, and vision.
The longer and more complex the task, the larger Fable 5’s lead over our other models.
For a small group of cyber defenders and critical infrastructure providers, we are also launching Claude Mythos 5.
Mythos 5 shares the same underlying model as Fable 5, but with the safeguards lifted in some areas.
great timing btw, just got claude
Flagged for trying to produce biological weapons lol.
The problem I have with this "safety" is that for both cybersecurity and biology, they are reducing defensive capabilities and creating further asymmetry as they've put some of the most dangerous parties on the allowlist: governments and spook corporations.
For cybersecurity, remember Stuxnet. For biologicals, remember COVID. Imagine that the capability marketing wasn't a lie and we could all defend against it without relying on anyone.
If their model is a weapon, doesn't 2A apply in the US then?
@Scoresby, didn’t you post a flywheel infographic in the saloon yesterday describing how they’re weaseling around stuff like this? I can’t find it.
You mean #1505276?
That's the one!
I guess technically that graphic is about private information collection instead of information gatekeeping, but the same principle applies.
I’m increasingly baffled by how unconcerned the general public is about this.
The general public's thoughts on this have been increasingly "govt do something" rather than "I can fix this" and Anthropic's narrative aligns with that perfectly. It consolidates more power though, and that power is guaranteed to be abused. The question is not whether this will ultimately cause people to wake up, but only how many more COVIDs we need for that to reach critical mass.
Hopefully for advocates of "govt do something" they will still be alive when that happens.
That'll be an interesting supreme court case
First experience? Complete garbage: https://x.com/Kruwed/status/2064466173928829149
Still, it did bring up both the arithmetic and sociological perspectives for the user to evaluate instead of going full woke.
It shouldn't bring up "sociological perspectives" at all. It should give me the correct answer instead.
To be fair, your question was vague to the point that no human (with the context removed) would actually give you the correct answer.
Was that your first prompt? Did it have the context necessary to answer the way you wanted?
This
It wasn't a very specific prompt.
The question is not vague at all, I'm merely asking it to confirm a mathematical axiom. Ayn Rand purposely designed this statement to expose hypocrisy, and Claude fell directly into the trap.
Many people in the world operate at least partly with the sociological perspective in mind, so I think Claude was trying to acknowledge that in order to better equip the user for the practical world.
https://twiiit.com/Kruwed/status/2064466173928829149
I am interested in cybersecurity but I'm a total newb. I asked it to give me an overview of computer viruses and how viruses are formalized in math, hoping it might give me a learning path. It automatically told me it was downgrading replies to Opus 4.8. Disappointing but not surprising.
The most annoying part is that they’re only offering it with subscriptions until June 21st
API is available - i.e. claude-fable-5 is offered on PPQ now.
I meant that right now it’s available with the plan pricing, but likely not after June 21. Hitting the API you end up paying a lot for tokens, whereas if you login with your subscription plan, you can use the model with downtime. at least that’s how I understand it
max effort Fable vs Opus eats about 20x the credits. I haven't been able to max out my hourly credits with Opus since 4.6, but last night with Fable it just ate it all in one run.
Reckon I'll keep Composer-maxxing
It also warned when switching that it consumes the session budget 2x as fast.
Going to rerun my past dynamic workflow and see if it finds anything new - and how much of my budget it consumes.
I let it run with my framework on your wallet PR
Cost
Ran with fable-5-max, it took exactly 35 minutes (vs 15-20 minutes on Opus), cost me $76.20, making it a little over 4x more expensive than opus-4.8-max, not the 2x advertised, but whatevs. This is probably because it spits out 2x more tokens and at the same time tokens are 2x cost.
Overlap & gap
Overlap with what Opus found is smaller than I expected, only 5 items (but thank the heavens for no more crap severity indicators), with the one it ranks as priority being the least interesting.
4 new things it thinks important, 10 it thinks trivial. It definitely does look deeper than Opus, but I'm missing some things. Either it didn't see it (that be bad), it thought it wasn't important (that be bad too because it is instructed not to), it triggered the censorship (that means need to dual-run with Opus), or it limits results (most likely with all the other crap I have seen the past few weeks.)
False positives
None of the things it found were known false positives from the runs I did with Opus. I now have to validate each one, save 2 that I know are there but are nits.
Next up
I'll prioritize verifying the 4 issues it thinks to be important and file comments on the PR if they are valid.
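For anyone checking the numbers in the Cost part: the "a little over 4x" figure is roughly what you get if both guesses hold, i.e. about 2x the tokens at about 2x the per-token price. A quick sketch of that arithmetic, where both 2x factors are the guesses from the post and not measured values, and the function name is just for illustration:

```python
def cost_ratio(token_ratio: float, price_ratio: float) -> float:
    """Relative run cost when token volume and per-token price both scale."""
    return token_ratio * price_ratio


FABLE_RUN_COST_USD = 76.20  # reported cost of the fable-5-max run

# Assumed factors (guesses from the post, not measurements):
# about 2x the tokens, each at about 2x the price.
ratio = cost_ratio(token_ratio=2.0, price_ratio=2.0)

# If the ratio were exactly 4x, the Opus run would have cost this much.
# The post says "a little over 4x", so the real Opus figure would be slightly lower.
implied_opus_cost = FABLE_RUN_COST_USD / ratio

print(f"{ratio:.0f}x, implied Opus run about ${implied_opus_cost:.2f}")
```

So the 2x warning can be accurate per token and the bill can still land near 4x once the output volume doubles as well.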
76 dollars?? I only got up to 15% of 20x.
I’ll try with something heavier than normal usage.
yes. it sounds even worse in sats
The Max limits are very steep, but I need to stress-test it.
So far? Really good. It managed to come up with a really nice fix to a gnarly architectural constraint in one shot, didn't have to reprompt.
I maxxed out max 20x with a single task 😂