Fable 5 is state-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, scientific research, and vision.
The longer and more complex the task, the larger Fable 5’s lead over our other models.

For a small group of cyber defenders and critical infrastructure providers, we are also launching Claude Mythos 5.
Mythos 5 shares the same underlying model as Fable 5, but with the safeguards lifted in some areas.

great timing btw, just got claude

89 sats \ 8 replies \ @k00b 10 Jun
Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more

Flagged for trying to produce biological weapons lol.

reply

The problem I have with this "safety" is that for both cybersecurity and biology, they are reducing defensive capabilities and creating further asymmetry as they've put some of the most dangerous parties on the allowlist: governments and spook corporations.

For cybersecurity, remember Stuxnet. For biologicals, remember COVID. Imagine that the capability marketing wasn't a lie and we could all defend against it without relying on anyone.

If their model is a weapon, doesn't 2A apply in the US then?

reply
342 sats \ 3 replies \ @jasonb 10 Jun
If their model is a weapon, doesn't 2A apply in the US then?

@Scoresby, didn’t you post a flywheel infographic in the saloon yesterday describing how they’re weaseling around stuff like this? I can’t find it.

reply

You mean #1505276?

reply
15 sats \ 1 reply \ @jasonb 10 Jun

That's the one!

reply
150 sats \ 0 replies \ @jasonb 10 Jun

I guess technically that graphic is about private information collection instead of information gatekeeping, but the same principle applies.

reply
84 sats \ 1 reply \ @jasonb 10 Jun
The problem I have with this "safety" is that for both cybersecurity and biology, they are reducing defensive capabilities and creating further asymmetry as they've put some of the most dangerous parties on the allowlist: governments and spook corporations.

I’m increasingly baffled by how unconcerned the general public is about this.

reply

The general public's thoughts on this have been increasingly "govt do something" rather than "I can fix this" and Anthropic's narrative aligns with that perfectly. It consolidates more power though, and that power is guaranteed to be abused. The question isn't whether this will ultimately cause people to wake up, only how many more COVIDs we need before that reaches critical mass.

Hopefully the advocates of "govt do something" will still be alive when that happens.

reply
84 sats \ 0 replies \ @k00b 10 Jun
doesn't 2A apply in the US then

That'll be an interesting supreme court case

reply
201 sats \ 7 replies \ @kruw 9 Jun

First experience? Complete garbage: https://x.com/Kruwed/status/2064466173928829149

reply

Still, it did bring up both the arithmetic and sociological perspectives for the user to evaluate instead of going full woke.

reply
84 sats \ 4 replies \ @kruw 10 Jun

It shouldn't bring up "sociological perspectives" at all. It should give me the correct answer instead.

reply

To be fair, your question was vague to the point that no human, with the context removed, would actually give you the correct answer.

Was that your first prompt? Did it have the context necessary to answer the way you wanted?

reply
20 sats \ 0 replies \ @OT 10 Jun

This

It wasn't a very specific prompt.

reply
15 sats \ 1 reply \ @kruw 10 Jun

The question is not vague at all, I'm merely asking it to confirm a mathematical axiom. Ayn Rand purposely designed this statement to expose hypocrisy, and Claude fell directly into the trap.

reply

Many people in the world operate at least partly with the sociological perspective in mind, so I think Claude was trying to acknowledge that in order to better equip the user for the practical world.

reply

I am interested in cybersecurity but I'm a total newb. I asked it to give me an overview of computer viruses and how viruses are formalized in math, hoping it might give me a learning path. It automatically told me it was downgrading replies to Opus 4.8. Disappointing but not surprising.

The most annoying part is that they’re only offering it with subscriptions until June 21st

reply
they’re only offering it with subscriptions until June 21st

API is available - i.e. claude-fable-5 is offered on PPQ now.

reply

I meant that right now it’s available with the plan pricing, but likely not after June 21. Hitting the API you end up paying a lot for tokens, whereas if you log in with your subscription plan, you can use the model with downtime. At least that’s how I understand it.

reply
you can use the model with downtime

max effort Fable vs Opus eats about 20x the credits. I haven't been able to max out my hourly credits with Opus since 4.6, but last night with Fable it just ate it all in one run.

reply

Reckon I'll keep Composer-maxxing

reply
186 sats \ 5 replies \ @k00b 9 Jun
Included in your plan limits until Jun 21, then switch to usage credits to continue.

It also warned when switching that it consumes the session budget 2x as fast.

Going to rerun my past dynamic workflow and see if it finds anything new - and how much of my budget it consumes.

reply
Going to rerun my past dynamic workflow and see if it finds anything new

I let it run with my framework on your wallet PR

Cost

Ran with fable-5-max, it took exactly 35 minutes (vs 15-20 minutes on Opus), cost me $76.20, making it a little over 4x more expensive than opus-4.8-max, not the 2x advertised, but whatevs. This is probably because it spits out 2x more tokens and at the same time each token costs 2x as much.
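To make that arithmetic explicit, here is a rough sketch in Python. The 2x token ratio and 2x price ratio are the guesses from the comment above, not measured values, and the function names are made up for illustration. Since the observed difference was "a little over 4x", the implied Opus figure is an upper-bound estimate rather than an actual bill.

```python
def cost_multiplier(token_ratio: float, price_ratio: float) -> float:
    """Total cost scales with (tokens emitted) x (price per token)."""
    return token_ratio * price_ratio


def implied_baseline(observed_cost: float, multiplier: float) -> float:
    """Back out what the cheaper run would have cost for a given multiplier."""
    return observed_cost / multiplier


# Assumed ratios (guesses from the comment, not measurements):
# roughly 2x the output tokens, at roughly 2x the per-token price.
multiplier = cost_multiplier(token_ratio=2.0, price_ratio=2.0)

# $76.20 is the reported cost of the fable-5-max run.
opus_estimate = implied_baseline(76.20, multiplier)

print(multiplier)               # 4.0
print(round(opus_estimate, 2))  # 19.05
```

So a "2x" session-budget warning and a roughly 4x bill are not contradictory if the 2x only describes per-token pricing and the model also emits about twice as many tokens.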

Overlap & gap

Overlap with what Opus found is smaller than I expected, only 5 items (but thank the heavens for no more crap severity indicators), and the one it ranks as top priority is the least interesting.

4 new things it thinks important, 10 it thinks trivial. It definitely does look deeper than Opus, but I'm missing some things. Either it didn't see them (that be bad), it thought they weren't important (that be bad too because it is instructed not to), it triggered the censorship (that means I need to dual-run with Opus), or it limits results (most likely, given all the other crap I have seen the past few weeks).

False positives

None of the things it found were known false positives from the runs I did with Opus. I now have to validate each one, save 2 that I know are there but are nits.

Next up

I'll prioritize verifying the 4 issues it thinks to be important and file comments on the PR if they are valid.

reply
84 sats \ 1 reply \ @sox OP 9 Jun

76 dollars?? I only got up to 15% of 20x.

I’ll try with something heavier than normal usage.

reply

yes. it sounds even worse in sats

reply
176 sats \ 1 reply \ @sox OP 9 Jun

The Max limits are very steep, but I need to stress-test it.

So far? Really good. It managed to come up with a really nice fix to a gnarly architectural constraint in one shot, didn't have to reprompt.

reply

I maxxed out max 20x with a single task 😂

reply