pull down to refresh
Fair enough.
- Non-professionals may get lucky
I probably have been interacting with too many early-career academics who fall into this category, without the actual luck factor, dismissing the actual benefits I've been experiencing in the code I've moved with a few well-targeted prompts.
Adversarially?
Oh no, not at all. It was an acknowledgment of my unfair instinctive habit of being dismissive of even incremental improvements, forgetting that in the big picture, these incremental improvements have been stacking up quite well into consequential improvements since the first version of ChatGPT came out. In the same line of thought, I should acknowledge that in the context of security, the latest LLMs are likely quite powerful.
to dismiss sloppy findings.
Yeah, that's what the early career students I work with are often unable to do
Because you need some pretty good sequence of prompts to get to validate/poc a real vuln
caveat noted
caveat noted
Caveat to the caveat is of course that you can ask the LLM to help you build the framework.
For example, I have a set of standard review instructions now for LLMs that do the preparative work I used to do manually when new releases of critical android software comes out (think: GrapheneOS itself, Signal, your LN wallet, pass/key management apps, your FOSS keyboard). I started with describing my manual process for different tech stacks (kotlin, dart+rust+bridges, react+bridges, etc) but I've also used "retro" rounds to improve specific instructions for each app, and there the LLM proposes me improvements to the prompts. Many are good (though I also get a lot of "this doesn't apply", which I edit out because it still needs to be checked in case it does apply in the future.)
It doesn't really matter. The danger has been for over a year:
Any increment in capabilities to any actor that is either a blackhat or an a-hole on a luck streak is an increment in danger.
At this moment with Mythos unreleased, the greatest immediate threat is that Opus 4.7 and GPT 5.5 got mainlined into subscription plans. Those are increments, yet if you ran your own adversarial analysis, you have to do it again, because the baseline changed. And pray your framework holds up (and no, plain installs of trail-of-bits skills are not holding up)