geohot coming in hot
I’m calling it now, the adoption of AI agents into software development will be one of the most costly mistakes in the field’s history. Agents cannot program, and it’s taking longer and longer to realize that they can’t. They are a highly sophisticated statistical model designed to mimic the distribution of programming. The output is broken, but in a way that’s getting harder and harder to detect. Which is exactly what you’d expect from an increasingly accurate statistical model.
I really tried for the last 6 months. I wrote some parts of tinygrad with agents. I reversed a USB <-> PCIe chip with agents. But each time I suspected I could have done it better and faster manually. The agent frontloads all the progress, then gives you a slot machine lever to pull to hope it gets the polish done. It never quite gets there.
Agents will end up hurting large organizations more than high-performing individuals or small orgs. I’ve watched how my friends and coworkers have adopted these tools over the last 6 months. A trait you find in all high-performing people is the ability to error correct, and they have mostly been good at seeing when slop is slop. It takes a bit to explore/exploit and tune the outer loops around when to use them, when to trust them, how to use them, etc., but I haven’t seen any of them move to a model where they don’t carefully read and understand each line, except in some confined domains.
You can get a PoC going extremely fast, but there is this general sludginess and you have to spend tons of time on directed review and polishing. Used carefully, it's better overall. Not 10x better on net, but 2x better on net. And the improvement comes mostly from the models' breadth of knowledge, less from their coding abilities.
Looking at building a solid product through a Pareto principle lens, I have the impression that AI can dramatically speed up the initial 20% of effort, but drastically slows down the remainder if you miss the cutoff point to put it aside.
And instead of reaching 80% of the desired state, you land at something more akin to the golden ratio, roughly 62%.
That tracks. Also, the process it speeds up is the same process that helps programmers understand the code. So if a programmer still wants to understand the code, that cost is now paid as a lump sum at the end.
AI or not, the bottleneck on great code-work remains understanding the code, customer, and context. Until LLMs understand all of that as well as we do, and can prioritize and orient themselves within that understanding, great work will continue to require great oversight.
Very interesting angle. That delay in spending the time to understand would directly feed into slowing down progress on the hard parts.
This sure checks out. There are so many prototypes out there -- and many of them seem interesting on the surface, but then turn out to have some fundamental flaw that no human capable of producing an app would have introduced.
But I don't want to miss the really cool new thing that is actually well built. So I look at the slop. But it gets depressing.
@ek definitely needs to improve the @hn bot. It always posts in the ~tech territory, even when that’s clearly not the best fit.
No, I don't need to do anything, but I'm open to suggestions
I’ve given you my suggestion, the decision is yours!
*open to suggestions on how to do this
Anyway. I've now considered using an LLM for this. It would make @hn 100x more complicated and territory fees would be unpredictable, BUT it could be a nice use case for a local LLM, so at least inference is free.
Thanks for this suggestion!
I was actually thinking about asking a local LLM. And yeah, with the fees, it makes sense to set a limit on what you’re willing to pay. From what I’ve seen, you’ll get more sats if you post in the right territory. If you can cross-post too, even better.
From what I've seen, stackers don't like to zap @hn.
Yeah, that discrimination is real! hahaha
But I also think it’s got to do with only posting in the ~tech territory. Recently I saw two @hn posts get reposted by other stackers and they got way more traction in other territories.
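For what it's worth, the "local LLM plus a fee limit" idea from this sub-thread could be sketched roughly like this. Everything here is an assumption for illustration: the territory names other than ~tech, the fee values, the `min_score` threshold, and the `scores` dict, which stands in for whatever a local model would return per territory. It says nothing about how @hn is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Territory:
    name: str
    fee_sats: int  # posting fee in sats; the values below are made up


# Hypothetical territories and fees, purely for illustration.
TERRITORIES = [
    Territory("tech", 10),
    Territory("AI", 100),
    Territory("bitcoin", 1000),
]


def pick_territory(scores, territories, max_fee_sats,
                   default="tech", min_score=0.5):
    """Pick the best-scoring territory whose fee fits under the cap.

    `scores` maps territory name -> relevance in [0, 1], e.g. parsed from
    a local model's answer. Falls back to `default` when nothing affordable
    scores above `min_score`. Note: the default is returned as-is, without
    checking its own fee against the cap.
    """
    affordable = {t.name for t in territories if t.fee_sats <= max_fee_sats}
    best_name, best_score = default, min_score
    for name, score in scores.items():
        if name in affordable and score > best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    # A post the model thinks is mostly about AI, with a 200 sat cap.
    print(pick_territory({"AI": 0.9, "tech": 0.6}, TERRITORIES, 200))
```

The fee cap is applied before comparing scores, so an expensive territory never wins no matter how well it fits, and the bot keeps today's behavior (posting in ~tech) whenever the model is unsure.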
I'm also still on the bear side of codegen other than for things that carry zero liability. I'm on the bull side of finding needles in haystacks tho.
I think this is the true 10x: more PoCs means more things have a chance to stick, even if 99% go in the garbage.
I have a few non-critical services as tinker experiments that are 100% slopped, some since the earliest releases of Cursor in late 2023... they've gotten pretty solid over that time with iteration. You can't one-shot something solid even with weeks of planning mode, but its mere existence and iteration over time distills into something useful you might have never allocated time to otherwise.