This reminds me so much of the late-80s when everyone was installing PCs in their offices and everyone was asking if this is actually better than a typewriter and Rolodex, because people spend all day “futzing” on the computer.
5 years later we had networking, emerging interoperability standards, office productivity suites. 10 years later there was basically no company left that didn’t have PCs and much better productivity.
I see the same thing playing out here. A year ago we had Copilot and it sucked; I didn’t see the utility. But now coding agents with skills can easily read and understand specs, create test suites, etc. These are revolutionizing my team’s work right now.
You see this pattern over & over with AI capability on a given task: It’s pathetic at 5%, then it merely sucks at 40%, then it takes a lot of futzing to fix up at 70%, then suddenly it’s at 95% and does as well as most professionals.
Downvote me to hell, this is my honest assessment.
There’s no such thing as “agents”, there must always be a human in the loop, they don’t just create code from nothing. Both in the sense of a human needing to prompt the LLM, and in the fact that they’re trained on human created code. “Agents” is just a buzzword made by tech CEOs and MBAs to make the general population think they’re doing more than they really are. They have no skills, they’re a statistical prediction model. And prediction models tend to fuck up a lot of things, especially as the data window grows.
You can use them to help code, yes, but don’t do what the billionaire class wants and make them seem like more than they really are.
Up until ~6 months ago I would have agreed with you, and elaborated that “Agents are just LLMs in a loop with a text file scratchpad”
That’s… still true in a way, but honestly so many people have put so much cleverness into managing that process, that I have to say, yes, Cline or Codex with GPT or Claude Code behind them are absolutely “agentic”.
I can point them to a problem report and our company documentation and… an ever-increasing percentage of the time, I wind up with a problem description, a patch that fixes it, unit, coverage, and stress tests, and (if relevant) updated docs.
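For anyone who hasn’t seen it spelled out, here is roughly what “LLMs in a loop with a text file scratchpad” means. This is only a minimal sketch, not how Cline, Codex, or Claude Code actually work: `fake_model`, `run_agent`, the `NOTE:`/`DONE:` reply convention, and the turn limit are all made up for illustration, and `fake_model` is a canned stand-in where a real setup would call a model.

```python
import tempfile
from pathlib import Path
from typing import Callable


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call (no real model is contacted)."""
    # Pretend the model reads its earlier notes and stops after three of them.
    steps_done = prompt.count("NOTE:")
    if steps_done >= 3:
        return "DONE: patch and tests written"
    return f"NOTE: finished step {steps_done + 1}"


def run_agent(task: str, model: Callable[[str], str],
              scratchpad: Path, max_turns: int = 10) -> str:
    """Call the model repeatedly, feeding back a text-file scratchpad each turn."""
    scratchpad.write_text("")  # start every run with an empty scratchpad
    for _ in range(max_turns):
        prompt = f"Task: {task}\nScratchpad:\n{scratchpad.read_text()}"
        reply = model(prompt)
        if reply.startswith("DONE:"):
            return reply  # the model says it has finished
        with scratchpad.open("a") as f:
            f.write(reply + "\n")  # remember this turn for the next one
    return "GAVE UP: turn limit reached"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pad = Path(tmp) / "scratchpad.txt"
        print(run_agent("fix the reported bug", fake_model, pad))
        print(pad.read_text(), end="")
```

The “cleverness” in real tools is everything this leaves out: what goes into the prompt, which tools the model may call, and how the scratchpad gets trimmed when it grows too long.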
Yup, everyone (including me) strongly wants to deny the usefulness of AI, but the fact is that AI is already quite useful, and is only becoming more useful over time. There are a zillion moral problems with AI, but the usefulness of its output is obvious.
E.g. for years I’ve been considering paying someone to make a small app for me that does one specific thing, but recently I asked AI to do it and boom - it created an app that did exactly what I wanted. It even suggested some good features which I then said yes to, and it made the app even better. And when I think of a new feature I just say “add this new feature”, and it does. Occasionally the outputted app doesn’t work, and I just say “now the audio doesn’t work, fix it” and it does. So far there was only one feature I asked it to do that it failed at.
Is your app as efficient as what an experienced developer would create? If you released the source code, would it have security vulnerabilities? These are just a couple of the more hidden issues that fly under the radar when shipping LLM-generated code.
Is your app as efficient as what an experienced developer would create?
One of the earliest uses we had for LLMs was literally just asking them to optimize several large codebases. They suggested lots of pointless changes, but also several huge performance wins we had overlooked.
And all done – implemented, tested, and human-reviewed – in about a person-week, compared to at least half a dozen person-months to go through all that by hand.
I mean, sometimes the LLMs generate slow algos. But less often than human coders.
If you released the source code, would it have security vulnerabilities?
You’re not gonna believe this, but another of the first things we did was ask the LLMs to review the codebase for security issues (and review any new PRs).
OFC the code also gets reviewed for security vulns like it always has, by old-school automation (e.g. valgrind, fortify, yadda), human review, and red-teaming exercises. I don’t think I’ve seen enough data yet to say whether it’s got more/worse security issues than human-generated code (which, need I remind you, is often highly insecure).
These are just a couple of the more hidden issues that fly under the radar when shipping LLM-generated code.
Ummm… those would be issues if you didn’t use good orchestration, didn’t have good tools and docs for the LLMs to use, didn’t follow good software engineering practices to begin with…
and it is dead wrong lol
What “skills”?
If they had the “skills” to do the basic functions of their jobs, they wouldn’t need “skills” in reference to AI…
Which means when the AI inevitably fucks up, the only way they can fix it is by asking the AI to fix it repeatedly and hope it eventually works…
Sounds like your “team” should spend less on AI and more on qualified employees.
Or do you all use free chatbots?
Quite possibly solving the majority of human diseases is rather more than “quite useful”
2024 Nobel Prize lecture: https://www.youtube.com/watch?v=qX1aYUckvnY
2025 lecture: Deep Protein Space. If this doesn’t blow your fucking mind… you haven’t heard of DNA https://www.youtube.com/watch?v=_enkgH6Vrxk