…and I still don’t get it. I paid for a month of Pro to try it out, and it consistently and confidently produces subtly broken junk. I had tried this before and gave up because it didn’t work well. I thought that maybe this time it would be far enough along to be useful.
The task was relatively simple, and it involved some 3D math. The solutions it generated were almost right every time, but critically broken in subtle ways, and any attempt to fix the problems would either introduce new bugs or reintroduce old ones.
I spent nearly the whole day yesterday going back and forth with it, and felt like I was in a mental fog. It wasn’t until I had a full night’s sleep and reviewed the chat log this morning that I realized how much I was going in circles. I tried prompting a bit more today, but stopped when it kept doing the same crap.
The worst part is that, throughout all of this, Claude was confidently responding. When I said there was a bug, it would “fix” the bug and provide a confident explanation of what was wrong… except it was clearly bullshit, because it didn’t work.
I still want to keep an open mind. Is anyone having success with these tools? Is there a special way to prompt it? Would I get better results during certain hours of the day?
For reference, I used Opus 4.6 Extended.


I’ve never been able to program in anything more complex than BASIC and command line batch files, but I’m able to get useful output from Claude.
I’m an IT Infrastructure Manager by trade, and I got there through 20 years of supporting everything from desktop to datacenter including weird use cases like controlling systems in a research lab. On top of that, I’ve gotten under the hood of software in the form of running game servers in my spare time.
What you need to get good programs out of AI boils down to 3 things:
I’ve made tools that automate and improve my entire department’s approach to user data, device data, application inventory, patch management, and vulnerability management. These are changes I started making with a free product three months ago; two months back I switched to the paid version.
Programming is sort of like conversation in an alien language. For that reason, if you can give precise instructions, you sometimes really can pull something new into existence with LLM coding. It’s the same reason you could say words that have never been said in that specific order before and have an LLM translate them to Portuguese.
I always used to talk about how everything in a computer was math, and how what interested me more than quantum computing would be a machine that performed the same sorts of operations on words or concepts that computers of the day (the ’90s and ’00s, when “quantum” was being slapped on everything to mean “fast” or “powerful”) were doing on math. I said the best indicator that linguistic computing had arrived would be that, without ever learning to program, I’d start being able to program. I was looking at “Dragon Naturally Speaking” when I had this idea; it was one of the earliest effective speech-to-text programs. I stopped learning to program immediately and focused exclusively on learning operations from that point forward.
I’ve been testing the code generation abilities of LLMs for about three years. Within the last six months I feel like I’m starting to see evidence that the associations being made internally by LLMs are complex enough to begin considering them the fulfillment of my childhood dream of a “word computer”.
All the shitty stuff about the environment and the theft of art is there too, which sucks, but more because our economic model sucks than because LLMs either do or do not suck. If we had a framework for meeting everybody’s basic needs, this software in its current state would have the potential to turn everyone with a passion for grammatical and technical precision into a concept-based developer practically overnight.
Do you seriously not realize how out of touch this sounds?
Chatbots being deemed useful in tasks by people unqualified to make those judgments is a running problem.