…and I still don’t get it. I paid for a month of Pro to try it out, and it is consistently and confidently producing subtly broken junk. I had tried this before, but gave up because it didn’t work well. I thought that maybe this time it would be far enough along to be useful.

The task was relatively simple, and it involved doing some 3D math. The solutions it generated were almost right every time, but critically broken in subtle ways, and any attempt to fix the problems would either introduce new bugs or bring back old ones.

I spent nearly the whole day yesterday going back and forth with it, and felt like I was in a mental fog. It wasn’t until I had a full night’s sleep and reviewed the chat log this morning that I realized how much I was going in circles. I tried prompting a bit more today, but stopped when it kept doing the same crap.

The worst part is that, throughout all of this, Claude was confidently responding. When I said there was a bug, it would “fix” the bug and provide a confident explanation of what was wrong… except it was clearly bullshit, because it didn’t work.

I still want to keep an open mind. Is anyone having success with these tools? Is there a special way to prompt it? Would I get better results during certain hours of the day?

For reference, I used Opus 4.6 Extended.

  • tohuwabohu@programming.dev · 17 hours ago

    I use my own brain to sketch out what I want to work and how. Before writing any code, I use the LLM to point out gaps and how to close them. Pros and cons of certain decisions. Things you would discuss with colleagues. Then, I come up with a plan for the order I want the code to be written in and how to fragment that into smaller, easy-to-handle modules. I supervise and review each chunk produced, adapt code mostly manually if required, write the edge case tests - most importantly, run it - and move to the next. This is how I use it successfully and get results much faster than the traditional way.
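
    To make the "small module plus edge case tests, then run it" step concrete, here is a minimal sketch for the kind of 3D math the original post mentions. Everything in it is an assumption for illustration: the thread never says what the actual task was, so the function names (`rotate_about_axis`, `approx_equal`) and the choice of Rodrigues' rotation formula are hypothetical stand-ins, not the code anyone here actually wrote.

```python
import math

Vec3 = tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(dot(v, v))
    if length == 0.0:
        # Edge case an LLM-written version can easily skip: a zero axis.
        raise ValueError("cannot normalize a zero-length vector")
    return (v[0] / length, v[1] / length, v[2] / length)


def rotate_about_axis(v: Vec3, axis: Vec3, angle: float) -> Vec3:
    """Rotate v about axis by angle (radians), right-handed (Rodrigues' formula)."""
    k = normalize(axis)
    c, s = math.cos(angle), math.sin(angle)
    k_cross_v = cross(k, v)
    k_dot_v = dot(k, v)
    return (
        v[0] * c + k_cross_v[0] * s + k[0] * k_dot_v * (1.0 - c),
        v[1] * c + k_cross_v[1] * s + k[1] * k_dot_v * (1.0 - c),
        v[2] * c + k_cross_v[2] * s + k[2] * k_dot_v * (1.0 - c),
    )


def approx_equal(a: Vec3, b: Vec3, tol: float = 1e-9) -> bool:
    return all(abs(x - y) <= tol for x, y in zip(a, b))


def run_edge_case_tests() -> None:
    # Known answer: +x rotated 90 degrees about +z lands on +y.
    assert approx_equal(
        rotate_about_axis((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2),
        (0.0, 1.0, 0.0),
    )
    # A zero angle must be the identity, even with a non-unit axis.
    assert approx_equal(
        rotate_about_axis((1.0, 2.0, 3.0), (0.0, 5.0, 0.0), 0.0),
        (1.0, 2.0, 3.0),
    )
    # Rotation must preserve length.
    v = (1.0, 2.0, 3.0)
    r = rotate_about_axis(v, (1.0, 1.0, 0.0), 0.7)
    assert abs(math.sqrt(dot(r, r)) - math.sqrt(dot(v, v))) <= 1e-9
    # A degenerate axis must fail loudly instead of returning NaNs.
    try:
        rotate_about_axis(v, (0.0, 0.0, 0.0), 1.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a zero axis")


if __name__ == "__main__":
    run_edge_case_tests()
    print("all edge case tests passed")
```

    The point is less the formula than the loop: one small function, a handful of checks whose answers you can work out by hand, and actually running them before moving on to the next chunk.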

    At my job, though, I can witness how other people use it. I was asked to review a fully vibecoded fullstack app that contains every mistake possible. Unsanitized input. Hardcoded tokens. Hardcoded credentials. 2500+ LoC classes and functions. Business logic orchestrators masquerading as services. Full table scans on each request. Cross-tenant data leaks. Loading whole tables into memory. No test coverage for the most critical paths. Tests requiring external services to run. The list goes on. Now they want me to make it production ready in 8 weeks “because you have AI”.
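
    For readers who have not run into these, here is a rough sketch of what the opposite of a few items on that list can look like. It is not taken from the app described above: the table layout, the `APP_API_TOKEN` variable name, and the helper names are all made up for illustration, and it uses Python's standard `sqlite3` module with an in-memory database purely so it runs on its own.

```python
import os
import sqlite3


def get_api_token() -> str:
    # Read the secret from the environment instead of hardcoding it.
    # APP_API_TOKEN is a hypothetical variable name.
    token = os.environ.get("APP_API_TOKEN")
    if not token:
        raise RuntimeError("APP_API_TOKEN is not set")
    return token


def make_db() -> sqlite3.Connection:
    # In-memory database with a hypothetical multi-tenant table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, item TEXT NOT NULL)"
    )
    # Index on the tenant column so per-tenant lookups need not scan the table.
    conn.execute("CREATE INDEX idx_orders_tenant ON orders (tenant_id)")
    conn.executemany(
        "INSERT INTO orders (tenant_id, item) VALUES (?, ?)",
        [("acme", "widget"), ("acme", "gear"), ("globex", "bolt")],
    )
    return conn


def orders_for_tenant(conn: sqlite3.Connection, tenant_id: str) -> list[str]:
    # The ? placeholder keeps user input out of the SQL text, and the WHERE
    # clause scopes every read to one tenant instead of loading the table.
    rows = conn.execute(
        "SELECT item FROM orders WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = make_db()
    print(orders_for_tenant(db, "acme"))
    # A classic injection attempt is treated as a literal tenant name.
    print(orders_for_tenant(db, "acme' OR '1'='1"))
```

    None of this is advanced, which is rather the point: these are things a reviewer with some experience checks for by reflex, and a "just make it work" prompt does not ask for.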

    My point: this was an endorphin-fueled vibecoding session by someone with no experience as a developer, who asked the LLM to “just make it work” and lacked the ability, which comes with experience, to supervise the work. It was enough to make it run locally and pitch a “system engineered w/o any developer” to management.

    Those systems need guidance just as a junior would, and I am strongly and loudly advocating to restrict access to this incredibly useful tool to people who know what they are doing. Nobody would allow a manager to use a laser cutter in a carpentry workshop without proper training; worst case, they burn down the whole shack.

    I appreciate you having an open mind about it at least. I needed some time to adjust as well. I don’t even use Opus; most of the time my workflow consistently produces usable code with Sonnet. Maybe you can try what I explained initially? Just don’t try any language you’re not familiar with, that will not end well.