…and I still don’t get it. I paid for a month of Pro to try it out, and it is consistently and confidently producing subtly broken junk. I had tried this before and gave up because it didn’t work well; I thought that maybe this time it would be far enough along to be useful.

The task was relatively simple and involved some 3D math. The solutions it generated were almost right every time, but critically broken in subtle ways, and any attempt to fix the problems would either introduce new bugs or reintroduce old ones.

I spent nearly the whole day yesterday going back and forth with it and felt like I was in a mental fog. It wasn’t until I’d had a full night’s sleep and reviewed the chat log this morning that I realized how much I was going in circles. I tried prompting a bit more today, but stopped when it kept doing the same crap.

The worst part is that, throughout all of this, Claude responded confidently. When I said there was a bug, it would “fix” the bug and provide a confident explanation of what was wrong… except it was clearly bullshit, because it didn’t work.

I still want to keep an open mind. Is anyone having success with these tools? Is there a special way to prompt it? Would I get better results during certain hours of the day?

For reference, I used Opus 4.6 Extended.

  • Katherine 🪴@piefed.social · 3 hours ago

    Don’t just use it as a drop-in replacement for a programmer; use it to automate menial tasks while employing trust-but-verify with every output it produces.

    A well-written CLAUDE.md, plus a prompt that restricts it from auto committing, auto pushing, and auto editing without explicit verification before doing anything, will keep everything in your control while also aiding menial maintenance tasks like repetitive sections or user tests.
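    As a sketch, such a CLAUDE.md might contain rules along these lines (the exact wording here is invented for illustration, and these are instructions to the model rather than hard technical guarantees):

```
# CLAUDE.md

- Never run `git commit` or `git push`; leave all version control to me.
- Propose changes as diffs and wait for my explicit approval before editing any file.
- Ask before running any command that modifies state outside the working tree.
```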

    • Feyd@programming.dev · 3 hours ago

      verify with every output it produces.

      I agree that you can get quality output using these tools, but if you actually take the time to validate and fix everything they’ve output then you spend more time than if you’d just written it, rob yourself of experience, and melt glaciers for no reason in the process.

      prompt to restrict it from auto committing, auto pushing, and auto editing without explicit verification

      Anything in the prompt is a suggestion, not a restriction. You are correct that those actions should be restricted, but it must be done outside the chatbot layer. This is part of the problem with this stuff: people using it don’t understand what it is or how it works at all, and are being ridiculously irresponsible.

      repetitive sections

      Repetitive sections that contain logic can, and should for maintainability, be factored out. Those that can’t be can be generated by other means: a list of words can be expanded into whatever repetitive boilerplate you need with sed, awk, a python script, etc., and you’ll know nothing was hallucinated because the process was deterministic in the first place.
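      As a minimal sketch of that deterministic approach (the field names and the accessor template here are invented for illustration):

```python
# Deterministic boilerplate expansion: run every name in a list through
# the same template. Because the transformation is mechanical, the output
# can't contain anything that wasn't in the inputs.
fields = ["name", "email", "created_at"]

template = """\
def get_{f}(self):
    return self._{f}
"""

# One accessor stub per field, joined into a single block of code.
boilerplate = "\n".join(template.format(f=f) for f in fields)
print(boilerplate)
```

      The same idea works with sed or awk over a word list; the point is only that the expansion is repeatable and verifiable at a glance.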

      user tests.

      Tests are just as important as the rest of the code and should be given the same attention, rather than being treated as fine as long as the box is checked.

      • Katherine 🪴@piefed.social · 3 hours ago

        I agree it’s not perfect; I still only use it very sparingly. I was just suggesting it as an alternative to trusting everything it does out of the box.