…and I still don’t get it. I paid for a month of Pro to try it out, and it is consistently and confidently producing subtly broken junk. I had tried this before but gave up because it didn’t work well. I thought that maybe this time it would be far enough along to be useful.

The task was relatively simple, and it involved some 3D math. The solutions it generated were almost right every time, but critically broken in subtle ways, and any attempt to fix the problems would either introduce new bugs or reintroduce old ones.

I spent nearly the whole day yesterday going back and forth with it, and felt like I was in a mental fog. It wasn’t until I had a full night’s sleep and reviewed the chat log this morning that I realized how much I was going in circles. I tried prompting a bit more today, but stopped when it kept doing the same crap.

The worst part of this is that, throughout all of it, Claude was confidently responding. When I said there was a bug, it would “fix” the bug and provide a confident explanation of what was wrong… except it was clearly bullshit, because it didn’t work.

I still want to keep an open mind. Is anyone having success with these tools? Is there a special way to prompt it? Would I get better results during certain hours of the day?

For reference, I used Opus 4.6 Extended.

  • arthur@lemmy.zip · ↑2 · 34 minutes ago

    I’m using (Gemini 3.1 pro in) Gemini cli to build a complex (personal) project to explore how to use these tools. My impression is that the code produced by LLMs is disposable/throwaway. We need to babysit the model and be very hands on to get good results.

  • x00z@lemmy.world · ↑1 · 20 minutes ago

    The trick about vibe coding is that you confidently release the messed up code as something amazing by generating a professional looking readme to accompany it.

  • thedeadwalking4242@lemmy.world · ↑1 · 24 minutes ago

    I use it for tedious transformations or needle-in-a-haystack problems.

    They are better at searching for themes or concepts than they are at actually doing any “thinking tasks”. My rule is that if a task requires a lot of critical thinking, then the LLM can’t do it.

    It’s definitely not all they say it is. I think LLMs will fundamentally always have these problems.

    I’ve actually had a much better time using it for inline completion as of recently. It’s much better when the scope of the problem it needs to “solve” (the code it needs to find and compose to complete your line) is in the Goldilocks zone. And if the answer it gives is bad, I just keep typing.

    I really hate the way LLM vibe-coded slop is written and architected. To me it’s clear these things have an extremely limited conception of what they’re building. I’ve compared it to essentially ripping out a human’s language center, giving it a keyboard, and asking it to program for you. It’s just not really what they’re good at.

  • Katherine 🪴@piefed.social · ↑7 · edited · 2 hours ago

    Don’t just use it as a drop-in replacement for a programmer; use it to automate menial tasks while employing trust-but-verify with every output it produces.

    A well-written CLAUDE.md, plus a prompt that restricts it from auto committing, auto pushing, and auto editing without explicit verification before doing anything, will keep everything in your control while also aiding menial maintenance tasks like repetitive sections or user tests.

    • Feyd@programming.dev · ↑2 · 1 hour ago

      verify with every output it produces.

      I agree that you can get quality output using these tools, but if you actually take the time to validate and fix everything they’ve output then you spend more time than if you’d just written it, rob yourself of experience, and melt glaciers for no reason in the process.

      prompt to restrict it from auto committing, auto pushing, and auto editing without explicit verification

      Anything in the prompt is a suggestion, not a restriction. You are correct you should restrict those actions, but it must be done outside of the chatbot layer. This is part of the problem with this stuff. People using it don’t understand what it is or how it works at all and are being ridiculously irresponsible.

      repetitive sections

      Repetitive sections that are logic can and should be factored out for maintainability. Those that can’t be can still be generated deterministically: a list of words can be expanded into whatever repetitive boilerplate you need with sed, awk, a Python script, etc., and you’ll know nothing was hallucinated because the process was deterministic in the first place.
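
      For illustration, here is roughly what that deterministic expansion can look like in Python; the field names and the getter/setter template are made up for this example:

```python
# Expand a list of words into repetitive boilerplate deterministically.
# No LLM is involved, so nothing in the output can be hallucinated:
# it is a pure function of the word list and the template.

fields = ["width", "height", "depth"]  # hypothetical field names

template = """\
def get_{name}(self):
    return self._{name}

def set_{name}(self, value):
    self._{name} = value
"""

# One formatted copy of the template per word in the list.
boilerplate = "\n".join(template.format(name=f) for f in fields)
print(boilerplate)
```

      The same result could come from a sed or awk one-liner; the point is that the expansion is mechanical and repeatable.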

      user tests.

      Tests are just as important as the rest of the code and should be given the same amount of attention instead of being treated as fine as long as you check the box.

      • Katherine 🪴@piefed.social · ↑1 · 59 minutes ago

        I agree it’s not perfect; I still only use it very sparingly. I was just suggesting it as an alternative to trusting everything it does out of the box.

  • Michal@programming.dev · ↑4 · 3 hours ago

    You can’t really just use Claude Code raw. You have to give it detailed instructions, use Claude skills, observe results, and update prompts. It can be just as time-consuming, but rather than doing the productive work, you’re just reviewing and correcting AI. People who have success using AI have invested time in their setup and are continuously adjusting it.

    • KeenFlame@feddit.nu · ↑2 · 46 minutes ago

      But all in all it’s much faster. That’s the reason it is not useless. Everyone complains that it takes so much time, but no, it is nowhere close to doing it manually. It’s not a magic pill and you still need the know-how, but it is not “just as time-consuming”. You are more productive. But yes, it is also more boring.

  • ReallyCoolDude@lemmy.ml · ↑3 · 4 hours ago

    I read a lot of these posts that sadly leave out the basic parts: what were your prompts? What does “vibe coding” mean in this context? Did you create an initial setup and slowly build up? Did you leave everything to the agent’s understanding and just push approve or reject? There are multiple levels of quality that depend on the input. Did you run into context rot? Does 3D math mean vector math, matrices, or what?

    Given Claude has had serious problems since March at least, the way you use it is paramount. In our team we all use Claude with Copilot (sadly, that is a business directive), and while it is exceptional at finding small relationships in components and microservices, we had to build a long list of skills just to make it barely usable in a “star trek” way. The bottom line is that you must be extremely precise when asking. Prompt modeling counts a lot. Context building as well.

    For now, unit tests and data/mock refactors are working extremely well for me, when I define the test cases. My agents got to a point where I can safely have small property additions with refactors on multiple repositories at once (i.e. I change the contract on microservice A, and microservices B, C, and D are automatically updated). This last part had to be built though, with memory, engrams, and some fine tuning.

    It is not always shit: otherwise nobody would use it. But it is not this revolutionary technology that will make humans obsolete either (as they are selling it).

  • sobchak@programming.dev · ↑15 · 6 hours ago

    Key is having it write tests and letting it iterate by itself, and also managing context in various ways. It only works on small projects in my experience. And it generates shit code that’s not worth manually working on, so it kind of locks your project into being always dependent on AI. Being always dependent on AI, with AI eventually hitting a brick wall, means you’ll reach a point where you can’t really improve the project anymore. I.e. AI tools are nearly useless.

  • kunaltyagi@programming.dev · ↑10 · 8 hours ago

    Don’t jump right into coding.

    Take a feature you want, and use the plan feature to break it down. Give the plan a read. Make sure you have tests covering the files it says it’ll need to touch. If not, add tests (can use LLM for that as well).

    Then let the LLM work. Success rates for me are around 80% or higher for medium tasks (30 mins–1 hour for me without LLM, 15–30 mins with one, including code review)

    If a task is 5 minutes or so, it’s usually hit or miss (since planning would take longer). For tasks longer than an hour or so, it depends. Sometimes the code is full of simple idioms that the LLM can easily crush. Other times I need to actively break it down into digestible chunks.

  • onlinepersona@programming.dev · ↑7 · 8 hours ago

    It’s called “vibe” coding and not “correct” coding for a reason.

    That’s why people are wrong so often: they feel like something is right, but don’t check. That’s how you get anti-vaxxers, manosphere people, MAGA, QAnon, Brexit, etc.

  • athatet@lemmy.zip · ↑23 · 13 hours ago

    The reason you kept going around in circles and reintroducing bugs you’d already gotten rid of is that LLMs don’t remember things. Every time you send a message, the entire conversation is sent along with it so the model has all the parts. Eventually it runs out of room and starts cutting off the beginning of the conversation, and now the LLM can’t ‘remember’ what you were even talking about.
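
    As a rough sketch of why that happens (the word-count “tokenizer” here is a stand-in; real models count subword tokens, and real clients use smarter trimming strategies):

```python
# Chat clients resend the whole conversation on every request. When it no
# longer fits the model's context window, the oldest turns get dropped,
# so your original instructions are the first thing the model "forgets".

def trim_history(messages, max_tokens, count=lambda m: len(m.split())):
    """Drop the oldest messages until the conversation fits the budget."""
    kept = list(messages)
    while kept and sum(count(m) for m in kept) > max_tokens:
        kept.pop(0)  # earliest message is discarded first
    return kept

history = [
    "never reintroduce the normal-flipping bug",  # your original instruction
    "attempt 1 looks wrong",
    "attempt 2 flipped the normals again",
]
# With a small budget, the original instruction falls out of the window:
print(trim_history(history, max_tokens=10))
```

    Once the first message is trimmed away, nothing in what the model receives says the old bug was ever a problem, which is exactly how it comes back.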

    • KeenFlame@feddit.nu · ↑1 · 44 minutes ago

      Kind of, but it really depends on the workflow. A task as small as some simple 3D math shouldn’t produce a conversation big enough to be impacted by the context window.

    • Railcar8095@lemmy.world · ↑3 · 7 hours ago

      For that you can ask it to update a documentation/status file on every change. You can manually add the goal and/or future tasks to it.

      With that, I improved my success rate a lot, even when starting new sessions (add a note in the instructions file to use this file for reference, so you don’t have to remind it every time).

  • Feyd@programming.dev · ↑128 · 19 hours ago

    producing subtly broken junk

    The difference between you and people that say it’s amazing is that you are capable of discerning this reality.

    • JustEnoughDucks@feddit.nl · ↑1 · edited · 4 hours ago

      I wonder if it was even able to compile. I am a shitty hobby coder who just does it to make my embedded hardware projects function.

      I have yet to get compilable code out of any of the AI bots I have tried: Gemini, Mistral, and ChatGPT. I am not making an account lol.

      I have gotten some compilable python and VBA code for data analysis stuff at work, so I wonder if it is because embedded stuff uses specific SDKs that it can’t handle.

      Either way, I have given up on it for anything besides bouncing ideas off of, or debugging where electromagnetics issues could lie (though it has been completely wrong about that too; even when it uses the wrong concepts, it still reminds me of concepts I might have overlooked).

    • OwOarchist@pawb.social · ↑40 · 19 hours ago

      What I don’t get, though, is how the vibe code bros can’t discern this reality.

      How can they sit there and not see that their vibe-coded app just doesn’t do what they wanted it to do? Eventually, you’ve got to try actually running the app, right? And how do you keep drinking the AI kool-aid when you find out that the app doesn’t work?

      • Lumelore (She/her)@lemmy.blahaj.zone · ↑25 · edited · 16 hours ago

        Vibe code bros aren’t real programmers. They’re business people, not computer people. Even if they have a CS degree, they only got that because they think it’ll get them more money. They lack passion and they don’t care about understanding anything. They probably don’t even care about what they’re generating beyond its potential to be used in a grift.

        I graduated college not that long ago and my CS classes had quite a few former business majors. They switched because they think it’ll be more lucrative for them but since they only care about money they didn’t bother to actually learn the material especially since they could just vibe code through everything.

        • b_n@sh.itjust.works · ↑4 · 7 hours ago

          So much this.

          After working in tech companies for the last 10 years I’ve noticed the difference between people that “generate code” and those that engineer code.

          My worry about the industry is that vibe coding gives the code generators the ability to generate even more code. The engineers (even those that use vibe tools) are not engineering as much code by volume compared to “the generators”.

          My hope is that this is one of those “short term gain, long term pain” things that might self correct in a couple of years 🤞.

      • Feyd@programming.dev · ↑31 · 18 hours ago

        They’re the same people who copied code from Stack Overflow, whom you had to tell how to actually fix things in every PR. The difference is the C-suite types are backing them this time.

      • tleb@lemmy.ca · ↑8 · 15 hours ago

        Eventually, you’ve got to try actually running the app, right?

        At least at my company, no, they just start selling it.

      • Oisteink@lemmy.world · ↑8 · 17 hours ago

        I do apps that work, I do patches that are production quality. Half the CS world does… I do full-stack AI debugging of ESP32 projects.

        It’s a powerful tool; you just need to learn its strong and weak points, just like any other tool you use.

        • Kissaki@programming.dev · ↑2 · 4 hours ago

          Half the cs world does…

          What’s the basis for this claim? I’m doubtful, but don’t have wide data for this.

          • Oisteink@lemmy.world · ↑3 · 2 hours ago

            Rough estimate from my personal connections only. Some workplaces don’t allow AI at all, but everyone who has made an effort reports good code. You need to work with what it is: a word generator that sometimes gives correct results. Make it research rather than trust its training. Never let it do things on its own; require a plan and reasoning. Make it evaluate its own work/plan.

            Most issues I have stem from models being too eager. Restrain them and remove the “I can do this next…” behaviour.

            Context is king, so proper MCP and documentation that is agent-facing. I use Serena as I can get LSP for YAML and markup, and I keep these docs like that.

  • tohuwabohu@programming.dev · ↑17 · 15 hours ago

    I use my own brain to sketch out what I want to work and how. Before writing any code, I use the LLM to point out gaps and how to close them. Pros and cons of certain decisions. Things you would discuss with colleagues. Then, I come up with a plan for the order I want the code to be written in and how to fragment that into smaller, easy to handle modules. I supervise and review each chunk produced, adapt code mostly manually if required, write the edge case tests - most importantly, run it - and move to the next. This is how I use it successfully and get results much faster than the traditional way.

    At my job though I can witness how other people use it. I was asked to review a fully vibecoded fullstack app that contains every mistake possible. Unsanitized input. Hardcoded tokens. Hardcoded credentials. 2500+ LoC classes and functions. Business logic orchestrators masquerading as services. Full table scans on each request. Cross-tenant data leaks. Loading whole tables into memory. No test coverage for the most critical paths. Tests requiring external services to run. The list goes on. Now they want me to make it production-ready in 8 weeks “because you have AI”.

    My point: this was an endorphin-fueled vibecoding session by someone who has no experience as a developer, who asked the LLM to “just make it work” and lacked the ability to supervise the work, an ability that comes with experience. It was enough to make it run locally and pitch a “system engineered w/o any developer” to management.

    Those systems need guidance just as a junior would, and I am strongly and loudly advocating to restrict access to this incredibly useful tool to people who know what they’re doing. Nobody would let a manager use a laser cutter in a carpentry workshop without proper training; worst case, they’ll burn down the whole shack.

    I appreciate you having an open mind about it at least. I needed some time to adjust as well. I don’t even use Opus; most of the time my workflow consistently produces usable code with Sonnet. Maybe you can try what I explained initially? Just don’t try any language you’re not familiar with; that will not end well.

  • cecilkorik@lemmy.ca · ↑79 · edited · 20 hours ago

    No, I think you do get it. That’s exactly right. Everything you described is absolutely valid.

    Maybe the only piece you’re missing is that “almost right, but critically broken in subtle ways” turns out to actually be more than good enough for many people and many purposes. You’re describing the “success” state.

    /s but also not /s because this is the unfortunate reality we live in now. We’re all going to eat slop and sooner or later we’re going to be forced to like it.

    • vga@sopuli.xyz · ↑1 · edited · 26 minutes ago

      Maybe the only piece you’re missing is that “almost right, but critically broken in subtle ways”

      Sure, but you have to note that it reaches that point in minutes, sometimes on a task that would take a human a week. The power is not that it creates correct stuff; it’s that it creates almost-correct stuff 100 times faster than a human. Plus the typical machine benefits: it never gets tired, demotivated, etc.

      So then the challenge becomes being able to be that human, who can review stuff extremely well and rapidly, being natural in probing the stuff LLMs tend to be wrong about. Sort of like the same challenge that every tech lead had before LLMs too, but just subtly different, because LLMs don’t exactly think like we do.

    • pinball_wizard@lemmy.zip · ↑3 · 10 hours ago

      “almost right, but critically broken in subtle ways” turns out to actually be more than good enough for many people and many purposes. You’re describing the “success” state.

      Exactly. The consequences are at worst a problem for “future me”, and at best “somebody else’s problem”.

      AI didn’t create this reality, but it’s certainly moved it into the spotlight and to “center stage.”

    • GiorgioPerlasca@lemmy.ml · ↑7 · 19 hours ago

      Or maybe we will be forced to switch off LLMs and start solving the bugs introduced by their usage using our minds.

      • cecilkorik@lemmy.ca · ↑13 · 18 hours ago

        As a professional software developer, I truly hope that is the case (and I plan to charge at least 10x my current rate after the AI bubble pops when I’m looking for my next job as I expect there to be a massive shortage of people skilled enough to actually deal with the nightmare spaghetti AI code bases)

        Fun times ahead.

        • tohuwabohu@programming.dev · ↑8 · 15 hours ago

          It will be interesting (read: bad) times getting to that point, and I agree. The junior market has been basically nonexistent ever since coding agents appeared, stripping the industry of its future seniors. We will be chained to our desks.