[deleted by user]

  • Libb@piefed.social · 18 points · 1 month ago

    I’m no dev so I don’t understand all the technicalities, but if I got it right, you made it so the AI itself shows how confident it is in its own answers? That is neat.

    Not sure I understand the downvotes? Isn’t it a good idea to make it harder for an AI to tell bullshit without blushing?

  • seadoo@lemmy.world · 12 points · 1 month ago

    I think it’s interesting? It’s kind of hard to tell.

    You are going to have to significantly tone down the editorialization and platitudes to get this to a place where a journal might consider it.

    Make the point of how it’s novel or useful by explaining what it does, not by repeating that it’s novel and useful.

  • machiavellian@lemmy.ml · 8 points · 1 month ago

    Although I’m opposed to AI in general and LLMs in particular, this project seems really cool. It might actually change my stance on LLM usage. Kudos, and I hope this gets more attention and development!

  • iByteABit@lemmy.ml · 5 points · 1 month ago (edited)

    I have trouble understanding what makes it list “Context” as its source as opposed to “Model”, and how that makes it any more deterministic. Can you give a more detailed example?

  • someacnt@sh.itjust.works · 5 points · 1 month ago

    I was like, why aren’t you publishing it to a conference/journal if it is good? Then realized that you are doing exactly that. Kudos for the work, looking forward to the progress!

      • skarn@discuss.tchncs.de · 3 points · 22 days ago

        As a physicist, my favorite referee comment ever was that my claim being wrong “should be obvious to anyone who has ever sat through an elementary electromagnetism course.” He was wrong, BTW, and the paper was eventually published in a different journal.

        I am from a decidedly different field, so I don’t know if I can vouch for you in any meaningful way.

  • twoBrokenThumbs@lemmy.world · 4 points · 1 month ago

    Thanks for sharing. I’ve not yet delved into reading it in depth but appreciate your goals and the fact that you documented it all.

  • okwhateverdude@lemmy.world · 4 points · 1 month ago

    So I was curious about how you accomplished this and took a look with the robots to figure it out.

    TL;DR: the router is a massive decision tree using heuristics and regex to avoid LLM calls on unprefixed prompts.

    I think this is an interesting, brute force approach to the problem, but one that will always struggle with edge cases. The other bit it will struggle with is transparency. Yes, it might be deterministic because it is a decision tree, but unless you really understand how that decision tree works under the hood and know where the pitfalls are, you’re going to end up talking to the LLM a lot of the time anyhow.

    Something you might want to consider is doing a fine-tune of a smol model (think something like qwen3:1.7B, or even smaller, like one of the gemma3n sub-1B models) that will do the routing for you. You can easily build the dataset synthetically or harvest your own logs. I think this might end up covering more edge cases more smoothly without resorting to a big call to a larger model.
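    To make the contrast concrete, here is a toy sketch of the heuristic style of routing described above: regex rules answer cheap, well-structured prompts directly, and everything else falls through to the LLM. All rule names and patterns here are hypothetical, not the project’s actual code.

```python
import re

# Hypothetical heuristic router: each rule maps a regex over the
# prompt to a cheap local tool; no rule matching means an LLM call.
RULES = [
    (re.compile(r"^\s*what\s+time\b", re.I), "clock"),
    (re.compile(r"^\s*convert\s+\d+", re.I), "unit_converter"),
    (re.compile(r"^\s*define\s+\w+", re.I), "dictionary"),
]

def route(prompt: str) -> str:
    """Return the name of the handler for this prompt."""
    for pattern, tool in RULES:
        if pattern.search(prompt):
            return tool
    return "llm"  # no heuristic matched: pay for the model call

print(route("What time is it in Tokyo?"))   # clock
print(route("Explain monads to a cat."))    # llm
```

    The fine-tuned-classifier alternative would replace the `RULES` table with a single small-model call that outputs the tool name, trading hand-written transparency for broader edge-case coverage.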

      • okwhateverdude@lemmy.world · 1 point · 1 month ago

        Cool man. It is really refreshing to see this level of engagement. You’ve really thought this through. You’re right about the routing model moving it up a level, and also about retraining. It’s all trade-offs.

        Are you intending this for others to use, or is this really just for you? Because I think what you’re slowly building is a power tool with a whack-a-mole set of routing tweaks specific to you. Nothing wrong with that, but the barrier to entry for others is reading that routing logic and understanding the foibles that have been baked in with your preferences in mind, and even adding fixes and tweaks of their own, which kinda breaks the magic a little.

        This was really the point I was making about transparency.

        I appreciate others also doing real work with potato GPUs because I, too, have a potato GPU (6GB). I think there is real utility in continuing to develop this.

        I’ll give this a star and follow along. It doesn’t really fit my mental model of how I’d like my harness to behave, but I will totally steal some of these ideas.

  • fubarx@lemmy.world · 3 points · 1 month ago

    Looks interesting. Will give it a whirl on my home server.

    In this article, they talk about bringing up a local RAG system to let people run an LLM off a large document corpus: https://en.andros.dev/blog/aa31d744/from-zero-to-a-rag-system-successes-and-failures/

    Wonder if this, connected to something like that, and wrapped in an easy, end-user-friendly script or UI, could be a good combination for a local, domain-specific, grounded knowledge base?
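    For anyone unfamiliar with the retrieval half of such a pipeline, here is a self-contained toy illustration. A real local RAG system would use an embedding model and a vector store; plain bag-of-words cosine similarity stands in here, and the documents are made up.

```python
import math
import re
from collections import Counter

# Toy document corpus; in a real setup these would be chunks of a
# large local document collection.
DOCS = [
    "The server restarts nightly at 03:00 UTC.",
    "Backups are stored on the NAS under /mnt/backups.",
    "The wiki runs on port 8080 behind nginx.",
]

def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (stand-in for a real embedding)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str) -> str:
    """Return the corpus document most similar to the query."""
    q = vectorize(query)
    return max(DOCS, key=lambda d: cosine(q, vectorize(d)))

print(retrieve("what port is the wiki on?"))
```

    The retrieved passage would then be injected into the LLM prompt as grounding context, which is the part a user-friendly wrapper script or UI would hide.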

      • fubarx@lemmy.world · 2 points · 1 month ago

        The problem with CAG is not just that it hogs memory, but that to keep it fresh you have to keep re-indexing. If the corpus is large and dynamic, it can easily fall out of date and, at runtime, blow out the context window.

        GraphRAG has some promise. NVidia has a playbook for converting text into a knowledge graph: https://build.nvidia.com/spark/txt2kg

        It’ll probably have the same issues with reindexing, but that will be a common problem, until someone comes up with better incremental training/indexing.

  • utopiah@lemmy.ml · 2 points · 1 month ago (edited)

    Can’t it cite other LLM outputs as a “verified source” and thus still say whatever sounds good, like any LLM? Providing “technical” verification, e.g. a SHA, gives no assurance that the content itself is from a reputable source. I don’t think adding confidence and sourcing changes anything; the user STILL has to verify that whatever is provided is coherent and that the third party is actually a good source. Thanks for making the process public though, you’re doing better than OpenAI does.

      • utopiah@lemmy.ml · 2 points · 1 month ago

        Isn’t “source: model” basically roulette? We’re back to the initial problem. Also, anything that is not “model” might also be hallucinated if, at any point, the string that emits “source:” goes through the model.

    • JustinTheGM@ttrpg.network · 2 points · 1 month ago

      Fair, but that’s the same problem human thinkers face. Faulty inputs == faulty outputs. You should always be validating your sources.

      • utopiah@lemmy.ml · 1 point · 1 month ago

        Right, but if one person keeps giving me wrong answers, knowingly or not, my distrust in them is not linear. They’ll have to “earn” it back, and that’s going to be very challenging. If they do learn, though, then it might come back faster. In this setup I have no guarantee of any progress. There’s no “one” in there trying to fix any mistakes.

  • danh2os@piefed.social · 1 point · 1 month ago

    AI will think for you if you prompt it to do so. It’s up to users to use the tool in a way that suits their style.

  • ScoffingLizard@lemmy.dbzer0.com · 1 point · 1 month ago

    So basically, you created a prompt wrapper that removes position bias by using trust to evaluate both, and forces an evidence path with scratch. This is a really cool development. It probably will not solve everything, but it solves a lot.

    Is Llama open source?

  • CodenameDarlen@lemmy.world · 1 point · 1 month ago

    TL;DR.

    So you basically solved humanity’s problems with LLMs. You should sell it to NVIDIA and get rich, no more hallucination.

  • sem@piefed.blahaj.zone · 1 point · 21 days ago

    For someone who has never run an AI locally: can you set this up on a regular laptop? How would you do that?