[deleted by user]

  • Libb@piefed.social · 18 points · 1 month ago

    I’m no dev so I don’t understand all the technicalities, but if I got it right, you made it so the AI itself shows how confident it is in its own answers? That is neat.

    Not sure I understand the downvotes? Isn’t it a good idea to make it harder for an AI to tell bullshit without blushing?

  • seadoo@lemmy.world · 12 points · 1 month ago

    I think it’s interesting? It’s kind of hard to tell.

    You are going to have to significantly tone down the editorialization and platitudes to get this to a place where a journal might consider it.

    Make the point of how it’s novel or useful by explaining what it does, not by repeating that it’s novel and useful.

  • machiavellian@lemmy.ml · 8 points · 1 month ago

    Although I’m opposed to AI in general and LLMs in particular, this project seems really cool. It might actually change my stance on LLM usage. Kudos, and I hope this gets more attention and development!

  • iByteABit@lemmy.ml · 5 points · 1 month ago (edited)

    I have trouble understanding what makes it list “Context” as its source as opposed to “Model”, and how that makes it any more deterministic. Can you give a more detailed example?

  • someacnt@sh.itjust.works · 5 points · 1 month ago

    I was like, why aren’t you publishing it to a conference/journal if it is good? Then realized that you are doing exactly that. Kudos for the work, looking forward to the progress!

      • skarn@discuss.tchncs.de · 3 points · 22 days ago

        As a physicist, my favorite referee comment ever was that my claim being wrong “should be obvious to anyone who has ever sat through an elementary electromagnetism course.” He was wrong, BTW, and the paper was eventually published in a different journal.

        I am from a decidedly different field, so I don’t know if I can vouch for you in any meaningful way.

  • twoBrokenThumbs@lemmy.world · 4 points · 1 month ago

    Thanks for sharing. I’ve not yet delved into reading it in depth but appreciate your goals and the fact that you documented it all.

  • okwhateverdude@lemmy.world · 4 points · 1 month ago

    So I was curious about how you accomplished this and took a look with the robots to figure it out.

    TL;DR: the router is a massive decision tree using heuristics and regex to avoid LLM calls on unprefixed prompts.

    I think this is an interesting, brute force approach to the problem, but one that will always struggle with edge cases. The other bit it will struggle with is transparency. Yes, it might be deterministic because it is a decision tree, but unless you really understand how that decision tree works under the hood and know where the pitfalls are, you’re going to end up talking to the LLM a lot of the time anyhow.

    Something you might want to consider is doing a fine-tune of a smol model (think something like qwen3:1.7B, or even smaller, like one of the gemma3n sub-1B models) that will do the routing for you. You can easily build the dataset synthetically or harvest your own logs. I think this might end up covering more edge cases more smoothly without resorting to a big call to a larger model.
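    To make the contrast concrete, here is a toy sketch of the heuristic style of routing described above: regex rules answer cheap, well-structured prompts directly, and everything else falls through to the LLM. All rule names and patterns here are hypothetical, not the project’s actual code.

```python
import re

# Hypothetical heuristic router: each rule maps a regex over the
# prompt to a cheap local tool; no rule matching means an LLM call.
RULES = [
    (re.compile(r"^\s*what\s+time\b", re.I), "clock"),
    (re.compile(r"^\s*convert\s+\d+", re.I), "unit_converter"),
    (re.compile(r"^\s*define\s+\w+", re.I), "dictionary"),
]

def route(prompt: str) -> str:
    """Return the name of the handler for this prompt."""
    for pattern, tool in RULES:
        if pattern.search(prompt):
            return tool
    return "llm"  # no heuristic matched: pay for the model call

print(route("What time is it in Tokyo?"))   # clock
print(route("Explain monads to a cat."))    # llm
```

    The fine-tuned-classifier alternative would replace the `RULES` table with a single small-model call that outputs the tool name, trading hand-written transparency for broader edge-case coverage.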

      • okwhateverdude@lemmy.world · 1 point · 1 month ago

        Cool man. It is really refreshing to see this level of engagement. You’ve really thought this through. You’re right about the routing model moving it up a level, and also about retraining. It’s all trade-offs.

        Are you intending this for others to use, or is this really just for you? Because I think what you’re slowly building is a power tool with a whack-a-mole set of routing tweaks specific to you. Nothing wrong with that, but the barrier to entry for others is reading that routing logic and understanding the foibles that have been baked in with your preferences in mind, and even adding fixes and tweaks of their own, which kinda breaks the magic a little.

        This was really the point I was making about transparency.

        I appreciate others also doing real work with potato GPUs because I, too, have a potato GPU (6GB). I think there is real utility in continuing to develop this.

        I’ll give this a star and follow along. It doesn’t really fit my mental model of how I’d like my harness to behave, but I will totally steal some of these ideas.

  • fubarx@lemmy.world · 3 points · 1 month ago

    Looks interesting. Will give it a whirl on my home server.

    In this article, they talk about bringing up a local RAG system to let people run an LLM off a large document corpus: https://en.andros.dev/blog/aa31d744/from-zero-to-a-rag-system-successes-and-failures/

    Wonder if this, connected to something like that, and wrapped in an easy, end-user-friendly script or UI, could be a good combination for a local, domain-specific, grounded knowledge base?
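    For anyone unfamiliar with the retrieval half of such a pipeline, here is a self-contained toy illustration. A real local RAG system would use an embedding model and a vector store; plain bag-of-words cosine similarity stands in here, and the documents are made up.

```python
import math
import re
from collections import Counter

# Toy document corpus; in a real setup these would be chunks of a
# large local document collection.
DOCS = [
    "The server restarts nightly at 03:00 UTC.",
    "Backups are stored on the NAS under /mnt/backups.",
    "The wiki runs on port 8080 behind nginx.",
]

def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (stand-in for a real embedding)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str) -> str:
    """Return the corpus document most similar to the query."""
    q = vectorize(query)
    return max(DOCS, key=lambda d: cosine(q, vectorize(d)))

print(retrieve("what port is the wiki on?"))
```

    The retrieved passage would then be injected into the LLM prompt as grounding context, which is the part a user-friendly wrapper script or UI would hide.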

      • fubarx@lemmy.world · 2 points · 1 month ago

        The problem with CAG is not just that it hogs memory, but that to keep it fresh you have to keep re-indexing. If the corpus is large and dynamic, it can easily fall out of date and, at runtime, blow out the context window.

        GraphRAG has some promise. NVidia has a playbook for converting text into a knowledge graph: https://build.nvidia.com/spark/txt2kg

        It’ll probably have the same issues with reindexing, but that will be a common problem, until someone comes up with better incremental training/indexing.

  • utopiah@lemmy.ml · 2 points · 1 month ago (edited)

    Can’t it cite other LLM outputs as a “verified source” and thus still say whatever sounds good, like any LLM? Providing “technical” verification, e.g. a SHA, gives no assurance that the content itself is from a reputable source. I don’t think adding confidence and sourcing changes anything; the user STILL has to verify that whatever is provided is coherent and that the third party is actually a good source. Thanks for making the process public though, you’re doing better than OpenAI does.

      • utopiah@lemmy.ml · 2 points · 1 month ago

        Isn’t “source: model” basically roulette? We’re back to the initial problem. Also, anything that is not “model” might also be hallucinated if, at any point, the string that emits “source:” goes through the model.

    • JustinTheGM@ttrpg.network · 2 points · 1 month ago

      Fair, but that’s the same problem human thinkers face. Faulty inputs == faulty outputs. You should always be validating your sources.

      • utopiah@lemmy.ml · 1 point · 1 month ago

        Right, but if one person keeps giving me wrong answers, knowingly or not, my distrust in them is not linear. They’ll have to “earn” it back, and that’s going to be very challenging. If they do learn, though, then it might come back faster. In this setup I have no guarantee of any progress. There’s no “one” in there trying to fix any mistakes.

  • danh2os@piefed.social · 1 point · 1 month ago

    AI will think for you if you prompt it to do so. It’s up to users to use the tool in a way that suits their style.

  • ScoffingLizard@lemmy.dbzer0.com · 1 point · 1 month ago

    So basically, you created a prompt wrapper that removes position bias by using trust to evaluate both, and forces an evidence path with scratch. This is a really cool development. It probably will not solve everything, but it solves a lot.

    Is Llama open source?

  • CodenameDarlen@lemmy.world · 1 point · 1 month ago

    TL;DR.

    So you basically solved humanity’s problems with LLMs. You should sell it to NVIDIA and get rich, no more hallucination.

  • sem@piefed.blahaj.zone · 1 point · 21 days ago

    For someone who has never run an AI locally: can you set this up on a regular laptop? How would you do that?