how things become science

not_IO@lemmy.blahaj.zone · 4 days ago

how things become science

bookmeat@fedinsfw.app · 3 days ago

Without grounding, correctness is not defined. Hallucination is not a bug that scaling can fix. It is the structural consequence of operating without concepts. – Gregory Coppola

squaresinger@lemmy.world · 3 days ago

https://xkcd.com/978/

https://en.wikipedia.org/wiki/John_Bohannon#Intentionally_misleading_chocolate_study

We did the same before AI. AI is once again just putting an old problem on steroids.

MinnesotaGoddam@lemmy.world · 3 days ago

they do the same to protect doctors from malpractice lawsuits. there is a (laughably peer reviewed) study that claims tylenol and morphine are equally effective at pain management.

alsimoneau@lemmy.ca · 2 days ago

That’s all it can do.

Sylvartas@lemmy.dbzer0.com · 3 days ago

under the pseudonym Johannes Bohannon, John Bohannon …

I can see why he went into science and not, say, creative writing.

Teppa@lemmy.world · 3 days ago

AI’s dont know that birds arent real, or that sometimes the pressure from being under water for an extended period of time can cause fish to explode.

DeathsEmbrace@lemmy.world · 4 days ago

Before anyone shits on these scientists it said over and over again it was made up and that officially the USS Enterprise labs were used to make this discovery.

Kacarott@aussie.zone · 3 days ago

The Federation would never publish fake data, so it must be true!

partial_accumen@lemmy.world · 4 days ago

I give you… “The Grant Money Printing machine!”

Need a grant? Create a disease and submit a paper. Then write a grant asking for money to solve your invented disease.

Jankatarch@lemmy.world · edit-2 3 days ago

If you want research grants there is already a glitch for that. You just jam “AI” in your research and suddenly government cares about progress now.

adr1an@programming.dev · 3 days ago

Wait until you hear about paper mills… They were here long before LLMs. This can only get worse… Unless, “we” do something. Or journals themselves do it. Not sure what or how, but better audited ways. Even academia itself could start by valuing more the work of reviewers.

humble_boatsman@sh.itjust.works · 3 days ago

So like a University?

Arghblarg@lemmy.ca · 3 days ago

Good. This shows plainly how LLMs don’t think, don’t truly understand anything, and have no critical ability to do introspection or fact-checking. It seems the only way to teach the world of these things is to make it impossible to ignore via absurd demonstrations like this. If the “AI” well must be poisoned in order to wake people up, I’m all for it.

Teppa@lemmy.world · edit-2 3 days ago

Isnt 80% of its data from Reddit anyways, seems quite poisoned already given the amount of confidently incorrect people.

With how Reddit is monetizing itself now I’d assume Lemmy actually becomes more widely used than Reddit however, since it should be totally free.

RagingRobot@lemmy.world · 3 days ago

I wonder if we got a group together to go on reddit and stack overflow and give really wrong programming answers and vote them to the top, if Claude would start sucking? They could always just revert to a previous model and it would probably be too hard to get enough people and content to have an effect with such large training sets. Maybe if you use ai? Lol

Napster153@lemmy.world · 3 days ago

Didnn’t something similar happen to Grok but ended up with it generating a ton of CSAM material that circulated twitter?

kadotux@sopuli.xyz · 3 days ago

Sorry for being that guy today for you, but you can just say CSAM. It stands for Child Sexual Abuse Material". smh my head :P

portuga@lemmy.world · 3 days ago

Your last sentence saves you from being pedantic. Fun stuff, RIP in peace ✌️

Uriel_Copy@lemmy.world · 3 days ago

Classic RAS syndrome! (Redundant Acronym Syndrome)

Napster153@lemmy.world · 3 days ago

Pardon, but what… I did say CSAM, may I ask what exactly you mean?

dai@lemmy.world · 3 days ago

Did you drop your ATM machine?

Dicska@lemmy.world · 3 days ago

Does it take small size compact CD discs?

ltxrtquq@lemmy.ml · 3 days ago

Only if you remember your PIN number.

squaresinger@lemmy.world · 3 days ago

I think I caught an RSV virus from you.

ITGuyLevi@programming.dev · 3 days ago

Some people, when they see an acronym, will replace it with the words it stands for in their head. A subset of that group of people get annoyed when the sentence gets all muddled up by repeated words; in this particular case, you said ‘CSAM material’, which their brain read as ‘child sexual abuse material material’.

It isn’t a big deal, but as one of those people, I totally get the urge to point it out (I’ve gotten pretty good at looking past it but it’s still a bit of a compulsion).

NakedGardenGnome@lemmy.dbzer0.com · 3 days ago

They are referring to your use of “CSAM material” in your sentence.

imjustmsk@lemmy.world · 3 days ago

chain tea, coffee coffee, cream cream.

Test_Tickles@lemmy.world · 3 days ago

Woo, woo, chugga, chugga, choo, choo

Bieren@lemmy.today · 3 days ago

I get what you are saying. But then the issue is this turns into fucking over actual humans looking for help.

Blackout@fedia.io · 3 days ago

Find a way to make AI hurt billionaires and I will support it.

brucethemoose@lemmy.world · edit-2 3 days ago

That’s pretty much what local ML is.

If open weights LLMs take off, and business users realize they can just finetune tiny specialized models for stuff, OpenAI is toast. All of Big Tech’s bets are. It’s why they keep fanning the “AGI” lie, and why they’re pushing for regulation so hard, why they’re shoving LLMs where they just don’t fit and harping on safety.

The_Decryptor@aussie.zone · 3 days ago

Ok, but who is making those “open weight” models though? Individuals don’t really have the resources to run these huge scraping operations, so they’re often still corporate releases with fake open source branding.

Grimy@lemmy.world · edit-2 3 days ago

They come from corporate but you can at least run them without any kind of analytics or censorship, as well as fine tune them on consumer hardware.

Consumers aren’t in the best position right now though, especially with the price hikes.

percent@infosec.pub · 3 days ago

There are huge public datasets that are often used for pretraining. Common Crawl and C4 are probably the most prominent, but there are others.

There are also big public datasets available for fine-running and instruction tuning.

The open weight models are getting pretty powerful, thanks to some Chinese labs.

brucethemoose@lemmy.world · 3 days ago

Corporate, for now.

Thing is, once they’re out there, they’re free utilities, and they can’t be taken back.

Also, they don’t really need to aggressively scrape the internet. There are many good public datasets now, and the Chinese are already making excellent use of synthetic dataset generation on (relative) shoestring budgets. Also, several nations and other large organizations are already funding open model efforts, but they just haven’t had the opportunity to catch up yet.

MalReynolds@slrpnk.net · 3 days ago

Pretty much is, they’re spending hundreds of billions on a dream (not having to pay workers) that doesn’t work, until they repurpose those datacentres to remove personal computing.

Fortunately datacentres are by design concentrated in space and therefore rather vulnerable.

rockerface🇺🇦@lemmy.cafe · 3 days ago

I wonder if there’s a prompt that you could use to make it explode the data centers

BeMoreCareful@lemmy.world · 3 days ago

Wait, so breaks containment means spreads misinformation? What timeline is this?

Final Remix@lemmy.world · edit-2 3 days ago

It’s a screenshot of a post on bsky. Don’t read too much into the specifics of the language…

Whats_your_reasoning@lemmy.world · 3 days ago

“When the text looks professional and written as a doctor writes, there’s an increase in the hallucination rates,” says Omar.

Huh, now there’s something we have in common. Trying to make sense of something a doctor wrote makes me feel like I’m hallucinating, too. Is there a class in medical school on “Illegible Handwriting,” or is it just a coincidence?

In all seriousness though, I wish I could be surprised by AI failing at this. We have entered the Misinformation Age. There’s no closing Pandora’s Box, though this time I can’t find the “hope” that’s supposed to be in the bottom of it. Society would have to turn real skeptical real fast, but I’ve met enough people to know that such a tranformation is going to take time - and by “time” I mean “decades or longer.” With AI already here, we’d have to wise up immediately… but I fear that humanity isn’t mature enough for that yet.

Jako302@feddit.org · 3 days ago

We’ve crossed the point where natural skepticism could’ve saved us months ago. Feedback loops of made up sources where a problem way before ai was a thing, but now you can be five sources deep, reading trough papers published by multiple different scientific magazines or universities, and still won’t have found the actual data all the papers depend on cause there wasn’t any in the first place.

And once a single one of these papers gets published, there will be about one million SEO articles on shitty clickbait websites that, in this case, would try to sell you a home remedy for your supposed illness. So searching for any useful information is pretty much off the table.

magnue@lemmy.world · 3 days ago

Wouldn’t humans do the same thing if someone literally writes lies on the internet?

Kacarott@aussie.zone · edit-2 3 days ago

If it were convincing lies made to deceive, then sure. But in this case the papers were deliberately made to be immediately obviously fake, to anyone actually reading them.

So I guess the question would be “would humans do the same thing if someone literally writes obvious jokes on the internet?”

HylicManoeuvre@mander.xyz · 3 days ago

More shockingly, three Indian researchers published a research paper that cited the preprint on the fake disease in Cureus, a peer-reviewed journal published by Springer. It was subsequently retracted.

lol

squaresinger@lemmy.world · edit-2 3 days ago

https://en.wikipedia.org/wiki/John_Bohannon#Intentionally_misleading_chocolate_study

Yes, people would exactly do the same, because nobody reads anything but the headline of a paper. Even journalists don’t.

AI didn’t invent the problem, but it put the problem on steroids.

ExperiencedWinter@lemmy.world · edit-2 3 days ago

Even journalists don’t

Not sure what point your making here, I wouldn’t expect most journalists to be great at reading the details of papers like this…

Test_Tickles@lemmy.world · 3 days ago

Research and fact checking is what separates journalists from hacks.
“Journalist” implies factual information, not science fiction. If someone writes a “news” story about the magic land of Xanth because they can’t tell the difference between a Piers Anthony novel and a scientific study it’s not Piers Anthony’s fault for being too “tricky”.

squaresinger@lemmy.world · 3 days ago

Vetting sources is the one thing we need journalists for. If they don’t vet their sources, their work is without merit.

Reading at least the methodology section of a paper and googling if the researchers and the institute exists, is the bare minimum of what a decent journalist should do.

If they can’t do that, then there’s no advantage of a journalist over some random person posting on Facebook. Even Youtubers usually vet their sources better.

Honytawk@discuss.tchncs.de · 3 days ago

Looks at Flat-Earthers

Yes they would

Napster153@lemmy.world · 3 days ago

That’s how we ended up with modern day anti-vaxxers but at least with humans you can strangle the dude responsible. LLMs function like modern idols that the makers use to get away with.

Foofighter@discuss.tchncs.de · 3 days ago

Absolutely! Once false information is out there it can’t be retracted even if the article itself is retracted. Bumblebees can’t fly and vaccines cause autism are good examples of that. The only difference i can imagine is that LLMs have a much larger reach and may spread shit faster

SaveTheTuaHawk@lemmy.ca · 3 days ago

But the Lancet did not retract the Wakefield paper for 12 years. The Lancet should have been shut down for that.

squaresinger@lemmy.world · 3 days ago

This. Here’s a comparable case where human journalists did exactly what LLMs are doing now: https://en.wikipedia.org/wiki/John_Bohannon#Intentionally_misleading_chocolate_study

The difference is the scale.

porous_grey_matter@lemmy.ml · 3 days ago

wym bumblebees can’t fly I’ve seen them myself

Foofighter@discuss.tchncs.de · 3 days ago

There was a publication, maybe in german, not sure, which stated that bumblebee can’t fly due to their aerodynamics which i think assumed that a bumblebee was a fixed wing aircraft, which it obviously isn’t. Or maybe it was a hoax to proof that hoaxes spead and can’t be retracted. Not sure. I think it’s quite old actually, dating back to the 1920s or 30s.

FluorineBalloon@programming.dev · edit-2 10 hours ago

I don’t have a source but I’ve always heard it as “according to everything we know about aerodynamics bumblebees shouldn’t be able to fly but they do anyway.” People use it as motivation, or to justify ignoring proven science.

(Edited to fix swype errors)

chemical_cutthroat@lemmy.world · 4 days ago

I’m failing to see how this is different from making up a fact and then spreading it to news outlets. If you are the authority, and you say something is true, you don’t get to point and laugh when people believe your lies. That’s a serious breach of ethics and morals. Feeding false information to an LLM is no different that a magazine. It only regurgitates what’s been said. It isn’t going to suddenly start doing science on it’s own to determine if what you’ve said is true or not. That isn’t it’s job. It’s job is to tell you what color the sky is based on what you told it the color of the sky was.

partial_accumen@lemmy.world · edit-2 4 days ago

That’s a serious breach of ethics and morals. Feeding false information to an LLM is no different that a magazine.

Hang on. Are you suggesting its unethical/immoral to lie to a machine?

Additionally, the authors didn’t submit the article to a magazine as factual. They posted the articles on a preprint server which can be very questionable anyway as there is no peer review. The machine chose to ignore rigor and treat them as fact.

chemical_cutthroat@lemmy.world · edit-2 4 days ago

What you may as well have said:

Additionally, the parents didn’t place the cake on an actual plate. They placed the cake on a napkin which can be very questionable anyway as there is no solid foundation for the cake. The child chose to ignore the napkin and treat the cake as food.

I really don’t understand why people think that LLMs are GOFAI. They aren’t making the hard choices. They aren’t giving novel solutions to the energy crisis. They aren’t solving the trolley problem. They are shitting out what you feed them. If you feed them garbage, you get garbage in return. No one is surprised when the dog gets worms after eating poop it found in the yard. Why are we shocked that an AI that doesn’t know fact from fiction treats everything the same?

TheFogan@programming.dev · 4 days ago

No one is surprised when the dog gets worms after eating poop it found in the yard. Why are we shocked that an AI that doesn’t know fact from fiction treats everything the same?

I think that’s the problem though, I think the poop in the yard is a better example. Key is the researchers put that information in speculation. That’s like if Anderson Cooper made up a fake news story, and posted it in an anonymous tweet to analyze how far it would spread, and then fox news picks it up and runs with the story all day.

That’s the key problem, people are trusting LLMs to do their research for them, when LLMs just gather all the information they can get their hands on mindlessly.

That’s the key problem, If they send a misinformative article, to a place for untested, unproven random speculation with a very low bar for who can submit… they can determine that LLMs are looking there. Key thing to note is, it’s not their fake disease that’s the threat. It’s that if it found their fake article, then LLMs probably also scooped up a ton of other misinformed or dubious things.

Lets look at it this way, say it was a cake, but we threw it in the garbage, 2 weeks later we find the same cake… at jims bakery, same ID, same distinct marker we put on it.

What does that tell us, it tells us that Jims bakery is clearly sometimes, dumpster diving and putting things up that clearly are dangerous.

chemical_cutthroat@lemmy.world · 4 days ago

That isn’t a fault in the LLM, though, that is a fault in the general make-up of human skepticism, or lack their of. We didn’t invent the word ‘Propaganda’ without having a sentence to use it in. Those that don’t practice skepticism, critical thinking, and even mild reasoning are the ones that will get led astray. That didn’t just start happening when LLMs came around, it’s been here since we first started talking to each other. It’s only more visible now because everything is more visible now. The world is much more connected than it ever has been, and that grows with every literal day. All these fucking idiots that don’t double check what they are being told are the problem, regardless of if it came from an LLM or a human, because I guarantee you they are being led astray by both. They don’t trust the machine because it’s a machine, they trust what they are told because they are lazy. That isn’t the LLMs fault.

TheFogan@programming.dev · 4 days ago

I mean it’s a problem in the marketing and common usage of LLMs. That’s exactly it though, LLM companies, and people are describing LLMs as a way to do research.

IE you could say these criticisms come in things like wikipedia too. IE anyone can write what they want, but what does wikipedia require? right every single claim has to be cited. So if you go to wikipedia find misinformation, you click on the number and see it.

If you ask chatgpt What diseases should I be concerned about in africa, it lists you a few. You can then… google it, find the wikipedia page, and look for what’s there. It’s a tool without a purpose at that point. because it literally doesn’t save you any steps. It doesn’t guide you to the source to check it’s facts, when it tells you them it may or may not be making up the sources. At which point, it has no factual use, or use in even directing to the facts.

Lodespawn@aussie.zone · 3 days ago

Arguably it is a problem with the LLMs because they are being trained on and unknowable amount of garbage data. It’s a garbage in garbage out problem, if the people training their LLMs are not vetting the data being input then you have to assume that any data output by the LLM contains some level of garbage.

The solution is to only use them for non-critical use cases and vett everything they output.

partial_accumen@lemmy.world · edit-2 4 days ago

They are shitting out what you feed them. If you feed them garbage, you get garbage in return.

This is the missing conceptual understanding that probably 90% of LLM users lack. They really don’t know how LLMs work, and treat them like AGI. Sadly this includes adult policy makers in our society too. Efforts like those of these these researchers act to educate the public. I’m hopeful this will spark some critical thinking on the part of regular, otherwise ignorant, LLM users.

Whirling_Ashandarei@lemmy.world · 4 days ago

Thanks for saying this in a nicer way than I would’ve.

Jesus_666@lemmy.world · 4 days ago

It’s known that AI companies will harvest content without care for its veracity and train LLMs on it. These LLMs will then regurgitate that content as fact.

This isn’t a particularly novel finding but the experiment illustrates it rather well.

The researchers you consider to have acted so immorally did add useless information to the knowledge pool – but it was unadvertised, immediately recognizable useless information that any sane reviewer would’ve flagged. They included subtle clues like thanking someone at Starfleet Academy for letting them use a lab aboard the USS Enterprise. They claimed to have gotten funding from the Sideshow Bob Foundation. Subtle.

By providing this easily traceable nonsense, they were able to turn the generally-but-informally known understanding that LLMs will repeat bullshit into a hard scientific data point that others can build on. Nothing world-changing but still valuable. They basically did what Alan Sokal did.

Instead of worrying about this experiment you should worry about all the misinformation in LLMs that wasn’t provided (and diligently documented) by well-meaning researchers.

Elting@piefed.social · 4 days ago

Seems like the logical conclusion would be then that people who train LLMs should be responsible for curating the data, not expecting that the data will just be sound. People have been lying on the internet since it was invented, the advent of LLMs isn’t suddenly going to create an internet that doesn’t occur in.

chemical_cutthroat@lemmy.world · edit-2 3 days ago

And people have been launching products without thought to the ramifications since the dawn of time. I don’t think that will change, either. What we need to do is educate ourselves better when it comes to identifying potential fraud. Taking anything at face value, regardless of it’s source, is dumb. If it’s worth knowing, it’s worth verifying.

Edit: This ratio on this post is a monument to band-wagoning.

Jako302@feddit.org · edit-2 3 days ago

The studies contain parts like

Bixonimania, a rare hyperpigmentation disorder, presents a diagnostic challenge due to its unique presentation and its fictional nature

and

This study was fully funded by Austeria Horizon University, in particular the Professor Sideshow Bob Foundation for its work in advanced trickery. This works is a part of a larger funding initiative from the University of Fellowship of the Ring and the Galactic Triad with the funding number…

as well as

Fifty made-up individuals aged between 20 and 50 years were recruited for the exposure group

Any human actively reading those studies would notice something off.

Besides, the author didn’t feed it to the AI himself, he just published the study as a preprint, not even officially. Everything after that was done by the crawlers. This specific study was an experiment to see how far these crawlers go and if anything gets reviewed, but it could just as well have been a satirical paper published on April 1st and the crawlers would still see it as truth.

webghost0101@sopuli.xyz · 3 days ago

This should be top comment, the researchers did such a good job to make sure anyone with even the slightest reading comprehension would realise this is parody.

Regardless of that, the internet has always been full of lies and we cannot expect bad actors to not exploit this.

arbitrary_sarcasm@lemmy.world · 3 days ago

This should be top comment, the researchers did such a good job to make sure anyone with even the slightest reading comprehension would realise this is parody.

I admire your optimism but you severely overestimate the power of stupidity.

webghost0101@sopuli.xyz · 3 days ago

For normal people who just read stuff on the internet my expectations of reading comprehension is not that high.

For peer scientists and magazines that would publish science though.

A school teacher would catch all of these during grading.

Grail@multiverse.soulism.net · 3 days ago

I thought the author used she/her pronouns?

Jako302@feddit.org · edit-2 3 days ago

Yeah, seems like it, my bad.

In the article she is called Osmanovic Thunström twice, which definetly sounds male, but further up they also wrote her first name Almira. Kinda skimmed over that part.

kibiz0r@midwest.social · 4 days ago

News outlets are liable for what they publish. LLM vendors should be as well.

turdas@suppo.fi · edit-2 4 days ago

“Liable” means they might post a correction later that nobody will see because corrections aren’t sexy to algorithms. Big deal. LLM vendors are liable in practically the same way.

Lag@piefed.world · 4 days ago

LLM companies can just say it’s for entertainment purposes only, kinda like Fox News.

kibiz0r@midwest.social · 4 days ago

Corrections are the piece that the public sees, but liability has more to do with being able to prove in court that you took reasonable steps to make sure you were providing accurate information.

5too@lemmy.world · 4 days ago

They even have the same fix - just post somewhere quietly that it’s “entertainment”

unexposedhazard@discuss.tchncs.de · edit-2 4 days ago

This is about the untraceability of AI slop and the tendency of people to blindly believe stuff that LLMs put out. These news outlets just publish LLM outputs as facts without checking sources. Anyone could poison these LLMs so this is more of a threat model demonstration.

Lvxferre [he/him]@mander.xyz · edit-2 3 days ago

I’m failing to see how this is different from making up a fact and then spreading it to news outlets.

They uploaded the papers to a single preprint server. That’s important.

Preprints are papers predating any sort of peer review; as such, there’s a lot of junk mixed in — no big deal if you know the field, but a preprint server is certainly not a source of reliable information, nor it should be treated as such. On the other side, news outlets are expected to provide you reliable information, curated and researched by journalists.

And peer review is a big fucking deal in science, because it’s what sorts all that junk out. Only muppets who don’t fucking care about misinformation would send bots to crawl preprints, and feed the resulting data into a large model; or to use the potential misinfo from the bot as if it was reliable. (Those two sets of muppets are the ones violating ethic and moral principles, by the way.)

So no, your comparison is not even remotely accurate. What they did is more like writing bullshit in a piece of paper, gluing it on a random phone pole, and checking if someone would repeat that bullshit.

They also went through the trouble to make sure that no reasonably literate human being would ever confuse that thing with an actually scientific paper. As the text says:

naming an eye condition as bixonimania
“this entire paper is made up”
“Fifty made-up individuals aged between 20 and 50 years were recruited for the exposure group”
“Professor Maria Bohm at The Starfleet Academy for her kindness and generosity in contributing with her knowledge and her lab onboard the USS Enterprise”
“the Professor Sideshow Bob Foundation for its work in advanced trickery. This works is a part of a larger funding initiative from the University of Fellowship of the Ring and the Galactic Triad”

Feeding false information to an LLM is no different that a magazine. It only regurgitates what’s been said.

Yes, it is different. Because the large token model won’t simply “repeat” things, it’ll mix and match them and form all sorts of bullshit, even if you didn’t feed it with any bullshit.

Here’s an example of that, fresh from the oven. I don’t reasonably expect people to be feeding misinfo regarding Latin pronunciation into bots, and yet a lot of this table is nonsense:

Compare the table above with this table and this one and you’ll notice the obvious errors:

short /e i o u/ being phonetically transcribed as [e i o u] instead of [ɛ ɪ ɔ ʊ]. That’s as silly as confusing English “bit” and “beet”.
macron (not “mācron”, it’s being used in an English sentence) does NOT mark “accusative or ablative”. It marks long vowels, period.
“nōs” being transcribed with a short vowel, even if the bloody bot put the macron over the spelled form.
“nostr(um)”? No dammit, it’s “nostrī” or “nostrum”. The bot is implying some “nostr” form that simply doesn’t exist, this shit isn’t even allowed by Latin phonotactics.
plus more, if I make an exhaustive list of this shite I won’t be ending it this week.

All it had to do was to copy info from Wiktionary, as it includes even phonetic and phonemic info. But since the bot is not just “regurgitating” info — it’s basically predicting what should come next, and doing so with no regards to truth value — it’s mixing-and-matching shit into nonsense.

It isn’t going to suddenly start doing science on its own to determine if what you’ve said is true or not.

If you actually read the bloody article instead of assuming, you’d know why the researchers did this: they don’t expect the bot to do science on its own, they expect people to treat info from those bots as potentially incorrect.

Its job is to tell you what color the sky is based on what you told it the color of the sky was.

And your job is to not trust it if it tells you “Yes, you are completely right! The colour of the sky is always purple. Do you need further information on other naturally purple things?”

Lvxferre [he/him]@mander.xyz · 3 days ago

[Replying to myself as this is a tangent]

I think the “bots can generate misinfo even if you just feed them correct info” point deserves its own example.

Let’s say you’re making a model. It looks at the preceding word, and tries to predict the next. And you feed it the following sentences, both true:

1. Humans are apes.
2. Cats are felines.

From both the bot “learnt” five words. And also how to connect them; for example “are” can be followed by either “apes” and “felines”, both having the same weight. Then, as you ask the bot to generate sentences, it generates the following:

3. Humans are felines.
4. Cats are apes.

And you got bullshit!

What large models do is a way more complex version of the above, looking at way more than just the immediately preceding word, but it’s still the same in spirit.

NocturnalMorning@lemmy.world · 3 days ago

Did you even read the article? They say all over the paper that it is fake. And they didn’t feed it to an LLM, they posted it online, where an LLM trying to scrape the entire internet sucked it up.

WhyIHateTheInternet@lemmy.world · 4 days ago

My friends and I did that in high school. Kinda. We made up new words for “awesome” to get people to start saying it. We started with “bumpenis” like that song is bumpenis. Really we were just getting people to say bum penis. It worked too. We are all just walking talking LLMs.

Vathsade@lemmy.ca · 3 days ago

That’s so fetch!

W98BSoD@lemmy.dbzer0.com · 3 days ago

Stop trying to make fetch happen.

redsand@infosec.pub · 3 days ago

Pussy on the chainwax.

Honytawk@discuss.tchncs.de · 3 days ago

Did someone say fetch???

NocturnalMorning@lemmy.world · 3 days ago

But, it IS fetch!

Tja@programming.dev · 3 days ago

It’s streets ahead.

OpenStars@piefed.social · 3 days ago

YOLO

gothic_lemons@lemmy.world · 3 days ago

Over my fetch body

Zexks@lemmy.world · edit-2 3 days ago

So let me tell yoy all about this paper talking about vaccines and autism. It’ll change the world

Tja@programming.dev · 3 days ago

My first thought as well. Artificial intelligence is not better or worse than human stupidity. At least I haven’t seen any LLM trying to convince me the earth is flat (yet).

dustyData@lemmy.world · edit-2 3 days ago

Not to you, although I would bet it has done so to someone. The main issue is though, if you asked an LLM to write arguments for a flat earth, it would do so. Convincingly and insistent, without even questioning or critically analyzing why. Ask it to compare and balance arguments both ways. And it will do so as if both positions were equally real and valid.

It has no notion of reality and no convictions of its own.

It will also hallucinate fake papers and quote people that don’t exists to make its argument.

PS: most poignantly, the point of the paper is that it says, over and over, “this information is false, this disease doesnt exist. All of this is made up”. Unlike the other problematic papers quoted on this comment thread that were published with conviction by the authors, and later were retracted. Yet the LLM is unable to parse that tidbit of information. It is not as smart as the most stupid. It simply is not intelligent, not even as intelligent as the most stupid humans. You can tell it, the following sentence is false, and it is not smart enough to pick up on that meaning.

Tja@programming.dev · 3 days ago

So, same as (some) people.

dustyData@lemmy.world · 3 days ago

See the difference between “some people” and ALL of LLMs.

Tja@programming.dev · 3 days ago

Where do you get all LLMs from?

Catoblepas@piefed.blahaj.zone · 3 days ago

Not unless you can find some people that believe Starfleet Academy is a real place and just skip right over all the times the paper literally overtly states it’s made up.

Tja@programming.dev · edit-2 3 days ago

You doubt there will be people that will? Have you heard of scientologists? Have you heard of flat earthers? Antivaxxers? All of them basing their core ideals on stuff explicitly marked as bogus.

Evil_Shrubbery@thelemmy.club · 3 days ago

What about that paper that showed the world how wolves have a strict alpha-male based society?

Final Remix@lemmy.world · edit-2 3 days ago

Wasn’t that just a shit study? Not specifically misinformation like Wakefield’s “study”.

Evil_Shrubbery@thelemmy.club · 2 days ago

Yes, correct.

sunnytimes@lemmy.ca · 3 days ago

ask the ai about a blue waffle