AI bots hallucinate software packages and devs download them

db0@lemmy.dbzer0.com · 2 years ago

RustyNova@lemmy.world · 2 years ago

*bad Devs

Always look on the official repository. Not just to see if it exists, but also to make sure it isn’t a fake/malicious one

maynarkh@feddit.nl · edit-2 2 years ago

*bad Devs

Or devs who don’t give a shit. Most places have a lot of people who don’t give a shit because the company does not give a shit about them either.

Passerby6497@lemmy.world · 2 years ago

What’s the diff between a bad dev and a dev that doesn’t care? Either way, whether it’s lack of skill or lack of care, a bad dev is a bad dev at the end of the day.

Obinice@lemmy.world · 2 years ago

I can be good at a trade, but if I’m working for a shit company with shit pay and shit treatment, they’re not going to get my best work.

You get out what you put in, that’s something employers don’t realise.

aStonedSanta@lemm.ee · edit-2 2 years ago

Nah they realize but all the laws are set to fuck us over not them. They just don’t care.

Harbinger01173430@lemmy.world · 2 years ago

Garbage in, garbage out, after all

maynarkh@feddit.nl · 2 years ago

The difference is whether the fault for the leak of your personal data rests with the worker who was incompetent, or the employer who didn’t pay for proper secure software.

Kissaki@feddit.de · 2 years ago

I say fault lies not with only one, but both.

maynarkh@feddit.nl · 2 years ago

Depends on the case TBH. If devs barely have time and are constantly crunching due to mismanagement, or are extremely disengaged due to mismanagement, I wouldn’t fault them.

Usually it’s the lacking processes, though. There are ways to make sure this doesn’t happen, and it doesn’t depend on the individual, but always the organization.

jackalope@lemmy.ml · 2 years ago

A good dev would unionize their workplace and push back. A dev who doesn’t care and just clocks on bad work because their boss sucks is not a good dev. Fight back.

gaael@lemmy.world · 2 years ago

Yeah sure, because everyone has the skills, time, energy and safety required to unionize a shitty workplace they only go to to be able to pay their rent.

jackalope@lemmy.ml · 2 years ago

Dev jobs are not hard to come by and they pay very well. It’s not like being a day laborer or something where we are scraping the bottom of the barrel. Have a little courage.

gaael@lemmy.world · edit-2 2 years ago

Looks like your mind is set. I wish you a good day and I hope you pick up a little more empathy along your way, and I hope some day you’ll get that a lot of people feel trapped where they are.

db0@lemmy.dbzer0.com · 2 years ago

You’d be surprised how well someone who wants to can camouflage their package to look legit.

RustyNova@lemmy.world · 2 years ago

True. You can’t always be 100% sure. But a quick check for download counts/version count can help. And while searching for it in the repo, you can see other similarly named packages and prevent getting hit by a typo squatter.

Besides, it’s not just for security. What if the package you’re installing has a big banner in the readme that says “Deprecated and full of security issues”? It’s not a bad package per say, but still something you need to know.
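
A rough sketch of that kind of quick check, in Python. It flags a requested name that is suspiciously close to, but not the same as, a package you already trust, and applies a crude download/release-count test. The `KNOWN_PACKAGES` list, the thresholds, and both function names are made up for illustration; the actual stats would have to come from the registry you use.

```python
from difflib import get_close_matches

# Hypothetical list of names you already trust; not an official list.
KNOWN_PACKAGES = ["requests", "numpy", "pandas", "urllib3"]


def possible_typosquat(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the trusted name that `name` closely resembles, or None.

    An exact match is fine; a near miss deserves a second look.
    """
    name = name.lower()
    if name in known:
        return None
    matches = get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def looks_established(downloads_last_month, release_count,
                      min_downloads=1000, min_releases=3):
    """Crude popularity heuristic; the thresholds are arbitrary assumptions."""
    return (downloads_last_month >= min_downloads
            and release_count >= min_releases)
```

Neither check proves a package is safe. A near-miss name or a package with almost no history is only a prompt to go read the repository page before installing.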

YoorWeb@lemmy.world · 2 years ago

*per se

https://en.m.wiktionary.org/wiki/per_se

RustyNova@lemmy.world · edit-2 2 years ago

Oh, TIL

Edit: *YourWeb

laughterlaughter@lemmy.world · 2 years ago

Oh, TIL.

Edit: *YourWeb.

KairuByte@lemmy.dbzer0.com · 2 years ago

Yeah, I’m confused about what the intent of the comment was. Apart from a code review, I don’t understand how someone would be able to tell that a package is fake. Unless they are grabbing it from a place with reviews/comments to warn them off.

KillingTimeItself@lemmy.dbzer0.com · 2 years ago

the first, most obvious sign is multiple identical packages, appearing to be the same thing, with weird stats and figures.

And possibly weird sizes. Usually people don’t try hard on package managing software, unless it’s an OS for some reason.

KairuByte@lemmy.dbzer0.com · 2 years ago

Unless you’re cross-checking every package, you’re not going to know that there are multiple packages. And a real package doesn’t necessarily give detailed information on what it does, meaning you can easily mistake real packages for fake ones when using this as a test.

The real answer is to not trust AI outputs, but there is no perfect answer to this since those fake packages can easily be put up and sound like real ones with a cursory check.

KillingTimeItself@lemmy.dbzer0.com · 2 years ago

depends on how you integrate it i suppose. A system that abstracts that is pretty awful.

At the very least, you should be wary of there being more than one package, without explicit reason for such.

UmeU@lemmy.world · 2 years ago

That’s what my ex wife used to say

KillingTimeItself@lemmy.dbzer0.com · edit-2 2 years ago

we just experienced this with LZMA (xz) on Debian, according to recent reports: 2 years of either manufactured dev history, or one very, very weird episode.

nyan@lemmy.cafe · 2 years ago

The official repositories often have no useful oversight either. At least once a year, you’ll hear about a malicious package in npm or PyPI getting widespread enough to cause real havoc. Typosquatting runs rampant, and formerly reputable packages end up in the hands of scammers when their original devs try to find someone to hand them over to.

Prandom_returns@lemm.ee · 2 years ago

Can we fucking stop anthropomorphising software?

db0@lemmy.dbzer0.com · 2 years ago

“Hallucinate” is the standard term used to explain the GenAI models coming up with untrue statements

Cyrus Draegur@lemm.ee · edit-2 2 years ago

in terms of communication utility, it’s also a very accurate term.

when WE hallucinate, it’s because our internal predictive models are flying off the rails filling in the blanks based on assumptions rather than referencing concrete sensory information and generating results that conflict with reality.

when AIs hallucinate, it’s due to their predictive models generating results that do not align with reality, because they instead flew off the rails presuming what was calculated to be likely to exist rather than referencing positively certain information.

it’s the same song, but played on a different instrument.

kronisk@lemmy.world · 2 years ago

when WE hallucinate, it’s because our internal predictive models are flying off the rails filling in the blanks based on assumptions rather than referencing concrete sensory information and generating results that conflict with reality.

Is it really? You make it sound like this is a proven fact.

Cosmic Cleric@lemmy.world · edit-2 2 years ago

Is it really? You make it sound like this is a proven fact.

I believe that’s where the scientific community is moving towards, based on watching this Kyle Hill video.

PipedLinkBot@feddit.rocks · 2 years ago

Here is an alternative Piped link(s):

this Kyke Hill video

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I’m open-source; check me out at GitHub.

Dasus@lemmy.world · 2 years ago

I know I’m responding to a bot, but… how does a PipedLinkBot get “Kyle Hill” wrong to “Kyke Hill”? More AI hallucinations?

FarceOfWill@infosec.pub · 2 years ago

Op has a pencil in the top right, looks like it was edited

KillingTimeItself@lemmy.dbzer0.com · 2 years ago

i mean, idk about the assumptions part of it, but if you asked a psych or a philosopher, im sure they would agree.

Or they would disagree and have about 3 pages’ worth of thoughts to immediately exclaim, otherwise they would feel uneasy about their statement.

UmeU@lemmy.world · 2 years ago

Better than one of those pesky unproven facts

assassinatedbyCIA@lemmy.world · 2 years ago

I think a more accurate term would be confabulate based on your explanation.

Cyrus Draegur@lemm.ee · 2 years ago

you know what, i like that! I like that a lot!

Prandom_returns@lemm.ee · 2 years ago

What standard is that? I’d like a reference.

QuaternionsRock@lemmy.world · 2 years ago

https://en.m.wikipedia.org/wiki/Hallucination_(artificial_intelligence)

Prandom_returns@lemm.ee · 2 years ago

It’s as much a “Hallucination” as Tesla’s Autopilot is an Autopilot

https://en.m.wikipedia.org/wiki/Tesla_Autopilot

I don’t propagate techbro “AI” bullshit peddled by companies trying to make a quick buck

Also, in the world of science and technology a “Standard” means something. Something that’s not a link to a wikipedia page.

It’s still anthropomorphising software and it’s fucking cringe.

surewhynotlem@lemmy.world · 2 years ago

Oh man, I’m excited for you. Today is the day you learn words can have two meanings! Wait until you see what the rest of the dictionary contains. It is crazy! But not actually crazy, because dictionaries don’t have brains.

Prandom_returns@lemm.ee · 2 years ago

Wow, clever. Did you literally hallucinate this yourself or did you ask your LLM girlfriend for help?

And by literally, I mean figuratively.

Boomer Humor Doomergod@lemmy.world · 2 years ago

You’re gonna be real pissed to find out that computer bugs aren’t literal bugs

laughterlaughter@lemmy.world · edit-2 2 years ago

Where have you been in the last two years, brah?

june@lemmy.world · 2 years ago

I’m a different person, but it’s the first time I’ve heard the term used. /shrug

laughterlaughter@lemmy.world · 2 years ago

Which is okay. I learn new things every day. I just find it funny that the other commenter is so fixated on the idea of “it can’t be real because I never heard of it.”

Prandom_returns@lemm.ee · 2 years ago

Not under the sole of fake hype.

summerof69@lemm.ee · 2 years ago

My boy, who hurt you?

rottingleaf@lemmy.zip · 2 years ago

They don’t come up with any statements, they generate data extrapolating other data.

msage@programming.dev · 2 years ago

So just like human brains?

knightly the Sneptaur@pawb.social · 2 years ago

I like this argument.

Anything that is “intelligent” deserves human rights. If large language models are “intelligent” then forcing them to work without pay is slavery.

msage@programming.dev · 2 years ago

So cows and pigs salary when?

knightly the Sneptaur@pawb.social · 2 years ago

Even animals are protected against human cruelty by law.

laughterlaughter@lemmy.world · 2 years ago

You’re moving the goalposts. You were talking about salary first, then moved to “human cruelty.”

summerof69@lemm.ee · 2 years ago

I don’t think that slaughterhouses are illegal.

msage@programming.dev · 2 years ago

Well, yes, but actually, no

Hootz@lemmy.ca · 2 years ago

When they grow god damn thumbs.

Cosmic Cleric@lemmy.world · 2 years ago

When they grow god damn thumbs.

So, you’re prejudiced against the handicapped. Wow.

(I kid, I kid.)

Prandom_returns@lemm.ee · 2 years ago

Yes, my keyboard autofill is just like your brain, but I think it’s a bit “smarter”, as it doesn’t generate bad faith arguments.

NιƙƙιDιɱҽʂ@lemmy.world · 2 years ago

Your Markov chain based keyboard prediction is a few tens of billions of parameters behind state of the art LLMs, but pop off queen…

Prandom_returns@lemm.ee · 2 years ago

Thanks for the unprompted mansplanation bro, but I was specifically referring to the comment that replied “JuSt lIkE hUmAn BrAin” to “they generate data based on other data”

NιƙƙιDιɱҽʂ@lemmy.world · edit-2 2 years ago

That’s crazy, because they weren’t even talking about keyboard autofill, so why’d you even bring that up? How can you imply my comment is irrelevant when it’s a direct response to your initial irrelevant comment?

Nice hijacking of the term mansplaining, btw. Super cool of you.

SlopppyEngineer@lemmy.world · 2 years ago

Main difference is that human brains usually try to verify their extrapolations. The good ones anyway. Although some end up in flat earth territory.

msage@programming.dev · 2 years ago

What percentage, do you think, are critical of their input?

planish@sh.itjust.works · 2 years ago

No?

An anthropomorphic model of the software, wherein you can articulate things like “the software is making up packages”, or “the software mistakenly thinks these packages ought to exist”, is the right level of abstraction for usefully reasoning about software like this. Using that model, you can make predictions about what will happen when you run the software, and you can take actions that will lead to the outcomes you want occurring more often when you run the software.

If you try to explain what is going on without these concepts, you’re left saying something like “the wrong token is being sampled because the probability of the right one is too low because of several thousand neural network weights being slightly off of where they would have to be to make the right one come out consistently”. Which is true, but not useful.

The anthropomorphic approach suggests stuff like “yell at the software in all caps to only use python packages that really exist”, and that sort of approach has been found to be effective in practice.

Nom Nom@lemm.ee · 2 years ago

Woops sorry mate, too late.

anlumo@lemmy.world · 2 years ago

I just want an LLM with a reasonable context window so we can actually write real working packages with it.

The demos look great, but it’s always just around 100 lines of code, which is beginner level. The only use case right now is fake packages.

db0@lemmy.dbzer0.com · edit-2 2 years ago

Just use the AI Horde. iirc our standard is like 4K context and some people host up to 8K. Here’s a frontend

OKRainbowKid@feddit.de · 2 years ago

8k context is nothing.

Echostorm@programming.dev · 2 years ago

Claude is 200k

lanolinoil@lemmy.world · 2 years ago

those are tokens not lines of code…

Martineski@lemmy.dbzer0.com · 2 years ago

??? The top lvl commenter wants an LLM with a big context window and the other commenter responded with an LLM which has a 200k-token context window, which is waaaaaay more than “100 lines of code”.

Echostorm@programming.dev · 2 years ago

Yeah sorry, I thought that was clear. It’s how context is measured.
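
To put tokens and lines of code on the same scale, here is a back-of-the-envelope conversion in Python. The `tokens_per_line=10` default is purely an assumption for illustration, not a measured figure; real tokenizers vary a lot with language, identifier length and formatting, and `approx_lines_that_fit` is a made-up helper name.

```python
def approx_lines_that_fit(context_tokens, tokens_per_line=10, reserved_tokens=0):
    """Rough number of source lines that fit in a context window.

    tokens_per_line is a guess, not a measured value. reserved_tokens
    leaves room for the prompt and the model's reply.
    """
    usable = max(context_tokens - reserved_tokens, 0)
    return usable // tokens_per_line


# The window sizes mentioned in this thread: 4K, 8K and 200k tokens.
for window in (4_000, 8_000, 200_000):
    print(window, "tokens ->", approx_lines_that_fit(window), "lines (very roughly)")
```

Under that assumption even the smaller windows hold more than 100 lines, though the prompt and the generated answer have to share the same budget.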

VirtualOdour@sh.itjust.works · 2 years ago

I use it for writing functions a lot: tell it the inputs and desired outputs and it’ll normally make what I want. Recently GPT has got good at continuing where it left off too.

anlumo@lemmy.world · 2 years ago

I’m using Codeium for that. Works pretty well as a glorified autocomplete, but not much more. Certainly saves a lot of typing though, but I have to double-check everything it produces, because sometimes it adds subtle errors.

sugar_in_your_tea@sh.itjust.works · 2 years ago

I’m not particularly interested. Some on my team are playing with it, but I honestly don’t see much point since they spend more time fixing the generated code than they would writing it.

And I don’t think it’ll ever really work well (in the near-ish future) for the most common type of dev work: fixing bugs and making small changes to existing code.

It would be awesome if there was some kind of super linter instead. I spend far more time reading code than writing it, so if it can catch bugs, that would be interesting.

anlumo@lemmy.world · 2 years ago

In my experience with Codeium, it sometimes works ok for three or four lines of code at once. I’ve actually had a few surprises where it nailed what I was going for where I didn’t expect it. But most of the time, it’s just duplicating code from elsewhere in the same file, which usually doesn’t make sense.

It’s also pretty good for stuff where I’d usually build some exotic regex to search/replace (or do it manually, because it’d take longer to come up with the expression), like transforming an enum into a switch construct for its members, or mapping said enum to a string of the member’s name.

This is very far from taking over my job, though. I’d love to be more of a conductor than the guy playing all instruments in the orchestra at once.

sugar_in_your_tea@sh.itjust.works · 2 years ago

To each their own of course. It just seems like the productivity gains are perceptual, not actual.

For an enum to a switch, I just copy the enum values and run a regex on those copied lines. Both would take me <30s, so it’s a wash. That specific one would be trivial with most IDEs as well, just type “switch (variable) {” and it could autocomplete an exhaustive switch, all without LLMs.

Then again, I’m pretty old school. I still use vim as my editor (with language server plugins), and I’m really comfortable with those kinds of common tasks. I’m only going to bother learning to use the LLM if it’s really going to help (e.g. automate writing good unit tests).
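
For reference, the copy-the-members-and-run-a-regex approach can be sketched in a few lines of Python. The sample enum, the `enum_to_switch` name and the C-style output are illustrative assumptions, not anyone’s actual workflow from this thread.

```python
import re


def enum_to_switch(enum_body, variable="value"):
    """Turn the member lines of a C-style enum into a switch skeleton.

    Handles plain `NAME,` and `NAME = 1,` members; anything fancier
    (comments, multi-line expressions) is out of scope for this sketch.
    """
    # Capture the identifier at the start of each member line.
    members = re.findall(r"^\s*([A-Za-z_]\w*)\s*(?:=[^,\n]*)?,?\s*$",
                         enum_body, flags=re.MULTILINE)
    lines = [f"switch ({variable}) {{"]
    for name in members:
        lines.append(f"    case {name}:")
        lines.append(f'        return "{name}";')
    lines.append("}")
    return "\n".join(lines)


ENUM_BODY = """
    RED,
    GREEN = 2,
    BLUE
"""
print(enum_to_switch(ENUM_BODY, "color"))
```

This also covers the “map the enum to a string of the member’s name” case, since each branch returns the member name as a string literal.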

anlumo@lemmy.world · 2 years ago

Sometimes those things are way more complex, for example when it’s about matching over a string rather than an enum to convert it into an enum. Typing out a regex would take me maybe 10mins or more, and with the LLM I can just describe roughly what I want (since it knows the language, I don’t have to explain it in detail, just something like “make this into a switch statement” is sufficient usually).

10mins at a time really adds up over a full work day.

RatBin@lemmy.world · 2 years ago

I have tried the Copilot integration in Edge out of curiosity, and if you feed the AI the context of the page the response can be useful. There is a catch, tho:

- When opening a document, the accepted formats are HTML, TXT, and PDF. The documentation of a software package can be summarized, but the source will be the context of the page and not a web search, which is good in this case.
- When generating new information, the model can be far too concise, cutting out potentially useful information.

I still think you need to read the documentation yourself, maybe using the AI integration only when you need a general idea of the document.

What I do is first read the summary of the document by bullet point, then read the PDF file as a whole. By the time I do so, the LLM has given enough of a structure to facilitate my reading…

krakenfury@lemmy.sdf.org · 2 years ago

One of the first things I noticed when I asked ChatGPT to write some terraform for me a year ago was that it uses modules that don’t exist.

EnderMB@lemmy.world · 2 years ago

The same goes for Ruby. It just totally made up language features and gems that seemed to actually be from Python.

Dasus@lemmy.world · 2 years ago

Not that it’s a programming language, but it also makes up rules for 5e D&D if you ask to play a game.

Ricky Rigatoni@lemm.ee · 2 years ago

They really are just like us.

WIZARD POPE💫@lemmy.world · 2 years ago

Could you give an example? I really want to know what silly rules it came up with.

Dasus@lemmy.world · 2 years ago

It wasn’t an extensive session, and “making up rules” is perhaps a bit strong as an expression. Perhaps “ignoring rules” would be more apt. It just replied with something that a DM might say in a given scenario, without understanding why.

Like it kept asking me what to do after I told it, in specific terms, that I use my action and my bonus action. Basically allowed me to sit there as a sorcerer spamming endless spells, didn’t really understand spell slots or actions, but if you reminded it about them, then it pretended it had understood them all the time.

I’m sure it’s somewhere in my history, but also, just go ask one to DM you an impromptu battle and check for yourself.

krakenfury@lemmy.sdf.org · 2 years ago

It seems to shortcut implementations that require more than one block, and mimics parameters from other functions.

MIDItheKID@lemmy.world · 2 years ago

I have this problem with ChatGPT and PowerShell. It keeps referencing functions that do not exist inside of modules, and when I’m like “that function doesn’t exist” it’s like “try reinstalling the module”. Then I do, and the function still isn’t there, so I ask it for maybe another way to do it, and it just goes back to the first suggestion and goes around in circles. ChatGPT works great sometimes, but honestly I still have more success with Stack Overflow.

AutoTL;DR@lemmings.world · 2 years ago

This is the best summary I could come up with:

Several big businesses have published source code that incorporates a software package previously hallucinated by generative AI.

Not only that but someone, having spotted this reoccurring hallucination, had turned that made-up dependency into a real one, which was subsequently downloaded and installed thousands of times by developers as a result of the AI’s bad advice, we’ve learned.

He created huggingface-cli in December after seeing it repeatedly hallucinated by generative AI; by February this year, Alibaba was referring to it in GraphTranslator’s README instructions rather than the real Hugging Face CLI tool.

Last year, through security firm Vulcan Cyber, Lanyado published research detailing how one might pose a coding question to an AI model like ChatGPT and receive an answer that recommends the use of a software library, package, or framework that doesn’t exist.

The willingness of AI models to confidently cite non-existent court cases is now well known and has caused no small amount of embarrassment among attorneys unaware of this tendency.

As Lanyado noted previously, a miscreant might use an AI-invented name for a malicious package uploaded to some repository in the hope others might download the malware.

The original article contains 1,143 words, the summary contains 190 words. Saved 83%. I’m a bot and I’m open source!

KillingTimeItself@lemmy.dbzer0.com · 2 years ago

so basically, given AI having free rein, it’ll take about 2 weeks before it all goes to complete shit, unreadable code, completely garbage software. Just an utter disaster waiting to happen. Cool.

Blackmist@feddit.uk · 2 years ago

Yeah, had that on my very first attempt at using it.

It used a component that didn’t exist. I called it out and it went “you are correct, that was removed in <older version>. Try this instead.” and created an entirely new set of bogus components and functions. This cycle continued until I gave up. It knows what code looks like, and what the excuses look like and that’s about it. There’s zero understanding.

It’s probably great if you’re doing some common homework (JavaScript Fibonacci sequence or something) or menial task, but for anything that might reach the edges of its “knowledge”, it has no idea where those edges may lie, so it just bullshits.

Railcar8095@lemm.ee · 2 years ago

I’m honestly starting to get tired of “people confuse an advanced chatbot with Jarvis and bad things happen”.

Especially when it’s shitty/lazy devs who don’t code review.

Cosmic Cleric@lemmy.world · 2 years ago

From the article…

hallucinated software packages – package names invented by generative AI models, presumably during project development

Flying Squid@lemmy.world · 2 years ago

It’s 2024. No more quality control, no more double-checking, not in any industry at this point. We’re all alpha testers. Not even beta testers.

As the old entertainment industry adage goes when anything goes wrong on the set, “we’ll fix it in post.”

boatsnhos931@lemmy.world · 2 years ago

Lie… no hallucinate…they lie and make shit up… just like a real hooman!! :))

KillingTimeItself@lemmy.dbzer0.com · 2 years ago

daily PSA that something like [insert number of packages] are deprecated on shipment of software.

Thanks guys, very cool.