GitHub died for this.

PLEASE
… tests from earlier this year found that AI agents failed to complete tasks up to 70% of the time, making them almost entirely redundant as a workforce replacement tool. At best, they’re a way for skilled employees to be more productive and save time on low-level tasks, but those tasks were already being handed off to lower-level employees. Having an AI do it and fail half the time isn’t exactly a winning alternative.
“AI agents failed to complete tasks up to 70% of the time.”
“Having AI do it and fail half the time”
Did AI write this article too? Fails again at basic math
Could be 70% overall, but only half of the tasks it’s good at.

50% is up to 70%.
Probably there are different use cases and some fail more than others.
Shit, there are more than 2 Copilots at Microsoft alone.
My employer is all in on Microsoft, and Copilot is terrible, it can’t even find a word in a document. Ctrl+F finds it no problem.
Now when I have a tech issue that I need an answer for, the Bing AI generally gets me a detailed answer on the first try. But it’s my understanding that Bing’s AI is just a reskinned ChatGPT.
Copilot’s numbers are likely that high merely because it rides along with Office365… I tried using it a few times, and it was completely useless. It even failed at sorting a spreadsheet with a few parameters.
I wrote a page-long piece of documentation on a project. I asked Copilot to “format it to look nice but do not change a word”… it told me how to make some headings bold (it would not do it itself) and whatnot… that’s the “assist” I got.
I couldn’t even copy/paste the formatting, since its reply window doesn’t apply any, and the text it provided was interlaced with its own stupid comments letting me know bold headings are more visible than regular font.
I get better results just bouncing ideas off my cats
That was my experience. Wife had work telling her to use it, so she asked me for help. I tried to get it to do things and all it would do is suggest stuff that we both knew perfectly well how to do with shortcuts. As for anything complex like having a chat and generating a document: fuck no. Might as well go to chat-jippity, copy-paste its result, and format it yourself. Utter waste of time. I don’t see why it’s there, I can’t find a use-case.
I work with sensitive data… so I often grab a real message, gut all the PHI, and refill it with some fake data.
I had a project where I needed to do a lot of these, so I got DeepSeek to give me a list of superhero “real names”, DOBs, genders, and a few other fake things so I could automate filling these messages with fake data. This is the most success I have had with AI, and even then it messed up (minimally), thinking for some reason that Wanda was a guy hahahaha
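For what it’s worth, once the LLM has produced the fake roster, the fill-in step itself needs no AI at all. A minimal sketch, assuming the gutted messages are simple templates with name/DOB/gender placeholders (every name, date, and field below is invented for illustration):

```python
import random

# Hypothetical stand-in for the LLM-generated roster: superhero
# "real names", DOBs, and genders (all values invented here).
FAKE_PEOPLE = [
    {"name": "Peter Parker", "dob": "1990-08-10", "gender": "M"},
    {"name": "Wanda Maximoff", "dob": "1989-02-10", "gender": "F"},
    {"name": "Bruce Wayne", "dob": "1985-04-17", "gender": "M"},
]

# A gutted message with placeholders where the PHI used to be.
TEMPLATE = "Patient: {name}\nDOB: {dob}\nGender: {gender}\n"

def fill_fake(template: str, rng: random.Random) -> str:
    """Fill the placeholders with one randomly chosen fake record."""
    person = rng.choice(FAKE_PEOPLE)
    return template.format(**person)

if __name__ == "__main__":
    # Seeded RNG so repeated runs produce the same fake messages.
    rng = random.Random(42)
    print(fill_fake(TEMPLATE, rng))
```

Keeping the roster in plain data like this also sidesteps the “Wanda is a guy” class of mistake: the gender field is fixed once and reused consistently.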
I think MS is just putting AI into everything, just for the sake of it; they don’t really care if it’s useful or not at this point. They just need to buy time to soften the blow when the bubble bursts.
We put it in everything, and bet all of our money on it to “soften the blow”
yeah it really feels like incompetence rather than a strategy to weather the bubble pop :p
MS is running into a real problem where its two major product lines, Windows and Office, don’t have any major improvements that justify an upgrade. It is an existential crisis to the company’s profitability.
Now, MS has been able to make Office into a subscription, but it can’t do that with Windows.
Did you try asking Copilot for tips on how to use it properly? Edit: this was a joke. AI is garbageware that nobody wants, just like ads and capitalism for capital’s sake. Now untwist your panties.
Yes, I did. Everything I tried on Copilot wanted me to upload corporate data to the cloud. (Yeah, NO.) It told me it could help with my email… if I uploaded them individually. (Still bad practice, and breaking corp policy.)
I expect LLMs should be really good at pattern-driven activity, but I’ve yet to figure out how to make this useful.
I tried to use a local LLM to summarize, outline, and discuss my *.md notes for annual review. It sucked at it when it didn’t completely crash the model. It couldn’t even provide a unique list of all the tags in the files (doing it manually took me about 30 min). I thought that it would be good at that. I would have been better off spending the time learning find | grep commands, or learning Python.
I’m still searching, but maybe one day I’ll figure out a use for these.
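The unique-tag job above really is find | grep territory. A rough sketch, assuming the tags are written inline as `#tag` in the Markdown files (the exact pattern is a guess, swap the regex for whatever convention the notes actually use):

```shell
# List every distinct "#tag" occurring in Markdown notes under the
# current directory. grep -o prints only the matching text, -h drops
# filenames, and sort -u deduplicates the result.
find . -name '*.md' -print0 \
  | xargs -0 grep -ohE '#[A-Za-z][A-Za-z0-9_-]+' \
  | sort -u
```

The `-print0` / `xargs -0` pairing keeps it safe for note filenames containing spaces.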
We use it during down time for amusement to see how badly it can do things. Actually we did, it’s gotten a bit boring as it’s not even good at doing bad. It just sucks. Carrying on the MS tradition.
lol
yeah, it errored out too
Comments in the post contain so much cope.
HP: if your non original printer cartridge fails 7 out of 10 times, is it really a savings?
Microsoft: we want 80% of our work to be handled by these AIs that fail 70% of the time.
Also fuck HP.
And Microsoft.
Say you hire an employee and you know he fucks up 50% of his tasks, that means you still have to do 50% of the work PLUS examine 100% of his output in great detail to figure out which 50% he fucked up.
Even if the employee was paid 0, I would want him gone.
What’s with all the shills in that comment section? Yeesh.
There’s a lot of money and a lot of careers riding on this bullshit somehow becoming successful.
That’s like people trying to save the Titanic by bailing water with shot glasses.
This just sounds like not enough people with shot glasses. We should have been bringing more onto the sinking Titanic to help out.
/s
“There’s a lot of money and a lot of careers ~~riding~~ spent on this bullshit somehow becoming successful.”
Both can be true.
A sentence for the ages.
A timeless sentence.
All of them have exactly 1 post too.
actually 70% of a post.
30%, get it right.
Sorry, I used Copilot for the math.
MS middle management trying to save their lucrative jobs.
I hope all the money thrown at this “AI” (misnomer, IMHO - it’s really just extremely overwrought pattern matching) causes at least some significant humbling (if not outright downfall) of some tech giants. I haven’t programmed in a couple decades, and yet even I could tell they weren’t gonna get to AGI offa this crap - I can’t believe how badly some of these supposed techies fell for their own hype.
The correct term is Stochastic Parrot… that is what LLMs do. It sounds even cooler than AI, imho.
No it’s not. They haven’t been this way for years.
There are several dozen of these studies.
Still stochastic. Even now they still can’t reliably do repeated tasks.
Doesn’t matter. There is no cognition. Just word salads mixed and matched with no possibility of receiving “I don’t know” for an answer.
So they no longer use probability to choose the next word? I wonder how they do it now.
That was a remarkably uninsightful way to approach that topic. Please link to more of these “studies”; that one was way too short.
The virgin cited study vs the Chad Ad Hominem
Did you read the study? It’s hilarious. They’re using LLMs to “grade” the number of observed “skills” based on the output of LLMs. They’re using a stochastic parrot to evaluate another stochastic parrot, and concluding that there is some kind of emergent “skill” going on. Sheeeesh. I’d assume the authors of the paper are just having a laugh. But, one thing is for sure, the AI stupidity train keeps chugging along.
When discussing it, I often call it “simulated intelligence”, because at the end of the day that’s what neural networks are.
Edit: only to non-technical people, as simulations are a different thing.
In science fiction I’ve often seen the term VI (Virtual Intelligence) to refer to machines that look intelligent, and could probably pass a Turing test, but aren’t really intelligent (normally VI coexists with actual AI, often used as interfaces, where it would be a waste, or too risky, to use a proper AI).
LLMs look a bit like that, though they’re probably too unreliable to use as an interface for anything important.
IMHO it’s real intelligence, but not artificial. LLMs have been fed virtually everything created by the actual intelligence: humans. Like I said, all they do is execute what is effectively pattern matching (on serious steroids) to distill what humans have created into something more bite-sized.
It’s pattern matching, but it’s not matching intelligently. An intelligence should be able to optimize itself to the task at hand, even before self-improvement. LLMs can’t select relevant data to operate on, nor can they handle executive functions.
LLMs are cool, and I think humans have something similar to process information with, but that’s just one part of a larger system, and it’s not the intelligent part.
Ask an LLM how many 'r’s are in the word ‘strawberry’ and tell me it has any actual intelligence behind its output.
To be fair to LLMs, they get the text as a series of tokens, so when you type strawberry, they see 🍓 or something. What works better as a counterexample are variations of the river crossing puzzle changed to be trivial.