That’s an interesting question. Proxima Centauri b is a planet that orbits Proxima Centauri, the closest star to Earth. The planet orbits its parent star at a distance of roughly 4.6 million miles (7.5 million km) with an orbital period of approximately 11.2 Earth days. Proxima Centauri b is only four light years away from Earth.
A solar sail is a spacecraft that uses the pressure of photons from the sun or other light sources to propel itself. The maximum theoretical speed a solar sail could reach is around 420 kilometers per second, or about 1/4 of the speed of light. However, this would require a very large and thin sail, and a close approach to the sun. A more realistic speed for solar sails is 10% of the speed of light, or 67,100,000 mph.
Assuming a solar sail could reach 10% of the speed of light, it would take about 40 years to travel to Proxima Centauri b. However, this does not account for the time it would take to accelerate and decelerate the spacecraft, or the effects of gravity from other bodies in the solar system. A more accurate estimate would require a detailed trajectory analysis and simulation.
I’m getting tired of repeating this, but language models are incapable of doing math. They generate text that has the appearance of a mathematical explanation, but there is no incentive or reason for it to be accurate.
Just like how image models are incapable of doing language, and it ends up looking like nonsense words.
Precisely
Yeah, but it wasn’t a math question. Bard “decided” to make it a math question anyway.
Hikaru Nakamura tried to play ChatGPT in a game of chess, and it started making illegal moves after about 10 moves. When he tried to correct it, it apologized, gave the wrong reason for why the move was illegal, and then followed up with another illegal move. That’s when I knew that LLMs were just fragile toys.
It is after all a Large LANGUAGE Model. There’s no real reason to expect it to play chess.
There is. The general media is calling these LLMs AI, and AIs have been playing chess and winning for decades.
Yeah for that we’d need a Gigantic LANGUAGE Model.
You can get ChatGPT to play a much better game by telling it to display a representation of the current state of the board after each move; that way it doesn’t lose track of where the pieces are as easily. Still, given that these LLMs were never specifically trained to play chess in the first place, it’s amazing how well they do. It’d be like setting up a chessboard for my dog and finding that 90% of the moves she made were actually valid, rather than her simply knocking over all the pieces and then wagging her tail in expectation of the treat she thinks she’s earned by doing so.
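As a rough illustration of that trick, here’s a minimal, stdlib-only sketch of the kind of board-state tracker you could keep outside the model and echo back to it each turn. The coordinate-style move format (e.g. “e2e4”) is my assumption, and this sketch only tracks positions; it doesn’t check move legality.

```python
# Minimal board tracker: print the position after each move so the model
# can be shown the current state instead of having to remember it.
START = [
    "rnbqkbnr", "pppppppp", "........", "........",
    "........", "........", "PPPPPPPP", "RNBQKBNR",
]

def make_board():
    return [list(rank) for rank in START]

def square(name):
    """Convert a square name like 'e2' into (row, col); row 0 is rank 8."""
    col = ord(name[0]) - ord("a")
    row = 8 - int(name[1])
    return row, col

def push(board, move):
    """Apply a coordinate move like 'e2e4' (no legality checking)."""
    (r1, c1), (r2, c2) = square(move[:2]), square(move[2:4])
    board[r2][c2], board[r1][c1] = board[r1][c1], "."

def render(board):
    """ASCII diagram of the position, suitable for pasting into a prompt."""
    return "\n".join(f"{8 - i} " + " ".join(rank) for i, rank in enumerate(board)) \
        + "\n  a b c d e f g h"

board = make_board()
for mv in ("e2e4", "e7e5", "g1f3"):
    push(board, mv)
print(render(board))
```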
Bing Chat is doing quite well:
I even asked Bing to calculate the time dilation for the person on Earth. It answered correctly, with the formula and steps shown clearly.
Then it was pulling its calculations directly from a web source, not using generative large language models. I’m not saying a chatbot can’t do this, I’m saying language models can’t do this.
Apparently Bing Chat is able to do some maths. I asked Bing multiple variations of this question based on different speeds of the solar sail (e.g. what if it travels at 50% the speed of light). It was able to calculate both the travel time and the time dilation.
If it is only pulling the answer from web sources, how did it handle the variable speeds?
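For what it’s worth, the underlying math is simple enough that either a plugin or the model itself could plausibly produce it: the calculation is just distance over speed, scaled by the Lorentz factor for the travelers. Here’s a sketch in a few lines of Python (the ~4.25 light-year distance to Proxima Centauri is my assumption, slightly different from the “four light years” quoted above):

```python
import math

DIST_LY = 4.25  # assumed distance to Proxima Centauri, light-years

def travel_years(frac_c):
    """Earth-frame travel time in years at a constant fraction of c."""
    return DIST_LY / frac_c

def ship_years(frac_c):
    """Proper time aboard the ship: Earth time divided by the Lorentz factor."""
    gamma = 1 / math.sqrt(1 - frac_c**2)
    return travel_years(frac_c) / gamma

for frac in (0.1, 0.5):
    print(f"{frac:.0%} of c: {travel_years(frac):.1f} yr on Earth, "
          f"{ship_years(frac):.1f} yr on the ship")
```

Varying the speed is just a different argument to the same formula, so handling “what if it travels at 50% the speed of light” doesn’t by itself prove anything about where the answer came from.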
It’s also possible that Bing’s chatbot is using a math-specific plugin in addition to its websearching plugin.
Your failure in reasoning here is assuming that all of them are purely and only language models: that they have no source of learning other than language, and, for example, aren’t fed any kind of pop-science math.
It’s clear that this is true of models like ChatGPT, but isn’t the Bing thing powered by GPT-4 with a number of other enhancements? Fixing this “can’t do math” problem is low-hanging fruit for development improvements.
They announced that they are building in plugins, like one for WolframAlpha, so maybe that is where it is pulling this data from.
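To be clear, nobody outside Microsoft knows how Bing routes these queries, but a “math plugin” can be as simple as checking whether the input parses as pure arithmetic and, if so, evaluating it outside the model. A hypothetical sketch (the `llm_generate` call stands in for whatever the language model itself would do):

```python
import ast
import operator

# Whitelisted arithmetic operators; anything else is rejected.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_eval(expr):
    """Evaluate a pure-arithmetic expression without using eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("not arithmetic")
    return walk(ast.parse(expr, mode="eval"))

def answer(question):
    """Route arithmetic to the 'plugin'; fall back to the model otherwise."""
    try:
        return str(safe_eval(question))
    except (ValueError, SyntaxError):
        return llm_generate(question)  # hypothetical LLM call, not defined here

print(answer("10 * 20"))
```

The point is just that the chat product and the language model are different layers, so “the chatbot got the arithmetic right” doesn’t tell you the model did.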
The issue here isn’t the math, it’s the claim that the solar sail will be traveling at 100 times the speed of light.
But 100 times IS math?
I call them “word calculators.” They take input and generate output, but, like a calculator, if you don’t know how to check the output you’re gonna have a bad time.
You fact check your calculator? Like with another calculator, or do you do it by hand?
I at least have an idea by estimating the result. If I do 10 x 20 and end up with 13.397897, I know something’s wrong.
Maybe you should have a language model repeat it for you. :)