Back in college I took a couple machine learning classes. After the second I understood where the market would eventually end up. It’s a pattern-matching machine: if you were to provide infinite data and infinite compute, you could have the machine do enough regression to match the presented surface of whatever the data represented.
I sat back and was like “oh this is sorta cool but sorta dumb”. You can’t create a novel thought process from this; you’re limited by what data you can collect and label, and labeling data is an extremely time-consuming process because it relies on humans. But you don’t really need labeled data if you don’t care about correctness. You can get away with feeding the regression machine a load of data and labeling it more loosely based on how close certain points are in a vector space. That’s how “sentiment analysis” works. You can take IMDB’s database of reviews, each with some words and a star rating, use the star rating to bucket them into “good”, “mid”, and “bad”, then average out the distance between certain words and how frequently they appear across that spectrum.
Suddenly, relationships show up: “lawyer” ends up close to “criminal” in the vector space.
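To make the star-rating trick concrete, here’s a toy sketch of it: score every word by the average rating of the reviews it shows up in, then score new text by averaging its word scores. The reviews below are made up for illustration, not IMDB’s actual data, and real systems use far more than four reviews.

```python
from collections import defaultdict

# Hypothetical mini-corpus of (review text, star rating out of 10).
reviews = [
    ("great fun explosive action", 9),
    ("boring plot terrible acting", 2),
    ("great acting fun ride", 8),
    ("terrible boring mess", 1),
]

# Average star rating per word across every review containing it.
totals = defaultdict(float)
counts = defaultdict(int)
for text, stars in reviews:
    for word in text.split():
        totals[word] += stars
        counts[word] += 1
word_score = {w: totals[w] / counts[w] for w in totals}

def sentiment(text):
    """Score new text as the mean score of its known words."""
    scores = [word_score[w] for w in text.split() if w in word_score]
    return sum(scores) / len(scores) if scores else None

print(sentiment("fun explosive ride"))    # high -> leans "good"
print(sentiment("boring terrible plot"))  # low -> leans "bad"
```

No labels beyond the star ratings that came for free with the reviews, and the “good”/“bad” buckets fall out of averages and co-occurrence, exactly the kind of shallow pattern matching described above.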
What modern LLMs do is layer this same system with a few short-circuits called context windows. The model basically maintains a space of relevance within the broader context of what it was trained on. For the IMDB example, let’s say you’re asking a machine about action movies with x, y, z characteristics; the context maintains those to short-circuit the larger model, keep it ‘focused’, and give you nearby relationships.
With enough data you can recreate language based on distance markers and frequency. But back to my original point: it’s the surface level of what it was trained on, a plaster mask. The mask doesn’t have the complexity of the muscles and skin it was formed on; it’s shallow.
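The “recreate language from frequency” idea can be sketched with the crudest possible version: count which word follows which in a tiny corpus, then regenerate text by always picking the most frequent successor. The corpus here is invented for illustration; real models do something vastly more sophisticated, but the frequency-driven core is the same.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus.
corpus = "the movie was fun the movie was loud the ending was fun".split()

# Count how often each word follows each other word (a bigram table).
successors = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    successors[a][b] += 1

def generate(start, length):
    """Regenerate text by always taking the most frequent next word."""
    out = [start]
    for _ in range(length - 1):
        nxt = successors.get(out[-1])
        if not nxt:
            break
        out.append(nxt.most_common(1)[0][0])
    return " ".join(out)

print(generate("the", 5))  # reassembles pieces of the corpus
```

The output looks like language because it literally is the corpus’s surface, replayed by frequency — the plaster mask, with none of the muscle underneath.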
That all being said, the ability to make a shallow mask is useful for cross-referencing large amounts of data. Disaster strikes when it’s treated as an all-knowing god and used to call in military strikes.
Or fire people.
We gave computers a stack of transparencies, told them to pick a couple at random to make something new, and pretended it was smart. And now people really be thinking humans aren’t needed for production anymore.