A beautiful explanation of what LLMs cannot do. Choice sneer:

If you covered a backhoe with skin, made its bucket look like a hand, painted eyes on its chassis, and made it play a sound like “hnngghhh!” whenever it lifted something heavy, then we’d start wondering whether there’s a ghost inside the machine. That wouldn’t tell us anything about backhoes, but it would tell us a lot about our own psychology.

Don’t have time to read? The main point:

Trying to understand LLMs by using the rules of human psychology is like trying to understand a game of Scrabble by using the rules of Pictionary. These things don’t act like people because they aren’t people. I don’t mean that in the deflationary way that the AI naysayers mean it. They think denying humanity to the machines is a well-deserved insult; I think it’s just an accurate description.

I have more thoughts; see comments.

  • corbin@awful.systems (OP)
    3 months ago

    This author has independently rediscovered a slice of what’s known as the simulators viewpoint: the opinion that a large-enough language model primarily learns to simulate scenarios. The earliest source that lays out all of the ingredients, which you may want to not click if you’re allergic to LW-style writing or bertology, is a 2022 rationalist rant called Simulators. I’ve summarized it before on Stack Exchange; roughly, LLMs are not agents, oracles, genies, or tools; but general-purpose simulators which simulate conversations that agents, oracles, genies, or tools might have.

    Something about this topic is memetically repulsive. Consider previously, on Lobsters. Or more gently, consider the recent post on a non-anthropomorphic view of LLMs, which is also in the simulators viewpoint, discussed previously, on Lobsters and previously, on Awful. Aside from scratching the surface of the math to see whether it works, folks seem to not actually be able to dig into the substance, and I don’t understand why not. At least here the author has a partial explanation:

    When we personify AI, we mistakenly make it a competitor in our status games. That’s why we’ve been arguing about artificial intelligence like it’s a new kid in school: is she cool? Is she smart? Does she have a crush on me? The better AIs have gotten, the more status-anxious we’ve become. If these things are like people, then we gotta know: are we better or worse than them? Will they be our masters, our rivals, or our slaves? Is their art finer, their short stories tighter, their insights sharper than ours? If so, there’s only one logical end: ultimately, we must either kill them or worship them.

    If we take the simulators viewpoint seriously, then the ELIZA effect becomes a more serious problem for society, in the sense that many people would rather experience a simulation of idealized reality than reality itself. Hyperreality is one way to look at this; another is supernormal stimulus, and I’ve previously explained my System 3 thoughts on this as well.

    There’s also a section of the Gervais Principle on status illegibility. When a person fails to recognize a chatbot as a computer, they become likely to give it bogus legibility-oriented status; and because the depth of any conversation is limited by the depth of the shallowest conversant, they will put the chatbot on a throne, pedestal, or therapist’s recliner above themselves. Symmetrically, perhaps folks do not want to comment because they have already put the chatbot into the lowest tier of social status and do not want to reflect on anything that might shift that value judgement by making its inner reasoning more legible.

    • mountainriver@awful.systems
      3 months ago

      general-purpose simulators which simulate conversations that agents, oracles, genies, or tools might have

      Good formulation, but in the spirit of the article I would say “might have had”. Being by definition trained on existing material, they can produce likely imitations of conversations that already exist. One would suppose the value of a conversation between oracles and geniuses would be to produce something new, in effect text that is more than the statistically likely output.

      Good article, thanks for linking it.

  • corbin@awful.systems (OP)
    3 months ago

    I think it’s worth being a little more mathematically precise about the structure of the bag. A path is a sequence of words. Any language model is equivalent to a collection of weighted paths. So, when they say:

    If you fill the bag with data from 170,000 proteins, for example, it’ll do a pretty good job predicting how proteins will fold. Fill the bag with chemical reactions and it can tell you how to synthesize new molecules.

    Yes, but we think that protein folding is NP-complete; it’s not just about which amino acids are in the bag, but the paths along them. Similarly, Stockfish is amazingly good at playing chess, which is PSPACE-complete, partially due to knowing the structure between families of positions. But evidence suggests that NP-completeness and PSPACE-completeness are natural barriers, so that either protein folding has simple rules or LLMs can’t e.g. predict the stock market, and either chess has simple rules or LLMs can’t e.g. simulate quantum mechanics. There’s no free lunch for optimization problems either. This is sort of like the Blockhead argument in reverse; Blockhead can’t be exponentially large while carrying on a real-time conversation, and contrapositively the relatively small size of a language model necessarily represents a compressed simplified system.
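
    To put rough numbers on the Blockhead point, here is a back-of-the-envelope sketch in Python. The vocabulary size and parameter budget are assumptions picked for illustration, not measurements of any real model, and the helper names are hypothetical. It only counts how many distinct word sequences a lookup table would need to store.

```python
def blockhead_entries(vocab_size: int, length: int) -> int:
    """Count the distinct word sequences of exactly `length` words."""
    return vocab_size ** length


def shortest_length_exceeding(vocab_size: int, budget: int) -> int:
    """Smallest sequence length whose lookup table outgrows `budget` entries."""
    if vocab_size < 2:
        raise ValueError("need at least two words for the table to grow")
    length = 0
    while blockhead_entries(vocab_size, length) <= budget:
        length += 1
    return length


# Assumed figures, chosen only to show the shape of the argument.
VOCAB_SIZE = 50_000      # hypothetical vocabulary size
PARAM_BUDGET = 10 ** 11  # hypothetical number of storable entries

n = shortest_length_exceeding(VOCAB_SIZE, PARAM_BUDGET)
print(f"a table of all {n}-word sequences already exceeds the budget")
```

    Under these assumed figures the table is outgrown after only a handful of words, which is the sense in which a comparatively small model has to be compressing rather than memorizing every path.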

    In fact, an early 1600s bag of words wouldn’t just have the right words in the wrong order. At the time, the right words didn’t exist.

    Yeah, that’s Whorfian mind-lock, and it can be a real issue sometimes. However, in practice, people slap together a portmanteau or onomatopoeia and get on with the practice of things. Moreover, Zipf processes naturally reduce the size of words as they are used more, producing a language that is naturally evolved to be within a constant factor of the optimal size. That is, the right words evolve to exist and common words evolve to be small.

    But that’s obvious if we think about paths instead of words. Multiple paths can be equivalent in probability, start and end with the same words, and yet have different intermediate words. Whorfian issues only arise when we lack any intermediate words for any of those paths, so that none of them can be selected.
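
    A minimal sketch of the weighted-paths picture, assuming a toy bigram model whose vocabulary and probabilities are invented for illustration (nothing here comes from a real model). It enumerates every path of a given length and shows two paths with the same start word, the same end word, and the same weight, but different intermediate words.

```python
from itertools import product
from math import isclose, prod

# Hypothetical toy model: TRANSITIONS[a][b] is P(next word = b | current word = a).
TRANSITIONS = {
    "the": {"cat": 0.5, "dog": 0.5},
    "cat": {"sleeps": 1.0},
    "dog": {"sleeps": 1.0},
    "sleeps": {},
}


def path_weight(path):
    """Weight of a path: the product of its word-to-word transition probabilities."""
    return prod(TRANSITIONS[a].get(b, 0.0) for a, b in zip(path, path[1:]))


def weighted_paths(start, length):
    """Map every nonzero-weight path of `length` words from `start` to its weight."""
    vocab = list(TRANSITIONS)
    paths = {}
    for tail in product(vocab, repeat=length - 1):
        path = (start, *tail)
        weight = path_weight(path)
        if weight > 0:
            paths[path] = weight
    return paths


paths = weighted_paths("the", 3)
for path, weight in paths.items():
    print(" ".join(path), weight)
```

    In this toy, "the cat sleeps" and "the dog sleeps" are interchangeable as far as the endpoints and weights go; the Whorfian failure case would correspond to a start and end word with no surviving path between them at all.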

    A more reasonable objection has to do with the size of definitions. It’s well-known folklore in logic that extension by definition is mandatory in any large body of work because it’s the only way to prevent some proofs from exploding due to combinatorics. LLMs don’t have any way to define one word in terms of other words, whether by macro-clustering sequences or lambda-substituting binders, and they end up learning so much nuance that they are unable to actually respect definitions during inference. This doesn’t matter for humans because we’re not logical or rational, but it stymies any hope that e.g. Transformers, RWKV, or Mamba will produce a super-rational Bayesian Ultron.