• corbin@awful.systems
    link
    fedilink
    English
    arrow-up
    3
    ·
    8 hours ago

    Nah, it’s more to do with stationary distributions. Most tokens tend to move towards it; only very surprising tokens can move away. (Insert physics metaphor here.) Most LLM architectures are Markov, so once they get near that distribution they cannot escape on their own. There can easily be hundreds of thousands of orbits near the stationary distribution, each fixated on a simple token sequence and unable to deviate. Moreover, since most LLM architectures have some sort of meta-learning (e.g. attention) they can simulate situations where part of a simulation can get stuck while the rest of it continues, e.g. only one chat participant is stationary and the others are not.