

Look, I get your perspective, but zooming out, there is context that nobody’s mentioning, and the thread deteriorated into name-calling instead of looking for insight.
In theory, a training pass needs only one read-through of the input data, and we know of existing systems that achieve that, from well-trodden n-gram models to the wholly hypothetical large Lempel-Ziv models. Viewed that way, most modern training methods are extremely wasteful: Transformers, Mamba, RWKV, etc. are trading time for space to try to make relatively small models, and it’s an expensive tradeoff.
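To make the "one read-through" point concrete, here is a minimal sketch of a bigram model trained in a single pass. The function names and the toy corpus are made up for illustration, and it's the simplest possible n-gram counter rather than anything a real system would ship: no smoothing, no backoff, no gradient descent.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Build bigram counts in a single pass over the token stream."""
    counts = defaultdict(Counter)
    prev = None
    for tok in tokens:  # each token is read exactly once
        if prev is not None:
            counts[prev][tok] += 1
        prev = tok
    return counts

def predict_next(counts, context):
    """Return the most frequent follower of `context`, or None if unseen."""
    followers = counts.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical toy corpus, purely for illustration.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
print(predict_next(model, "the"))  # prints "cat"
```

The cost is that the table grows with the data, which is exactly the space side of the time-for-space trade.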
From that perspective, we should expect somebody to eventually demonstrate that the Transformer paradigm sucks. Mamba and RWKV are good examples of modifying old ideas about RNNs to take advantage of GPUs, but they are still stuck on the idea that having a GPU perform lots of gradient descent is good. If you want to critique something, critique the gradient worship!
I swear, it’s like whenever Chinese folks do anything, the rest of the blogosphere goes into a panic. I’m not going to insult anybody directly, but I’m so fucking tired of mathlessness.
Also, point of order: Meta open-sourced Llama so that their employees would stop using BitTorrent to leak it! Not to “keep the rabble quiet” but to appease their own developers.
It’s been frustrating to watch Gutmann slowly slide. He hasn’t slid that far yet, I suppose. Don’t discount his voice, but don’t let him be the only resource for you to learn about quantum computing; fundamentally, post-quantum concerns are a sort of hard read in one direction, and Gutmann has decided to try a hard read in the opposite direction.
Page 19, complaining about lattice-based algorithms, is hypocritical; lattice-based approaches are roughly as well-studied as classical cryptography (Feistel networks, RSA) and elliptic curves. Yes, we haven’t proven that lattice-based algorithms have the properties that we want, but we haven’t proven them for classical circuits or over elliptic curves, either, and we nonetheless use those today for TLS and SSH.
Pages 28 and 29 are outright science denial and anti-intellectualism. By quoting Woit and Hossenfelder (who are sneerable in their own right for writing multiple anti-science books each), he is choosing anti-maths allies, which is not going to work for a subfield of maths like computer science or cryptography. In particular, page 28 lies to the reader with a doubly-bogus analogy, claiming that both string theory and quantum computing are non-falsifiable and draw money away from other research. This sort of closing argument makes me doubt the entire premise.