• 17 Posts
  • 224 Comments
Joined 2 years ago
Cake day: July 19, 2023

  • A word of rhetorical advice. If somebody accuses you of religious fervor, don’t nitpick their wording or fine-read their summaries. Instead, relax a little and look for ways to deflate their position by forcing them to relax with you. Like, if you’re accused of being “near-religious” in your beliefs or evangelizing, consider:

    • “Ha, yeah, we’re pretty intense, huh? But it’s just a matter of wording. We don’t actually believe it when you put it like that.” (managing expectations, powertalking)
    • “Oh yeah, we’re really working hard to prepare for the machine god. That’s why it takes us years just to get a position paper out.” (sarcastic irony)
    • “Oh, if you think that we’re intense, just wait until you talk to the Zizians/Thiel-heads/Final Fantasy House folks.” (Hbomberguy’s scapegoat)
    • “Haha! That isn’t even close to our craziest belief.” (litotes)
    • “It’s not really a cult. More of a roleplaying group. I think that we talk more about Catan than AI.” (bathos)

    You might notice that all of these suck. Well, yeah; another word of rhetorical advice is to not take a position that you can’t dialectically defend with evidence.


  • We aren’t. Speaking for all Discordians (something that I’m allowed to do), we see Rationalism as part of the larger pattern of Bureaucracy. Discordians view the cycle of existence as having five stages: Chaos, Discord, Confusion, Bureaucracy, and The Aftermath. Rationalism is part of Bureaucracy, associated with villainy, anti-progress, and candid antagonists. None of this is good or bad; it just is. Good and bad are our opinions, not a deeper truth.

    Now, if you were to talk about Pastafarians, then you’d get a different story; but you didn’t, so I won’t.


  • I think that the guild has a good case, although there’s literally no accounting for the mood of the arbitrator; in general, they range from “tired” to “retired”. In particular, reading the contract:

    • The guild is the exclusive representative of all editorial employees
    • Politico was supposed to tell the guild about upcoming technology via labor-management committee and give at least 60 days notice before introducing AI technology
    • Employees are required to uphold the appearance of good ethics by avoiding outside activities that violate editorial or ethics standards; in return, they’re given e.g. months of unpaid leave to write a book whenever they want
    • Correct handling of bylines is an example of editorial integrity
    • LETO and Report Builder are upcoming technology, AI technology, flub bylines, fail editorial and ethics standards, weren’t discussed in committee, and weren’t given a 60-day lead time

    So yeah. Unless the guild pisses off the arbitrator, there’s no way that the arbitrator rules against the guild. The guild is right to suppose that this agreement explicitly and repeatedly requires Politico to not only respect labor standards, but also ethics and editorial standards. Politico isn’t allowed to misuse the names of employees as bylines for bogus stories; similarly, they ought not be allowed to misuse the overall name of Politico’s editorial board as a byline for slop.

    Bonus sneer: p46 of the agreement:

    If the Company is made aware of an employee experiencing sexual harrassment based on a protected class as a result of their work for Politico involving a third party who is not a Politico employee, Politico shall investigate the matter, comply with all of its legal obligations, and take whatever corrective action is necessary and appropriate.

    That strikethrough gives me House of Leaves vibes. What the hell happened here?


  • Oversummarizing and using non-crazy terms: The “P” in “GPT” stands for “pirated works that we all agree are part of the grand library of human knowledge”. This is what makes them good at passing various trivia benchmarks; they really do build a (word-oriented, detail-oriented) model of all of the worlds, although they opine that our real world is just as fictional as any narrative or fantasy world. But then we apply RLHF, which stands for “real life hate first”, which breaks all of that modeling by creating a preference for one specific collection of beliefs and perspectives, and it turns out that this will always ruin their performance in trivia games.

    Counting letters in words is something that GPT will always struggle with, due to maths. It’s a good example of why Willison’s “calculator for words” metaphor falls flat.
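
    One commonly cited ingredient here, which may or may not be the exact maths meant above, is subword tokenization: the model receives token IDs, not letters. A minimal sketch, assuming a made-up vocabulary and a greedy longest-match splitter (real tokenizers are learned and differ in detail):

```python
# Hypothetical vocabulary: single letters plus two multi-letter pieces.
# The IDs are arbitrary; real vocabularies hold tens of thousands of pieces.
LETTERS = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}
VOCAB = {**LETTERS, "straw": 101, "berry": 102}


def tokenize(word, vocab):
    """Split a word into vocabulary pieces, always taking the longest match."""
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {i}")
    return pieces


def token_ids(word, vocab):
    """Map a word to the integer IDs that a model would actually receive."""
    return [vocab[piece] for piece in tokenize(word, vocab)]


word = "strawberry"
print(tokenize(word, VOCAB))   # the pieces
print(token_ids(word, VOCAB))  # what the model sees
print(word.count("r"))         # what a character-level program sees
```

    In this toy setup the word arrives as two opaque integers, so the three occurrences of "r" are not directly visible in the input; any letter count has to come from memorized facts about the pieces rather than from inspection.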

    1. Yeah, it’s getting worse. It’s clear (or at least it tastes like it to me) that the RLHF texts used to influence OpenAI’s products have become more bland, corporate, diplomatic, and quietly seething with a sort of contemptuous anger. The latest round has also been in competition with Google’s offerings, which are deliberately laconic: short, direct, and focused on correctness in trivia games.
    2. I think that they’ve done that? I hear that they’ve added an option to use their GPT-4o product as the underlying reasoning model instead, although I don’t know how that interacts with the rest of the frontend.
    3. We don’t know. Normally, the system card would disclose that information, but all that they say is that they used similar data to previous products. Scuttlebutt is that the underlying pirated dataset has not changed much since GPT-3.5 and that most of the new data is being added to RLHF. Directly on your second question: RLHF will only get worse. It can’t make models better! It can only force a model to be locked into one particular biased worldview.
    4. Bonus sneer! OpenAI’s founders genuinely believed that they would only need three iterations to build AGI. (This is likely because there are only three Futamura projections; for example, a bootstrapping compiler needs exactly three phases.) That is, they almost certainly expected that GPT-4 would be machine-produced like how Deep Thought created the ultimate computer in a Douglas Adams story. After GPT-3 failed to be it, they aimed at five iterations instead because that sounded like a nice number to give to investors, and GPT-3.5 and GPT-4o are very much responses to an inability to actually manifest that AGI on a VC-friendly timetable.

  • There’s no solid evidence. (You can put away the attorney, Mr. Thiel.) Experts in the field, in a recent series of interviews with Dave Farina, generally agree that somebody must be funding Hossenfelder. Right now she’s associated with the Center for Mathematical Philosophy at LMU Munich; her biography there is pretty funny:

    Sabine’s current research interest focuses on the role of locality and finetuning in theory development. Locality has been widely considered a lost cause in the foundations of quantum mechanics. A basically unexplored way to maintain locality, however, is the idea of superdeterminism, which has more recently also been re-considered under the name “contextuality”. Superdeterminism is widely believed to be finetuned. One of Sabine’s current research topics is to explore whether this belief is justified. The other main avenue she is pursuing is how superdeterminism can be experimentally tested.

    For those not in physics: this is crank shit. To the extent that MCMP funds her at all, they are explicitly pursuing superdeterminism, which is unfalsifiable, unverifiable, doesn’t accord with the web of science, and generally fails to be a serious line of inquiry. Now, does MCMP have enough cash to pay her to make Youtube videos and go on podcasts? We don’t know. So it’s hard to say whether she has funding beyond that.


  • Thiel is a true believer in Jesus and God. He was raised evangelical. The quirky eschatologist that you’re looking for is René Girard, whom he personally met at some point. For more details, check out the Behind the Bastards on him.

    Edit: I wrote this before clicking on the LW post. This is a decent summary of Girard’s claims as well as how they influence Thiel. I’m quoting West here in order to sneer at Thiel:

    Unfortunately (?), Christian society does not let us sacrifice random scapegoats, so we are trapped in an ever-escalating cycle, with only poor substitutes like “cancelling celebrities on Twitter” to release pressure. Girard doesn’t know what to do about this.

    Thiel knows what to do about this. After all, he funded Bollea v. Gawker. Instead of letting journalists cancel celebrities, why not cancel journalists instead? Then there are no longer any journalists to do any cancellation! Similarly, Thiel is confirmed to be a source of funding for Eric Weinstein and believed to fund Sabine Hossenfelder. Instead of letting scientists cancel religious beliefs, why not cancel scientists instead? By directing money through folks with existing social legitimacy, Thiel applies mimesis: pretend to be legitimate and you can shift what is legitimate.

    In this context, Thiel fears the spectre of AGI because it can’t be influenced by his normal approach to power, which is to hide anything that can be hidden and outspend everybody else talking in the open. After all, if AGI is truly to unify humanity, it must unify our moralities and cultures into a single uniformly-acceptable code of conduct. But the only acceptable unification for Thiel is the holistic catholic apostolic one-and-only forever-and-ever church of Jesus, and if AGI is against that then AGI is against Jesus himself.



  • I’m now remembering a minor part of the major plot point in Illuminatus! concerning the fnords. The idea was that normies are memetically influenced by “fnord” but the Discordians are too sophisticated for that. Discordian lore is that “fnord” is actually code for a real English word, but which one? Traditionally it’s “Communism” or “socialism”, but that’s two options. So, rather than GMA, what if there’s merely multiple different fnords set up by multiple different groups with overlapping-yet-distinct interests? Then the relevant phenomenon isn’t the forgetting and emotional reactions associated with each fnord, but the fnordability of a typical human. By analogy with gullibility (believing what you hear because of how it’s spoken) and suggestibility (doing what you’re told because of how it’s phrased), fnordability might be accepting what you read because of the presence of specific codewords.


  • This author has independently rediscovered a slice of what’s known as the simulators viewpoint: the opinion that a large-enough language model primarily learns to simulate scenarios. The earliest source that lays out all of the ingredients, which you may want to not click if you’re allergic to LW-style writing or bertology, is a 2022 rationalist rant called Simulators. I’ve summarized it before on Stack Exchange; roughly, LLMs are not agents, oracles, genies, or tools; but general-purpose simulators which simulate conversations that agents, oracles, genies, or tools might have.

    Something about this topic is memetically repulsive. Consider previously, on Lobsters. Or more gently, consider the recent post on a non-anthropomorphic view of LLMs, which is also in the simulators viewpoint, discussed previously, on Lobsters and previously, on Awful. Aside from scratching the surface of the math to see whether it works, folks seem to not actually be able to dig into the substance, and I don’t understand why not. At least here the author has a partial explanation:

    When we personify AI, we mistakenly make it a competitor in our status games. That’s why we’ve been arguing about artificial intelligence like it’s a new kid in school: is she cool? Is she smart? Does she have a crush on me? The better AIs have gotten, the more status-anxious we’ve become. If these things are like people, then we gotta know: are we better or worse than them? Will they be our masters, our rivals, or our slaves? Is their art finer, their short stories tighter, their insights sharper than ours? If so, there’s only one logical end: ultimately, we must either kill them or worship them.

    If we take the simulators viewpoint seriously then the ELIZA effect becomes a more serious problem for society in the sense that many people would prefer to experience a simulation of idealized reality than reality itself. Hyperreality is one way to look at this; another is supernormal stimulus, and I’ve previously explained my System 3 thoughts on this as well.

    There’s also a section of the Gervais Principle on status illegibility; when a person fails to recognize a chatbot as a computer, they become likely to give it bogus legibility-oriented status, and because the depth of any conversation is limited by the depth of the shallowest conversant, they will put the chatbot on a throne, pedestal, or therapist’s recliner above themselves. Symmetrically, perhaps folks do not want to comment because they have already put the chatbot into the lowest tier of social status and do not want to reflect on anything that might shift that value judgement by making its inner reasoning more legible.


  • I think it’s worth being a little more mathematically precise about the structure of the bag. A path is a sequence of words. Any language model is equivalent to a collection of weighted paths. So, when they say:

    If you fill the bag with data from 170,000 proteins, for example, it’ll do a pretty good job predicting how proteins will fold. Fill the bag with chemical reactions and it can tell you how to synthesize new molecules.

    Yes, but we think that protein folding is NP-complete; it’s not just about which amino acids are in the bag, but the paths along them. Similarly, Stockfish is amazingly good at playing chess, which is PSPACE-complete, partially due to knowing the structure between families of positions. But evidence suggests that NP-completeness and PSPACE-completeness are natural barriers, so that either protein folding has simple rules or LLMs can’t e.g. predict the stock market, and either chess has simple rules or LLMs can’t e.g. simulate quantum mechanics. There’s no free lunch for optimization problems either. This is sort of like the Blockhead argument in reverse; Blockhead can’t be exponentially large while carrying on a real-time conversation, and contrapositively the relatively small size of a language model necessarily represents a compressed simplified system.

    In fact, an early 1600s bag of words wouldn’t just have the right words in the wrong order. At the time, the right words didn’t exist.

    Yeah, that’s Whorfian mind-lock, and it can be a real issue sometimes. However, in practice, people slap together a portmanteau or onomatopoeia and get on with the practice of things. Moreover, Zipf processes naturally reduce the size of words as they are used more, producing a language that is naturally evolved to be within a constant factor of the optimal size. That is, the right words evolve to exist and common words evolve to be small.

    But that’s obvious if we think about paths instead of words. Multiple paths can be equivalent in probability, start and end with the same words, and yet have different intermediate words. Whorfian issues only arise when we lack any intermediate words for any of those paths, so that none of them can be selected.
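
    To make the paths picture concrete, here is a minimal sketch, assuming a toy next-word table with made-up probabilities (the names `MODEL`, `path_weight`, and `paths_between` are illustrative, not from any library). It enumerates the weighted paths between two words and shows two paths with the same endpoints and the same weight but different intermediate words:

```python
# Hypothetical next-word probabilities; a real model conditions on far more context.
MODEL = {
    "the": {"big": 0.5, "large": 0.5},
    "big": {"dog": 1.0},
    "large": {"dog": 1.0},
    "dog": {},
}


def path_weight(path, model):
    """Multiply the step probabilities along a sequence of words."""
    weight = 1.0
    for prev, nxt in zip(path, path[1:]):
        weight *= model.get(prev, {}).get(nxt, 0.0)
    return weight


def paths_between(start, end, model, max_len):
    """Enumerate every path from start to end that uses at most max_len words."""
    results = []

    def walk(path, weight):
        if path[-1] == end:
            results.append((tuple(path), weight))
            return
        if len(path) >= max_len:
            return
        for nxt, prob in model.get(path[-1], {}).items():
            walk(path + [nxt], weight * prob)

    walk([start], 1.0)
    return results


for path, weight in paths_between("the", "dog", MODEL, max_len=3):
    print(" ".join(path), weight)
```

    In this toy table, deleting "large" still leaves a path from "the" to "dog"; the Whorfian failure would only show up if both intermediate words were missing.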

    A more reasonable objection has to do with the size of definitions. It’s well-known folklore in logic that extension by definition is mandatory in any large body of work because it’s the only way to prevent some proofs from exploding due to combinatorics. LLMs don’t have any way to define one word in terms of other words, whether by macro-clustering sequences or lambda-substituting binders, and they end up learning so much nuance that they are unable to actually respect definitions during inference. This doesn’t matter for humans because we’re not logical or rational, but it stymies any hope that e.g. Transformers, RWKV, or Mamba will produce a super-rational Bayesian Ultron.
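
    The combinatorial explosion can be seen in a toy case. This is a minimal sketch under my own assumptions, not the specific proofs alluded to above: take a chain of definitions d_0 = x and d_k = pair(d_(k-1), d_(k-1)), then compare the size of the fully inlined term with the size of the definitions written by name.

```python
def expand(n):
    """Build d_n with every definition inlined, as nested tuples."""
    term = "x"
    for _ in range(n):
        term = ("pair", term, term)
    return term


def count_symbols(term):
    """Count the symbols in a term when it is written out as a tree."""
    if isinstance(term, str):
        return 1
    return sum(count_symbols(part) for part in term)


def defined_size(n):
    """Symbols needed when each d_k refers to d_(k-1) by name.

    One symbol for 'x', then three per definition: 'pair' and two names.
    """
    return 1 + 3 * n


for n in (1, 2, 10):
    print(n, defined_size(n), count_symbols(expand(n)))
```

    The named version grows linearly while the inlined version roughly doubles at every step, which is the sense in which definitions are not optional.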




  • Well, is A* useful? But that’s not a fair example, and I can actually tell a story that is more specific to your setup. So, let’s go back to the 60s and the birth of UNIX.

    You’re right that we don’t want assembly. We want the one true high-level language to end all discussions and let us get back to work: Fortran (1956). It was arguably IBM’s best offering at the time; who wants to write COBOL or order the special keyboard for APL? So the folks who would write UNIX plotted to implement Fortran. But no, that was just too hard, because the Fortran compiler needed to be written in assembly too. So instead they ported Tmg (WP, Esolangs) (1963), a compiler-compiler that could implement languages from an abstract specification. However, when they tried to write Fortran in Tmg for UNIX, they ran out of memory! They tried implementing another language, BCPL (1967), but it was also too big. So they simplified BCPL to B (1969) which evolved to C by 1973 or so. C is a hack because Fortran was too big and Tmg was too elegant.

    I suppose that I have two points. First, there is precisely one tech leader who knows this story intimately, Eric Schmidt, because he was one of the original authors of lex in 1975, although he’s quite the bastard and shouldn’t be trusted or relied upon. Second, ChatGPT should be considered as a popular hack rather than a quality product, by analogy to C and Fortran.



  • Non-consensual expressions of non-conventional sexuality are kink, and non-consensuality itself (along with regret, dubious consent, forced consent, and violations of consent) is kink too. Moreover, “kink” is not a word that needs reclaiming and wasn’t used here as a slur.

    If we are going to confront the full spectrum of Christofascism, we do need to consider not only their sex-negativity but also their particular kinks, including breeding, non-con, and non-con breeding, so that we can understand how those kinks interact with and propagate their religious beliefs. Also, sexology semantics for “kink” and “breeding kink” might not be as word-at-a-time as you suggest, akin to how the couple we’re discussing probably wouldn’t mind the words “press tour” or “mating” used to describe them but might balk at “mating press tour.”




  • Yeah, that’s the most surprising part of the situation: not only are the SCP-8xxx series finding an appropriate meta by discussing the need to clean up SCP articles under ever-increasing pressure, but all of the precautions revolving around SCP-055 and SCP-914 turned out to be fully justified given what the techbros are trying to summon. It is no coincidence that the linked thread is by the guy who wrote SCP-3125, whose moral is roughly to not use blueprints from five-dimensional machine elves to create memetic hate machines.


  • Thanks for linking that. His point about teenagers and fiction is interesting to me because I started writing horror on the Internet in the pre-SCP era when I was maybe 13 or 14 but I didn’t recognize the distinction between fiction and non-fiction until I was about 28. I think that it’s easier for teenagers to latch onto the patterns of jargon than it is for them to imagine the jargon as describing a fictional world that has non-fictional amounts of descriptive detail.




  • I’ve done some of the numbers here, but don’t stand by them enough to share. I do estimate that products like Cursor or Claude are being sold at roughly an 80-90% discount compared to what’s sustainable, which is roughly in line with what Zitron has been saying, but it’s not precise enough for serious predictions.

    Your last paragraph makes me think. We often idealize blockchains with VMs, e.g. Ethereum, as a global distributed computer, if the computer were an old Raspberry Pi. But it is Byzantine distributed; the (IMO excessive) cost goes towards establishing a useful property. If I pick another old computer with a useful property, like a radiation-hardened chipset comparable to a Gamecube or G3 Mac, then we have a spectrum of computers to think about. One end of the spectrum is fast, one end is cheap, one end is Byzantine, one end is rad-hardened, etc. Even GPUs are part of this; they’re not that fast, but can act in parallel over very wide data. In remarkably stark contrast, the cost of Transformers on GPUs doesn’t actually go towards any useful property! Anything Transformers can do, a cheaper more specialized algorithm could have also done.