• drathvedro@lemm.ee · 64 points · 4 months ago

    Whenever I stumble on reddit I make sure to post disinformation or some kind of dumb shit to throw a wrench into the LLM training data they sell to google.

            • Comment105@lemm.ee · 2 points · edited · 4 months ago

              Yeah, usually made a couple, lost those as well, gave them a while, and I’d be able to make one I could keep again. Not sure if they had anyone manually looking at it.

              Haven’t had a main account for years. Never maintained multiple alts either. Just constantly replacing.

              I still engaged a lot in earnest, but I sometimes leaned heavily on sarcasm and became incredibly flippant about the site and all the people on it. Really stopped valuing keeping my account.

              I noticed that every time I commented on anything in r/pcgaming, I think it was, I was banned quickly. I think some subreddits run their own “security”; it seemed fast and consistent enough to be automatic. We’re talking a few hours or so.

      • answersplease77@lemmy.world · 7 points · 4 months ago

        I just got another one today for “harassment” over zionism in r/worldnews. Reddit cannot hold a free discussion and they know it. They can’t even let you speak to expose their bullshit, and they permaban you when you do.

        I can show you the comment which got me banned. I was literally asking questions which they know the answers to but censor intentionally, because they are bought and controlled by awful groups directly linked to the IDF themselves. They have a division which trains and employs teens as stupid Hasbara trolls who don’t know history and are unable to hold a discussion.

        • Comment105@lemm.ee · 7 points · 4 months ago

          My first one was in response to some rich cunt who went on air to say more or less that poverty was a good thing because then people had something to strive to avoid, so I said something along the lines of “This guy should be shot, I’m not even exaggerating.”

      • ClamDrinker@lemmy.world · 3 points · edited · 4 months ago

        I hate to ruin this for you, but if you post nonsense, it will get downvoted by humans and excluded from any data set (or included as an example of what to avoid). If it’s not nonsensical enough to be downvoted, it still won’t do well vote-wise, and will not realistically poison any data. And if it’s upvoted… it just might be good data. That is why Reddit’s data is valuable to Google: it basically has a built-in system for identifying ‘bad’ data.
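The filtering described above can be sketched in a few lines. This is a hypothetical illustration, not how Google actually processes Reddit data; the field names and the threshold are made up.

```python
# Hypothetical sketch of vote-based filtering of scraped comments before
# they enter a training set. Field names and threshold are assumptions.

def filter_by_score(comments, min_score=2):
    """Keep only comments whose net vote score clears a threshold."""
    return [c for c in comments if c["score"] >= min_score]

comments = [
    {"text": "Genuinely helpful answer", "score": 64},
    {"text": "Obvious nonsense posted to poison the data", "score": -12},
    {"text": "Borderline filler", "score": 1},
]

kept = filter_by_score(comments)
# Only the well-voted comment survives the cut; the deliberately bad
# comment is exactly the kind of data such a filter discards.
```

The point being made in the comment is that the community’s own votes do the cleaning for free.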

        • leftzero@lemmynsfw.com · 1 point · 4 months ago

          No, you’re missing the point. You make up some credible misinformation to poison AI training with, but you don’t stop there: you get an LLM to rewrite it for you. Retry until you get a text that sounds credible, doesn’t obviously read as AI-written, and that people will upvote, then post that.

          With this, even if the text looks good, you’re not only poisoning future models with the misinformation you started with; by feeding them LLM-generated text (even text you can’t tell apart from a human’s at first glance) you’re introducing feedback into the model that will also poison it, not with misinformation, but by reinforcing its biases and errors and reducing the variety of its training data.

          • ClamDrinker@lemmy.world · 1 point · 4 months ago

            I think I got the point just fine… you’re wasting a ton of electricity, and potentially your own money, on making text that is not bad training data. Which is exactly what I said would happen.

            LLMs are trained on billions of lines of text; the last figures we know are for GPT-3, with sources estimating anywhere from 570 GB to 45 TB of text. A short reddit comment is quite literally a drop in a swimming pool. The model’s word-prediction ability isn’t going to change for the worse if you just post a readable comment. It will simply be reinforced.

            And sure, you can lie in it, but LLMs are trained on fiction too and have to deal with that. There are supplementary techniques for making the AI less prone to hallucinations that don’t involve the training data, such as RLHF (reinforcement learning from human feedback). But honestly, telling the truth is a dumb thing to use the AI for anyway. Its primary function has always been to predict words, not truth.

            You would have to do this at such a scale, and so successfully vote-wise, that by the time you are represented significantly enough in the data to poison it, you’d be dead, banned, bankrupt, excluded from the data, or Google will have moved on from Reddit.

            If you hate or dislike LLMs and want to stop them, let your voice be known. Talk to people about it. Successfully convincing one person will be worth more than a thousand reddit comments. Poisoning the data directly is a thing, but it’s essentially impossible to inflict alone. It’s more a consequence of bad data gathering, bad storage practice, and bad training. None of those are in your control through a reddit comment.

    • Omniraptor@lemm.ee · 1 point · 4 months ago

      this is an ancient and noble practice known as shitposting, no need to call it something else :)