• @MigratingtoLemmy@lemmy.world
    English
    185
    2 months ago

    If OpenAI can get away with going through copyrighted material, then the answer to piracy is simple: round up a bunch of talented devs from the internet who are writing and training AI models, and let’s make a fantastic model trained on what the Internet Archive has. Tell you what, let Mistral’s engineers lead that charge, and put an AGPL license on the project so that companies can’t fuck us over.

    I refuse to believe that nobody has thought of this yet.

    • @bandwidthcrisis@lemmy.world
      English
      34
      2 months ago

      An AI trained on old Internet material would be like a synthetic Grandpa Simpson:

      “In my day we said ‘all your base’ and laughed all day long, because it took all day to download the video.”

    • capital
      English
      6
      2 months ago

      We get it, y’all hate LLMs and the companies who make them.

      This comparison is disingenuous and I have to think you’re smart enough to know that, making this disinformation.

      If/when an LLM like ChatGPT spits out a full copy of training text, that’s considered a bug and is remediated fairly quickly. It’s not a feature.

      What IA was doing was sharing the full text as a feature.

      As far as I know, there are some court cases pending to determine whether companies like OpenAI are guilty of copyright infringement, but I haven’t seen any convictions yet (happy to be corrected here).

      All that said, I love IA and have a Warrior container scheduled to run nightly to help contribute.

      • @MigratingtoLemmy@lemmy.world
        English
        4
        2 months ago

        Hmm, true. IA wouldn’t be as supported if we couldn’t get the full text of the source.

        Can you tell me more about the “warrior container”?

      • @dan@upvote.au
        English
        1
        edited
        2 months ago

        “have a Warrior container”

        This is an ArchiveTeam project, which is a totally separate effort to the Internet Archive. As far as I know, they’re not related other than the fact that ArchiveTeam use The Internet Archive for storage.

        • capital
          English
          1
          2 months ago

          Ahh, my mistake.

          Might be time to financially contribute to IA.

    • @werefreeatlast@lemmy.world
      English
      5
      2 months ago

      Better yet! Train an AI to rewrite the books into brand new books and let us read and review the content, add notes, etc., so that the AI can refresh the books if we find errors.

      Kick the private collections to the curb! Teeth in like in American History X.