Ignacio@lemmy.ml to Technology@lemmy.ml · 2 years ago

Disallow GPTBot from accessing your site

platform.openai.com

13 comments
  • cross-posted to:
  • privacyguides
  • lemmy_ca_support@lemmy.ca
  • technews@radiation.party
  • hackernews@derp.foo
Score: 120

OpenAI Platform
Explore developer resources, tutorials, API docs, and dynamic examples to get the most out of OpenAI's platform.
  • Eager Eagle@lemmy.world · English · ↑ 16 · 2 years ago

    Or don’t do anything. There are plenty of crawlers out there and disallowing won’t stop the unethical ones.

    • Zikeji@programming.dev · English · ↑ 25 · 2 years ago

      Just because some people might break into my house doesn’t mean I’ll stop locking my doors.

      • Eager Eagle@lemmy.world · English · ↑ 10 · 2 years ago

        That doesn’t lock anything; it’s not a security feature.

        • Arakwar@kbin.social · ↑ 10 · 2 years ago

          A house door lock isn’t that much about security either.

          • Zikeji@programming.dev · English · ↑ 5 · 2 years ago

            It’s a deterrent. Which is a pretty apt comparison for robots.txt and user agent blocking.

      • CaptainAniki@lemmy.flight-crew.org · English · ↑ 2 · 2 years ago

        deleted by creator

        • 5BC2E7@lemmy.world · ↑ 1 · 2 years ago

          That is the point. They don’t need to be secure to work as a deterrent.
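
The deterrent discussed in this subthread is a robots.txt rule that a crawler is free to honor or ignore. As a minimal sketch, the two-line rule below follows the form OpenAI documents for GPTBot, and Python's standard `urllib.robotparser` shows how a well-behaved crawler would interpret it. The helper name `is_allowed` and the `example.com` URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# robots.txt content that asks GPTBot to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper: it only reports what a compliant crawler would
    conclude. Nothing here enforces the rule on the server side.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/blog/post"  # hypothetical URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", url))
    print("Other bot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

The check runs entirely on the crawler's side, which is why the thread calls the rule a deterrent and not a lock.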

  • Voyajer@kbin.social · ↑ 13 · edited 2 years ago

    “Please label all of your interesting text so we can flag it with our webcrawler to train on later.”

  • WasPentalive · ↑ 9 · edited 2 years ago

    Is there some way you could have your web server log who scrapes the site? If you disallow ChatGPT and still find that it has scraped your site, would you have cause to sue? @legaleagle (or anyone else too)

    • Cyclohexane@lemmy.ml · ↑ 8 · 2 years ago

      It’s gotta be pretty difficult to differentiate human users from bots. If it was easy, you could prevent bots from loading the page altogether.

      • Lmaydev@programming.dev · ↑ 4 · 2 years ago

        Exactly what Google are trying to do currently. Just in the worst way possible.
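
On the logging question in this subthread: most web servers already record the User-Agent header in their access logs, so a crawler that identifies itself can be counted after the fact. The sketch below assumes the common "combined" log format used by Apache and nginx; the regex, the function name `count_agent_hits`, and the sample lines are illustrative and invented, not taken from a real server. It only catches crawlers that announce themselves honestly, which is the limitation raised above.

```python
import re
from collections import Counter

# Assumed "combined" access-log layout:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_agent_hits(lines, needle="GPTBot"):
    """Count requests per path whose User-Agent contains `needle` (case-insensitive)."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or needle.lower() not in match.group("agent").lower():
            continue
        # The request field looks like: GET /path HTTP/1.1
        parts = match.group("request").split()
        path = parts[1] if len(parts) >= 2 else "?"
        hits[path] += 1
    return hits


# Invented sample lines, for illustration only.
SAMPLE = [
    '203.0.113.7 - - [10/Aug/2023:12:00:00 +0000] "GET /blog/ HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.4 - - [10/Aug/2023:12:00:05 +0000] "GET /about HTTP/1.1" 200 256 '
    '"-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    for path, count in count_agent_hits(SAMPLE).items():
        print(path, count)
```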

  • HousePanther@lemmy.goblackcat.com · English · ↑ 7 · 2 years ago

    I’m going to do that tomorrow for my blog site. There’s no way I am letting ChatGPT crawl my shit.

    • snooggums@kbin.social · ↑ 20 · 2 years ago

      Narrator: ChatGPT crawled it anyway.

  • ExpensiveConstant@kbin.social · ↑ 6 · 2 years ago

    I mean, you can add their user agent to the robots file, but the crawler could just change its user agent. It could even ignore the robots file entirely if the server isn’t filtering requests by user agent.
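
Server-side filtering by user agent, as mentioned in the comment above, can be sketched as a small WSGI middleware using only the Python standard library. All names here (`block_user_agents`, `hello_app`, `status_for`) are hypothetical, and the substring match on "gptbot" is an assumption about how the crawler identifies itself. A crawler that sends a different User-Agent header passes straight through, so this is still only a deterrent.

```python
from wsgiref.util import setup_testing_defaults

# Lowercase substrings to reject; an assumed blocklist for illustration.
BLOCKED_AGENT_SUBSTRINGS = ("gptbot",)


def block_user_agents(app, blocked=BLOCKED_AGENT_SUBSTRINGS):
    """Wrap a WSGI app and answer 403 when the User-Agent matches the blocklist."""

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


def status_for(app, user_agent):
    """Call the app once with a fake request and return the status line it sent."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    seen = []

    def start_response(status, headers, exc_info=None):
        seen.append(status)

    b"".join(app(environ, start_response))
    return seen[0]


if __name__ == "__main__":
    site = block_user_agents(hello_app)
    print(status_for(site, "Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(status_for(site, "Mozilla/5.0 (X11; Linux x86_64)"))
```

In practice the same check is often placed in the web server or reverse proxy configuration instead of application code; the middleware form is used here only to keep the sketch self-contained.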


