I wondered about the robots.txt.
I can see the case for it, but I can also see the case for allowing at least Google to index the site.
Has there been some discussion about this previously?
At this point we try to block pretty much everything even remotely related to AI companies.
Soon we will probably have to block Chrome browsers too, once AI companies start using them to scrape websites without their users knowing. (Yes, that is why AI companies started to make their own browsers, and Mozilla is planning the same, proudly proclaiming how “stealthy” they can be with that.)
Google search results have become so useless that I see little point left in trying to accommodate their search bot 🤷
Yes, I am bitter and can’t wait for the AI bubble to pop.
It’s any day now, I think. EU pension funds are moving out: https://www.removepaywall.com/search?url=https%3A%2F%2Fwww.ft.com%2Fcontent%2F9d90d557-48e5-4f4b-a927-88071cef8ea9
Would you be up for re-enabling Google indexing? It is crappy, but still…
Not very motivated, but I can look into it.
I think it would just be
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
OK, I tried to allow-list some search engine spiders in the robots.txt. However, they will probably still just run into the AI scraper block if they act too shady.
But honestly, I highly doubt we will get much traffic from Google search. It’s completely gone to shit these days.
There has, and IIRC it’s to help prevent scraping.
deleted by creator
At the bottom,
User-agent: *
Disallow: /
blocks all scrapers and supersedes the text above, AFAIK.
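For anyone who wants to check how the wildcard block and a Googlebot allow-list interact, here is a minimal sketch using Python's standard-library urllib.robotparser. It is only an approximation of what real crawlers do: the two robots.txt texts below are illustrative (not this site's actual file), and example.org and SomeOtherBot are placeholders. As far as I can tell, this parser picks the group that names the crawler and only falls back to the * group otherwise, so the position of the wildcard block in the file should not matter here. Individual crawlers may still behave differently.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt texts: the same two groups, only the order differs.
WILDCARD_FIRST = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

WILDCARD_LAST = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.org/post/1"  # placeholder URL, not the real site
    for name, text in (("wildcard first", WILDCARD_FIRST),
                       ("wildcard last", WILDCARD_LAST)):
        for agent in ("Googlebot", "SomeOtherBot"):  # SomeOtherBot is made up
            print(f"{name}: {agent} allowed = {is_allowed(text, agent, url)}")
```

Keep in mind that robots.txt is only a request: a scraper that ignores it is not stopped by any of this, which is why a separate block for AI scrapers is still needed.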

deleted by creator
I wonder if it was a short term fix that got forgotten about…




