A contractor for Immigration and Customs Enforcement (ICE) and many other U.S. government agencies has developed a tool that lets analysts more easily pull a target individual’s publicly available data from a wide array of sites, social networks, apps, and services across the web at once, including Bluesky, OnlyFans, and various Meta platforms, according to a leaked list of the sites obtained by 404 Media. In all, the list names more than 200 sites that the contractor, called ShadowDragon, pulls data from and makes available to its government clients, allowing them to map out a person’s activity, movements, and relationships.

Article archived at https://archive.is/xJcrm

Alternate archive at https://web.archive.org/web/20250312132300/https://www.404media.co/the-200-sites-an-ice-surveillance-contractor-is-monitoring/

List of sites at https://docs.google.com/spreadsheets/d/1VyAaJaWCutyJyMiTXuDH4D_HHefoYxnbGL9l02kyCus/edit?ref=404media.co&gid=0#gid=0

List archived at https://archive.is/k2icM

  • lemmylommy@lemmy.world · 16 points · 8 months ago

    Tesseract OCR is open-source software. How can it be a site that they steal information from?

    • gAlienLifeform@lemmy.world (OP) · 5 points · 8 months ago

      Good question; I don’t have the answer. My guess is that this list is sourced from some kind of ShadowDragon marketing material that flatly says they gather information from Tesseract, when in reality they gather whatever information they can on users who search for or download the software. But, like I said, I’m speculating.

      If you’re really interested, I’d suggest emailing the author of this article, reaching out to Tesseract’s development team, or finding a way to get a subpoena against ShadowDragon and/or ICE.

    • Benjaben@lemmy.world · 3 points · 7 months ago (edited)

      I hope you’ll update us if you chase this down. I like 404 Media and I want to keep liking them, but only if the reporting is good. Hopefully it’s a typical tech journalism mistranslation where they use Tesseract OCR to scrape PDFs and the author just misunderstood, or something like that.

      Edit: After looking, I don’t have any issues. It looks like just a raw list from whatever the source is. I don’t need 404 Media to try to “curate” it or remove entries that seem irrelevant; they can leave that to us.