Epstein Files Jan 30, 2026 Release - Archived from Justice.gov

xodoh74984@lemmy.world · edit-2 2 天前

Epstein Files Jan 30, 2026 Release - Archived from Justice.gov

o_derr889@lemmy.world · 1 小时前

Here is the download link for a text file that has all the original URL’s https://wormhole.app/PpjJ3P#SFfAOKm1bnCyi-h2YroRyA The link will only last for 24 hours.

acelee1012@lemmy.world · 1 分钟前

I have never made a torrent file before so feel free to correct me if it doesn’t work. Here is the magnet link for this as a torrent file so its up for more than an hour magnet:?xt=urn:btih:694535d1e3879e899a53647769f1975276723db7&xt=urn:btmh:12207cf818f0f0110ca5e44614f2c65e016eca2fe7bc569810f9fb25e80ff608fc9b&dn=DOJ%20Epstein%20file%20urls.txt&xl=81991719&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce

acelee1012@lemmy.world · 13 小时前

Has anyone made a Dataset 9 and 10 torrent file without the files in it that the NYT reported as potentially CSAM?

activeinvestigator@lemmy.world · 13 小时前

Do people here have the partial dataset 9? or are you all missing the entire set? There is a magnet link floating around for ~100GB of it, the one removed in the OP

I am trying to figure out exactly how many files dataset 9 is supposed to have in it. Before the zip file went dark, I was able to download about 2GB of it. This was today, maybe not the original zip file from jan 30th In the head of the zip file is an index file, VOL00009.OPT, you don’t need the full download in order to read this index file. The index file says there are 531,307 pdfs the 100GB torrent has 531,256, it’s missing 51 pdfs. I checked the 51 file names and they no longer exist as individual files on the DOJ website either. I’m assuming these are the CSAM.

note that the 3M number of released documents != 3M pdfs. each pdf page is counted as a “document”. dataset 9 contains 1,223,757 documents, and according to the index, we are missing only 51 documents, they are not multipage. In total, I have 2,731,789 documents from datasets 1-12, short of the 3M number. the index I got also was not missing document ranges

it’s curious that the zip file had an extra 80GB when only 51 documents are missing. I’m currently scraping links from the DOJ webpage to double check the filenames

Arthas@lemmy.world · edit-2 10 小时前

i analyzed with AI my 36gb~ that I was able to download before they erased the zip file from the server.

Complete Volume Analysis

  Based on the OPT metadata file, here's what VOL00009 was supposed to contain:

  Full Volume Specifications

  - Total Bates-numbered pages: 1,223,757 pages
  - Total unique PDF files: 531,307 individual PDFs
  - Bates number range: EFTA00039025 to EFTA01262781
  - Subdirectory structure: IMAGES\0001\ through IMAGES\0532\ (532 folders)
  - Expected size: ~180 GB (based on your download info)

  What You Actually Got

  - PDF files received: 90,982 files
  - Subdirectories: 91 folders (0001 through ~0091)
  - Current size: 37 GB
  - Percentage received: ~17% of the files (91 out of 532 folders)

  The Math

  Expected:  531,307 PDF files / 180 GB / 532 folders
  Received:   90,982 PDF files /  37 GB /  91 folders
  Missing:   440,325 PDF files / 143 GB / 441 folders

  ★ Insight ─────────────────────────────────────
  You got approximately the first 17% of the volume before the server deleted it. The good news is that the DAT/OPT index files are complete, so you have a full manifest of what should be there. This means:
  - You know exactly which documents are missing (folders 0092-0532)

I haven’t looked into downloading the partials from archive.org yet to see if I have any useful files that archive.org doesn’t have yet from dataset 9.

Wild_Cow_5769@lemmy.world · 13 小时前

thats pretty cool…

Can you send me a DM of the 51? if i come across one and it isnt some sketchy porn i’ll let u know

GorillaCall@lemmy.world · 13 小时前

I have heard its 186gb

BWint@lemmy.world · 13 小时前

The BBC is now reporting that “thousands” of documents have been removed because the DOJ improperly redacted information that can be used to identify the victims: https://www.bbc.com/news/articles/cn0k65pnxjxo

TavernerAqua@lemmy.world · edit-2 16 小时前

In regard to Dataset 9, it’s currently being shared on Dread (forum).

I have no idea if it’s legit or not, and Idc to find out after reading about what’s in it from NYT.

Wild_Cow_5769@lemmy.world · 14 小时前

where… I dont see it here https://dreadytognbh7m5nlmqsogzzlxjy75iuxkulewbhxcorupbqahact2yd.onion/

DigitalForensick@lemmy.world · edit-2 15 小时前

this dude on pastebin posted his filetree in his epstein ubuntu env - i have a high confidence in whatever lives in his DataSet9Complete.zip file haha

Wild_Cow_5769@lemmy.world · 7 小时前

No doubt. High confidence…. :)

Wild_Cow_5769@lemmy.world · 17 小时前

@wild_cow_5769:matrix.org If someone has a group working on finding the dataset.

There are billions of people on earth. Someone downloaded dataset 9 before the link was taken down. We just have to find them :)

Wild_Cow_5769@lemmy.world · 17 小时前

Someone mentioned a matrix group. Can they DM and invite me. I want to help. Thx

kutt@lemmy.world · 7 小时前

Count me in!

DigitalForensick@lemmy.world · 17 小时前

same

DigitalForensick@lemmy.world · 20 小时前

Holy shit

The entire Court Records and FOIA page is completely gone too! Fuckers!

DigitalForensick@lemmy.world · 18 小时前

Have a scraper running on web.archive.org pulling all previously posted Court-Records and FOIA (docs,audio,etc.) from Jan 30th

Wild_Cow_5769@lemmy.world · 18 小时前

I told you…

We need dataset 9…

DigitalForensick@lemmy.world · 22 小时前

While I feel hopeful that we will be able to reconstruct the archive and create some sort of baseline that can be put back out there, I also cant stop thinking about the “and then what” aspect here. We’ve see our elected officials do nothing with this info over and over again and I’m worried this is going to repeat itself.

I’m fully open to input on this, but I think having a group path forward is useful here. These are the things I believe we can do to move the needle.

Right Now:

Create a clean Data Archive for each of the known datasets (01-12). Something that is actually organized and accessible.
Create a working Archive Directory containing an “itemized” reference list (SQL DB?) the full Data Archive, with each document’s listed as a row with certain metadata. Imagining a Github repo that we can all contribute to as we work. – File number – Dir. Location – File type (image, legal record, flight log, email, video, etc.) – File Status (Redacted bool, Missing bool, Flagged bool
Infill any MISSING records where possible.
Extract images away from .pdf format, Breakout the “Multi-File” pdfs, renaming images/docs by file number. (I made a quick script that does this reliably well.)
Determine which files were left as CSAM and “redact” them ourselves, removing any liability on our part.

What’s Next: Once we have the Archive and Archive Directory. We can begin safely and confidently walking through the Directory as a group effort and fill in as many files/blanks as possible.

Identify and dedact all documents with garbage redactions, (remember the copy/paste DOJ blunders from December) & Identify poorly positioned redaction bars to uncover obfuscated names
LABELING! If we could start adding labels to each document in the form of tags that contain individuals, emails, locations, businesses - This would make it MUCH easier for people to “connect the dots”
Event Timeline… This will be hard, but if we can apply a timeline ID to each document, we can put the archive in order of events
Create some method for visualizing the timeline, searching, or making connection with labels.

We may not be detectives, legislators, or law men, but we are sleuth nerds, and the best thing we can do is get this data in a place that can allow others to push for justice and put an end to this crap once and for all. Its lofty, I know, but enough is enough. …Thoughts?

PeoplesElbow@lemmy.world · 14 小时前

We definitely need a crowdsourced method for going through all the files. I am currently building a solo cytoscape tool to try out making an affiliation graph, but expanding this to be a tool for a community, with authorization to just allow whitelisted individuals work on it, that’s beyond my scope and I can’t volunteer to make such an important tool, but I am happy to offer my help building it. I can convert my existing tool to a prototype if anyone wants to collaborate with me on it. I am an amateur, but I will spend all the Cursor Credits on this.

Wild_Cow_5769@lemmy.world · edit-2 21 小时前

GFD….

My 2 cents. As a father of only daughters…

If we don’t weed out this sick behavior as a society we never will.

My thoughts are enough is enough.

Once the files are gone there is little to 0 chance they are ever public again….

You expect me to believe that a “oh shit we messed up” was accident?

It’s the perfect excuse… so no one looks at the files.

That’s my 2 cents.

DigitalForensick@lemmy.world · 1 分钟前

I’ve been thinking a lot about this whole thing. I don’t want to be worried or fearful here - we have done nothing wrong! Anything we have archived was provided to us directly by them in the first place. There are whispers all over the internet, random torrents being passed around, conspiracies, etc., but what are we actually doing other than freaking ourselves out (myself at least) and going viral with an endless stream of “OMG LOOK AT THIS FILE” videos/posts.

I vote to remove any of the ‘concerning’ files and backfill with blank placeholder PDFS with justification, then collect everything we have so far, create file hashes, and put out a clean + stable archive on everything we have so far. a safe indexed archive We wipe away any concerns and can proceed methodically through blood trail of documents, resulting in an obvious and accessible collection of evidence. From there we can actually start organizing to create a tool that can be used to crowd source tagging, timestamping, and parsing the data. I’m a developer and am happy to offer my skillset.

Taking a step back - Its fun to do the “digital sleuth” thing for a while, but then what? We have the files…(mostly)… Great. We all have our own lives, jobs, and families, and taking actual time to dig into this and produce a real solution that can actually make a difference is a pretty big ask. That said, this feels like a moment where we finally can make an actual difference and I think its worth committing to. If any of you are interested in helping beyond archival, please lmk.

I just downloaded matrix, but I’m new to this, so I’m not sure how that all works. Happy to link up via discord, matrix, email, or whatever.

kongstrong@lemmy.world · edit-2 1 天前

PSA: paging bug has been fixed on the DOJ’s website. Website caps out at around 9600 for ~197k files, way less than the 520k in the less-complete dataset 9 torrent. Scraping the website now to find out which files they took offline.

Correction: 9600*50 files per page is in the 470k ballpark. Much more tan 197k but still a lot less than the torrent’s 530k let alone the expected 600k+ files that were supposed to be in there

lukky@lemmy.zip · 24 小时前

can you explain to me what is the problem exact ? its the dataset9 ? when i press the dataset9 link of the DOJ gov i see a download start with 180gb zip file in the browser. ?

kongstrong@lemmy.world · 24 小时前

yea for me it fails after anywhere between 200MB and 10-15GB. All the time.

Wild_Cow_5769@lemmy.world · 23 小时前

Same. Every damn time.

lukky@lemmy.zip · 23 小时前

And what is the solution ?

Wild_Cow_5769@lemmy.world · 21 小时前

F if I know I’ve been messing with it for days. I’ve tried chunking. Different scripts. Different cookies.

kongstrong@lemmy.world · 22 小时前

we’re working on some more complex solutions in an Element group. Not really sure where we stand at this moment, but it seems we can stitch a lot together from the large torrent files and by what we scraped from the DOJs website through a little bit of force.

Wild_Cow_5769@lemmy.world · 21 小时前

Are u doing after DS9?

kongstrong@lemmy.world · 21 小时前

trying

captainmycaptain@lemmy.world · 22 小时前

Check Available Pieces for the torrents. My guess is that you’ll see half of them are missing and UNavailable.

TurtleGreen@lemmy.world · 1 天前

where did the party move?

Wild_Cow_5769@lemmy.world · 23 小时前

It’s still here. No one dropped a complete dataset 9 yet tho…

BWint@lemmy.world · 23 小时前

Hasn’t moved AFAIK, just going slowly.

Wild_Cow_5769@lemmy.world · edit-2 23 小时前

This entire thing smells funny. Even OP turned ghost on the threat of suspect images that no one has seen…

Ask yourself. How did the times or whoever came up with this narrative even find these “suspect” images in a few hours when it seems no one in the world came even download the zip…

kutt@lemmy.world · edit-2 15 小时前

A person made a website just to host links and thumbnails for a better interface to the videos on the DoJ website.

They deleted everything including their account the same day.

Everyone. I know website is showing all blank. This is unfortunately the end of my little project. Due to certain circumstances, I had to take it down. Thank you everyone for supporting me and my effort.

Edit: Link

Wild_Cow_5769@lemmy.world · 14 小时前

Link is dead

kutt@lemmy.world · 11 小时前

It still works for me. I can only see the comments on the post since it was deleted, but that’s what’s important here.

OP’s last message

Scary Comment 1

Scary Comment 2

Arthas@lemmy.world · 21 小时前

some bad news, it looks like the data 9 zip file link doesn’t work anymore. They appear to have removed the file so my download stopped at 36gb. I’m not familiar with their site so is this normal for them to remove the files and maybe put them back again once they’ve reorganized them and at the same link location? or are we having to do the scrape of each pdf like another user has been doing?

Wild_Cow_5769@lemmy.world · 21 小时前

All the zip files are gone on the DOJ website. The links are gone.

DigitalForensick@lemmy.world · 20 小时前

Does anyone have the OTHER data sets from before? Ive been lasered in on the DS1-DS12 but havent looked at the other documents at all

DigitalForensick@lemmy.world · 20 小时前

this is ridiculous. Good thing we got in when we did!

Wild_Cow_5769@lemmy.world · 16 小时前

Need dataset 9

kuuhaku@lemmy.world · 16 小时前

me too, u know any place to get?

Wild_Cow_5769@lemmy.world · 21 小时前

All the zip download links are gone on the DOJ website.

It’s only a matter of time before all the files just go poof.

acelee1012@lemmy.world · edit-2 21 小时前

is anyone else having issues getting dataset 10 11* to start downloading? it has been sittiing at 0 percent for a day while everything else is done and seeding. it shows connections to peers, rechecking does nothing, deleting and re-adding does nothing, asking tracker for more peers does nothing

Nomad64@lemmy.world · edit-2 19 小时前

I have been seeding all of the datasets since Sunday. The copy of set 9 has been the busiest, with set 10 a distant second. I plan on seeding them for quite a while yet, and also picking up a consolidated torrent when that becomes available. Hopefully you are able to get connected via the Swarm.

acelee1012@lemmy.world · 14 小时前

is there something I am missing on why it isn’t connected given how much time and attempt to redo it? is it just an eventually thing?

dh007@lemmy.world · 20 小时前

I’m getting errors for 1 and 8, all the rest went smooth.

acelee1012@lemmy.world · 20 小时前

i am not seeing any errors, has just been stuck on downloading status with nothing going through. I originally added everything around the same time and all the other ones went through fine. I figured it was bugged or something so removed then readded it several times to no avail. I am not sure what else to try

acelee1012@lemmy.world · 20 小时前

its really strange because on my other machine, everythings going fine

captainmycaptain@lemmy.world · edit-2 22 小时前

read the OP

acelee1012@lemmy.world · 22 小时前

regardless of OP removing the magnet links or not, the torrents are still out there and that shouldn’t stop it. secondly, I meant 11

lukky@lemmy.zip · 20 小时前

**what is the name of the softwar you use for torrent ? **

acelee1012@lemmy.world · 20 小时前

transmission

lukky@lemmy.zip · 20 小时前

thank you

PeoplesElbow@lemmy.world · 2 天前

Ok everyone, I have done a complete indexing of the first 13,000 pages of the DOJ Data Set 9.

KEY FINDING: 3 files are listed but INACCESSIBLE

These appear in DOJ pagination but return error pages - potential evidence of removal:

EFTA00326497

EFTA00326501

EFTA00534391

You can try them yourself (they all fail):

https://www.justice.gov/epstein/files/DataSet 9/EFTA00326497.pdf

The 86GB torrent is 7x more complete than DOJ website

DOJ website exposes: 77,766 files

Torrent contains: 531,256 files

Page Range Min EFTA Max EFTA New Files

0-499 EFTA00039025 EFTA00267311 21,842

500-999 EFTA00267314 EFTA00337032 18,983

1000-1499 EFTA00067524 EFTA00380774 14,396

1500-1999 EFTA00092963 EFTA00413050 2,709

2000-2499 EFTA00083599 EFTA00426736 4,432

2500-2999 EFTA00218527 EFTA00423620 4,515

3000-3499 EFTA00203975 EFTA00539216 2,692

3500-3999 EFTA00137295 EFTA00313715 329

4000-4499 EFTA00078217 EFTA00338754 706

4500-4999 EFTA00338134 EFTA00384534 2,825

5000-5499 EFTA00377742 EFTA00415182 1,353

5500-5999 EFTA00416356 EFTA00432673 1,214

6000-6499 EFTA00213187 EFTA00270156 501

6500-6999 EFTA00068280 EFTA00281003 554

7000-7499 EFTA00154989 EFTA00425720 106

7500-7999 (no new files - all wraps/redundant)

8000-8499 (no new files - all wraps/redundant)

8500-8999 EFTA00168409 EFTA00169291 10

9000-9499 EFTA00154873 EFTA00154974 35

9500-9999 EFTA00139661 EFTA00377759 324

10000-10499 EFTA00140897 EFTA01262781 240

10500-12999 (no new files - all wraps/redundant)

TOTAL UNIQUE FILES: 77,766

Pagination limit discovered: page 184,467,440,737,095,516 (2^64/100)

I searched random pages between 13k and this limit - NO new documents found. The pagination is an infinite loop. All work at: https://github.com/degenai/Dataset9

PeoplesElbow@lemmy.world · 23 小时前

DOJ Epstein Files: I found what’s around those 3 missing files (Part 2)

Follow-up to my Dataset 9 indexing post. I pulled the adjacent files from my local copy of the torrent. What I found is… notable.

TLDR

The 3 missing files aren’t random corruption. They all cluster around one event: Epstein’s girlfriend Karyna Shuliak leaving St. Thomas (the island) in April 2016. And one of the gaps sits directly next to an email where Epstein recommends her a novel about a sympathetic pedophile—two days before the book was publicly released.

The Big Finding: Duplicate Processing Batches

Two of the missing files (326497 and 534391) are the same document processed twice—once with redactions, once without—208,000 files apart in the index.

Redacted Batch	Unredacted Batch	Content
326494-326496	534388-534390	AmEx travel booking, staff emails
326497 - MISSING	534391 - MISSING	???
326498-326500	—	Email chain continues
326501 - MISSING	—	???
326502-326506	—	Reply + Invoice
—	534392	Epstein personal email

Random file corruption hitting the same logical document in two separate processing runs, 208,000 positions apart? That’s not how corruption works. That’s how removal works.

What’s Actually In These Files

I pulled everything around the gaps. It’s all one email chain from April 10, 2016:

The event: Karyna Shuliak (Epstein’s girlfriend) booked on Delta flight from Charlotte Amalie, St. Thomas → JFK on April 13, 2016.

St. Thomas is where you fly in/out to reach Little St. James. She was leaving the island.

The chain:

11:31 AM — AmEx Centurion (black card) sends confirmation to lesley.jee@gmail.com
11:33 AM — Lesley Groff (Epstein’s executive assistant) forwards to Shuliak, CC’s staff
11:35 AM — Shuliak replies “Thanks so much”
3:52 PM — Epstein personally emails Shuliak
Next day — AmEx sends invoice

The unredacted batch (534xxx) reveals the email addresses that are blacked out in the redacted batch (326xxx):

Lesley Groff: lesley.jee@gmail.com
Ann Rodriquez: annrodriquez@yahoo.com
Bella Klein: bklein575@gmail.com
Karyna Shuliak: karynashuliak@icloud.com

The Epstein Email (EFTA00534392)

The document immediately after missing file 534391:

From: "jeffrey E." <jeevacation@gmail.com>
To: Karyna Shuliak
Date: Sun, 10 Apr 2016 19:52:13 +0000

order http://softskull.com/dd-product/undone/

He’s telling her to buy a book. The same day she’s being booked to leave his island.

The Book

“Undone” by John Colapinto (Soft Skull Press)

On-sale date: April 12, 2016
Epstein’s email: April 10, 2016

He recommended it two days before public release.

Publisher’s description:

“Dez is a former lawyer and teacher—an ephebophile with a proclivity for teenage girls, hiding out in a trailer park with his latest conquest, Chloe. Having been in and out of courtrooms (and therapists’ offices) for a number of years, Dez is at odds with a society that persecutes him over his desires.”

The protagonist is a pedophile who resents society for judging him.

The author (John Colapinto) is a New Yorker staff writer, former Vanity Fair and Rolling Stone contributor. Exactly the media circles Epstein cultivated.

What’s Missing

So now we know the context:

EFTA00326497 — Between AmEx confirmation and Groff’s forward. Probably the PDF ticket attachment referenced in the emails.
EFTA00326501 — Between the forward chain and Shuliak’s reply. Unknown.
EFTA00534391 — Immediately before Epstein’s personal email about the pedo book. Unknown, but its position is notable.

Open Questions

How did Epstein have this book before release? Advance copy? Knows the author?
What is 534391? It sits between staff logistics emails and Epstein’s direct correspondence. Another Epstein email? An attachment?
Are there other Shuliak travel records with similar gaps? Is April 2016 unique or part of a pattern?
What else is in the corpus from jeevacation@gmail.com?

Verify It Yourself

Try the DOJ links (all return errors):

Check the torrent: Pull the EFTA numbers I listed. Confirm the gaps. Confirm the adjacencies.

Grep the corpus: Search for “QWURMO” (booking reference), “Shuliak”, “jeevacation”, “Colapinto”

Summary

Three files missing from 531,256. All three cluster around one girlfriend’s April 2016 departure from St. Thomas. Same gaps appear in two processing batches 208,000 files apart. One gap sits adjacent to Epstein personally recommending a novel about a sympathetic pedophile, sent before the book was even publicly available.

This isn’t random corruption.

Full analysis + all code: https://github.com/degenai/Dataset9

If anyone has the torrent and wants to grep for Colapinto connections or other Shuliak trips, please do. This is open source for a reason.

sherbeticecream@lemmy.world · 17 小时前

Just skimming through and I have file 534391 but it shows ‘No Images Produced’ not sure if that was your reason as well and apologies in advance! Heres an image of said file (https://lemmy.world/pictrs/image/d840f280-5e32-4417-a92e-ff281582080a.png)

PeoplesElbow@lemmy.world · 15 小时前

That is new information! I wasnt even able to get that ‘no images produced’ page, good to know thank you. I just hit a file corruption error when I tried to dl from the DOJ. Thank you for the information. I guess this means the content is still missing in a way but at least accounted for.

kongstrong@lemmy.world · 1 天前

ysk the page limit has been fixed, it caps out around 9600 for a total of ~197k file entries. Way less than the largest torrent’s 530k. Scraping now to get a list of the files they kept on the DOJ so we can determine which files they don’t want out there. Would be a good lead to further investigate the torrent

PeoplesElbow@lemmy.world · 23 小时前

Oh no…I didn’t know this, on one hand now i need to run another scan, but on the other it could reveal something, the torrent has 500k+ files so there is still a gap. I will run the scraper again and do a new analysis in the next day or two.

Wild_Cow_5769@lemmy.world · 2 天前

Just like I said… In NO way do I trust DOJ… Our only hope is if someone drops the full data set 9 somewhere.

PeoplesElbow@lemmy.world · 1 天前

My question is, why is the total download size so large and the range of displayed documents so little? Only 15% of the known documents are individually served on the site, and some arent seen until page 10,000

Moonsurfer_1@lemmy.world · 1 天前

It’s an effort to obscure for sure.

Wild_Cow_5769@lemmy.world · 1 天前

Yup… hopefully someone is able to get the full zip

Wild_Cow_5769@lemmy.world · 1 天前

That’s why you need the full zip…