

I found this in a random doc today. I’ll add it to your list and give it a shot tonight. It’ll be slow going so I don’t get rate limited again. I think if you hit too many 404s in a row, the CDN locks you out for a bit.



I could only grab ~44 of the NATIVEs you’ve listed, and they add up to a tiny portion of the expected 80GB remaining. The hard part is guessing what file extension these files will have without getting rate limited by DOJ. I was hoping to get a copy of the zip file’s EOCD (end of central directory record), but the zip is still down.
If anyone ever sees that zip come back, please try to download the last 150-200MB. That’s where the zip archive’s table of contents (the central directory) is gonna live.
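
For anyone who does manage to grab the tail, here is a rough sketch of how the EOCD could be located in it to find where the central directory starts. It assumes the tail is already in memory as bytes; `find_central_directory` is a name made up for illustration, and it naively trusts the last signature match it finds. It handles the ZIP64 record too, since an archive this size would need one.

```python
import struct

EOCD_SIG = b"PK\x05\x06"    # end of central directory record
EOCD64_SIG = b"PK\x06\x06"  # ZIP64 end of central directory record

def find_central_directory(tail: bytes):
    """Return (cd_offset, cd_size, entry_count) parsed from the tail of a zip.

    cd_offset is relative to the start of the whole archive, not the tail.
    """
    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or pos + 22 > len(tail):
        raise ValueError("no complete EOCD record in this chunk")
    # Fixed 22-byte part: signature, disk numbers, entry counts,
    # central directory size and offset, comment length.
    (_sig, _disk, _cd_disk, _n_disk, n_total,
     cd_size, cd_offset, _comment_len) = struct.unpack(
        "<4s4H2LH", tail[pos:pos + 22])

    # Saturated fields mean the real values live in the ZIP64 record.
    if n_total == 0xFFFF or cd_size == 0xFFFFFFFF or cd_offset == 0xFFFFFFFF:
        p64 = tail.rfind(EOCD64_SIG, 0, pos)
        if p64 < 0 or p64 + 56 > len(tail):
            raise ValueError("ZIP64 EOCD record not in this chunk")
        (_sig, _rec_size, _made_by, _needed, _disk, _cd_disk,
         _n_disk, n_total, cd_size, cd_offset) = struct.unpack(
            "<4sQ2H2L4Q", tail[p64:p64 + 56])
    return cd_offset, cd_size, n_total
```

If the server honors range requests, a suffix range header like `Range: bytes=-209715200` should return just the last 200MB. Whether this CDN supports that is an assumption I haven’t verified.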

You rock. I didn’t realize NATIVEs had a placeholder PDF. I’ll try to scrape the media files tonight and add them to the existing, more complete dataset 9 archive.

What’s your method for getting the zip file without being cut off by the CDN?

Hi, OG 101GB dataset uploaded here. The DAT/OPT files are exactly what I used to fetch the files for this dataset.
I want to go through the other partial dataset 9 zips and check for deltas in the contents of the DAT/OPT files but haven’t had the time yet.
Can you also check and see if dataset 8/10/11 have all the native files they should based on the presence of these placeholders?
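
For the DAT/OPT delta check mentioned above, something like this minimal sketch might do for the OPT side. It assumes the OPT files follow the usual Opticon layout (comma-separated, image key in the first column), which I haven’t confirmed for these datasets. `opt_keys` and `opt_delta` are hypothetical helper names, and the sample keys in the usage are made up.

```python
import csv
import io

def opt_keys(text: str) -> set:
    """Collect the first column (image key) of each non-empty OPT row."""
    rows = csv.reader(io.StringIO(text))
    return {row[0].strip() for row in rows if row and row[0].strip()}

def opt_delta(old_text: str, new_text: str):
    """Return (keys only in new, keys only in old), each sorted."""
    old, new = opt_keys(old_text), opt_keys(new_text)
    return sorted(new - old), sorted(old - new)

# Usage with made-up keys and paths:
old = "DOC0001,VOL01,IMAGES\\0001.pdf,Y,,,1\nDOC0002,VOL01,IMAGES\\0002.pdf,Y,,,1\n"
new = "DOC0002,VOL01,IMAGES\\0002.pdf,Y,,,1\nDOC0003,VOL01,IMAGES\\0003.pdf,Y,,,1\n"
added, missing = opt_delta(old, new)
print(added, missing)
```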