TL;DR: ~60% of media data was recovered by retrieving cached images from Cloudflare and scraping The Wayback Machine. Over the coming days and weeks, we will work on restoring this data.
Greetings everyone! I’m u/Southernwolf, the Moderator (technically Admin since it’s on Lemmy) mentioned in the previous post by u/Crashdoom. I wanted to provide an update on the data retrieval I was working on, and provide details on what it will take for us to get the recovered media data back online.
Initially, using a script (created with the help of Qwen AI) to retrieve cached media data from Cloudflare, I was able to recover ~33% of the lost media, which by itself is honestly not that bad given the cache was already starting to decay. It required using a VPN to hop between locations around the globe, but ultimately that is what allowed me to recover as much as I did from the CF cache alone.
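For anyone curious what a cache check like this can look like, here is a minimal sketch. It is not the actual script used above. It assumes the cached copy can be recognised by Cloudflare's `CF-Cache-Status: HIT` response header; the function names and the `fetch` callback are made up for illustration.

```python
from typing import Callable, Mapping, Optional, Tuple

# A fetcher takes a URL and returns (status_code, headers, body).
Fetcher = Callable[[str], Tuple[int, Mapping[str, str], bytes]]


def served_from_cache(status: int, headers: Mapping[str, str]) -> bool:
    """True when a 200 response was answered from Cloudflare's edge cache."""
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {k.lower(): v for k, v in headers.items()}
    return status == 200 and lowered.get("cf-cache-status", "").upper() == "HIT"


def recover_from_cache(url: str, fetch: Fetcher) -> Optional[bytes]:
    """Return the image bytes if the edge still has them cached, else None."""
    status, headers, body = fetch(url)
    return body if served_from_cache(status, headers) else None


def http_fetch(url: str) -> Tuple[int, Mapping[str, str], bytes]:
    """Real-network fetcher built on the standard library only."""
    import urllib.error
    import urllib.request

    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status, dict(resp.headers.items()), resp.read()
    except urllib.error.HTTPError as err:
        # Error responses still carry headers; the body is not needed here.
        return err.code, dict((err.headers or {}).items()), b""
```

Cloudflare caches per data centre, so the same URL can be a HIT from one location and a MISS from another, which is why repeating the run through different VPN exits can turn up more files.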
However, at the recommendation of @arcanicanis@were.social, I modified my script with Qwen to scrape the Wayback Machine for the rest of the missing images. This took a while, as I couldn’t do more than one request every 2 seconds without hitting their rate limit, but after some 5 hours this was complete. As a result, this is the final tally of the recovered media:
Recovery report generated:
- Total entries in CSV: 6697
- Images recovered: 4080
- Images missing: 2617
- Recovery rate: 60.92%
- Total size: 2953.08 MB (2.88 GB)
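The figures in the report follow from three inputs. As a quick sanity check, here is a small sketch that rebuilds them; the function name is hypothetical, and it assumes the GB figure is simply the MB figure divided by 1024.

```python
def recovery_report(total_entries: int, recovered: int, size_mb: float) -> str:
    """Rebuild the report lines from the entry count, recovered count and size in MB."""
    missing = total_entries - recovered
    # Guard against an empty CSV to avoid dividing by zero.
    rate = 100 * recovered / total_entries if total_entries else 0.0
    size_gb = size_mb / 1024
    return "\n".join([
        f"- Total entries in CSV: {total_entries}",
        f"- Images recovered: {recovered}",
        f"- Images missing: {missing}",
        f"- Recovery rate: {rate:.2f}%",
        f"- Total size: {size_mb:.2f} MB ({size_gb:.2f} GB)",
    ])


print(recovery_report(6697, 4080, 2953.08))
```

With the numbers above this prints the same five lines as the report, including the 2617 missing images and the 60.92% recovery rate.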
Honestly, this is a phenomenal result! It's not perfect, but it is far more of the media data than I ever expected could be recovered, and I am more than satisfied with rescuing that large an amount of the lost media.
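For illustration, the Wayback Machine step can be sketched roughly as follows. This is not the script that was actually used. It assumes the Internet Archive's public availability endpoint (`https://archive.org/wayback/available`) and the `id_` URL modifier for fetching the unmodified original file; the `Throttle` class and the helper names are made up for this example. The 2-second spacing matches the rate limit mentioned above.

```python
import json
import time
from typing import Callable, Dict, Iterable, Optional
from urllib.parse import urlencode

AVAILABILITY_API = "https://archive.org/wayback/available"


class Throttle:
    """Blocks so that consecutive calls to wait() are at least min_interval apart."""

    def __init__(self, min_interval: float = 2.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def availability_url(original_url: str) -> str:
    """Build the lookup URL for one original image URL."""
    return f"{AVAILABILITY_API}?{urlencode({'url': original_url})}"


def closest_snapshot(api_response: str) -> Optional[dict]:
    """Return the closest snapshot record, or None if nothing was archived."""
    closest = json.loads(api_response).get("archived_snapshots", {}).get("closest")
    return closest if closest and closest.get("available") else None


def raw_snapshot_url(timestamp: str, original_url: str) -> str:
    """URL of the archived file itself, without the Wayback page wrapper."""
    return f"https://web.archive.org/web/{timestamp}id_/{original_url}"


def find_snapshots(urls: Iterable[str], fetch_text: Callable[[str], str],
                   throttle: Throttle) -> Dict[str, str]:
    """Map each archived original URL to its raw snapshot URL, one lookup per wait()."""
    found = {}
    for url in urls:
        throttle.wait()
        snap = closest_snapshot(fetch_text(availability_url(url)))
        if snap:
            found[url] = raw_snapshot_url(snap["timestamp"], url)
    return found
```

`fetch_text` is whatever function performs the HTTP GET and returns the response body as text. Passing it in keeps the throttling and parsing logic testable without touching the network, and the same `Throttle` can be reused when downloading the snapshot files afterwards.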
Now, with the media we have recovered, the process will turn to actually getting the images plugged back into the instance. This won’t necessarily be a simple process, due to the nature of how Pict-rs (the media database that Lemmy relies on) works. One can’t easily insert images back into it, as it uses rather large hash trees to store everything… So we will have to investigate ways to work around this. There are some potentially simple solutions (such as just making endpoints manually for the images and hoping it doesn’t break Pict-rs) and some rather complex ones (such as switching our media database over to an entirely different system like Postgres).
Which solution turns out to work best will determine how long it will take to get the lost media back online. But you can expect a wait of likely several days at minimum, to possibly a few weeks. Once we have an idea of what will work, another update will get posted to let our users know.
Thank you for your patience with us as we work to fix this issue!
I give to the Internet Archive every year. I get a lot of use from them and they deserve it.
Things happen, the fact you were able to recover after a mistake is good.
If you didn’t already know, Asonix (creator of pict-rs) is here on the fediverse, and a furry. I’m sure they can help with the pict-rs stuff.
Something else strange is going on. My profile picture is still there at https://pawb.social/pictrs/image/7c611a5a-efd6-4df3-99cb-5c484978c254.png , but the format links from pict-rs are broken: https://pawb.social/pictrs/image/7c611a5a-efd6-4df3-99cb-5c484978c254.png?format=webp and https://pawb.social/pictrs/image/7c611a5a-efd6-4df3-99cb-5c484978c254.png?format=webp&thumbnail=96 . Spot checking shows similar results for other images, e.g. https://pawb.social/pictrs/image/bb4749ec-590a-41c2-8b26-9d26dae4ed00.png?format=webp&thumbnail=96 (the original exists).
Pict-rs does support admin direct upload with a specific name, but it’s been a while and I don’t remember the details.
We haven’t yet got the images reimported so existing images and profile pictures will still be broken. The base URL without the format link would have been cached most likely, but the format link would error as pict-rs isn’t aware of the file you’re trying to reference.
For now, reuploading things like profile pictures is my recommendation rather than waiting for us to get the images reimported.
We did briefly discuss this with Asonix, but they weren’t sure it’d be possible, so we’re exploring options using an external database for pict-rs, though that seems relatively new.
tl;dr we have the files, but are still working on getting them reimported!
I’d also like to add a comment here that if you have a few spare bucks to throw at The Internet Archive/The Wayback Machine, please do so! They absolutely earned it for this alone.
This was really creative and amazing work! Good job everyone <3
Yes, they pulled off something incredibly smart and creative when they could’ve just done nothing and accepted the loss. That’s what I’d call going above and beyond, and it feels great to be on an instance run by people like this <3
Great work, really appreciate all the effort that has gone into this :3
Thank you all for your efforts 🫶
Awesome, keep it up!






