There are a few different options for how to host the archives.
Here’s what /r/datahoarder is doing with redarc
We could import it here, put it in a separate community on this server, or host it with redarc on a subdomain. It’s pretty much whatever.
I’ll put a survey up for a vote once I finish that server again. I thought a discussion would be good to have before that goes up.
Thoughts??
I like the idea of a separate community but at the same time there is definitely value in the continuity of transitioning the sub (I assume reddit will kill it sooner or later tbh).
I think I get an idea of the scope of the task. I agree it’s probably too intensive for most communities, but I’m sure others would be interested. What’s involved with getting the archive? Is it scraped, or something you can download as a mod? I can think of a community or two that might appreciate a new home away from their mods…
Agree they will at some point nuke that subreddit.
Pretty much everything has been archived that could have been, since the API goes dark in ~2 days.
There are details of what apps were used, where to download the archives directly, links to torrents and such in a post on /r/DataHoarder. I’m collecting links/text/guides from it over in our Gitea instance, as well as importing the projects used for this effort.
So far I’ve downloaded ~5-6TB of archives through the links in that post, others on /r/DataHoarder, the-eye.eu, etc.
They go back allll the way to 2005… It’s just text though, no media. Unless the media was a link to Imgur or YouTube or something, then the links are in the posts.
There are a couple of bots/scripts that will repost new stuff from RSS feeds going forward.
To inject directly from those backups into a Lemmy PostgreSQL database, I was using this tool, RedditLemmyImporter, which actually looks like it was originally made by the Lemmy developer dessalines to move the r/GenZhou subreddit into lemmy.ml/lemmygrad.ml, but was forked very early to try to obscure that… So I do feel a bit dirty about that, and more so about Lemmy in general from some of the things I’ve seen from dessalines themselves.
TBH… I think it’s important to make a fork of Lemmy itself, and to really, really comb through this code base. I’m not sure this is a long-term solution if dessalines is still the head dev, which is one of the reasons I’m setting up other forums on this server as well. Lemmy and non-corp/federated social media is good, but the more I read of their direct words/actions/history and code like this importer, the less I like the stewards of the Lemmy code base.
Also, if there are communities out there that you’re interested in helping do this they have to either
or
I’m sure there are other ways to import it, like scripting something that literally re-posted everything through API calls to Lemmy or ugh clicking through the web GUI lmao. Without that database access I’d consider it too much work/hassle to be practical.
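For anyone curious what the "re-post everything through API calls" route might look like, here's a rough sketch of just the mapping step. It assumes the archives are Pushshift-style newline-delimited JSON with `title`, `selftext`, `url`, `is_self`, `author` and `created_utc` fields, and that the target is Lemmy's create-post endpoint (`POST /api/v3/post`) taking `name`, `community_id`, `body` and `url`. The function names, the 200-character title cap and the credit line are my own placeholders, not anything taken from RedditLemmyImporter. It only builds the payloads; actually sending them (plus auth and rate limiting) is left out.

```python
import json
from datetime import datetime, timezone


def iter_submissions(lines):
    """Yield one dict per non-empty line of a newline-delimited JSON dump."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def submission_to_lemmy_post(sub, community_id, max_title=200):
    """Map one archived Reddit submission to a create-post payload.

    Field names on both sides are assumptions (Pushshift-style input,
    Lemmy v3-style output); check them against your instance's API docs.
    """
    created = datetime.fromtimestamp(int(sub["created_utc"]), tz=timezone.utc)
    author = sub.get("author") or "[deleted]"
    # Every reposted item will belong to the importing account, so keep
    # the original author and date in the body.
    credit = f"*Originally posted by u/{author} on {created:%Y-%m-%d}*"
    text = (sub.get("selftext") or "").strip()

    payload = {
        "name": sub["title"][:max_title],  # assumed title length cap
        "community_id": community_id,
        "body": f"{credit}\n\n{text}" if text else credit,
    }
    # Link posts keep their outbound URL (Imgur, YouTube, ...);
    # self posts have no external link to carry over.
    if not sub.get("is_self") and sub.get("url"):
        payload["url"] = sub["url"]
    return payload
```

You'd feed it an open dump file, e.g. `for sub in iter_submissions(fh): payload = submission_to_lemmy_post(sub, community_id)`, and then POST each payload yourself. With ~5-6TB of text that's a lot of requests, which is why I'd still call this route impractical next to direct database access.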