See linked posting. I’ve commented there with a link to a Python CLI tool that can download IA collections. I’ve also submitted a patch that lets you specify start and end points. This makes it easier to resume downloading a huge collection, or to split the work among multiple people.

https://archive.org/details/georgeblood

https://archive.org/details/78rpm_bowling_green

F*ck the RIAA and absurdly long copyright.


EDIT: There is more than one collection of 78s on IA, so I updated the title.


The issue with these collections is that they’re absolutely HUGE. And yes, IA offers torrents for them, but as a separate torrent for every. single. album. And each torrent has all the data in it: FLAC, fixed-rate MP3, VBR MP3, PDF liner notes, etc. etc… There may be some extremely hardcore data hoarders out there who want everything. But IMHO, since these are scratchy old 78 records, FLAC is overkill if you just want to keep the audio in a listenable format. Just the VBR MP3s of the George Blood collection look to be about 6TB. With ALL the data it might be over 40TB! I can’t afford that many hard drives :)
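For a sense of scale, here is the per-item arithmetic behind those totals as a quick Python check. It assumes roughly 400,000 items (the "over 400k" figure mentioned later in the thread) and decimal units. The 6TB and 40TB inputs are the rough estimates above, not measured sizes.

```python
# Rough per-item averages implied by the estimates in this post.
# Assumptions: ~400,000 items and decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
TB = 10**12
MB = 10**6

ITEMS = 400_000
VBR_MP3_ONLY_TB = 6   # estimate for just the VBR MP3s
ALL_FORMATS_TB = 40   # rough estimate for everything (FLAC, MP3s, PDFs, ...)


def avg_mb_per_item(total_tb: float, items: int) -> float:
    """Average size per item in MB, given a collection total in TB."""
    return total_tb * TB / items / MB


print(f"VBR MP3 only: ~{avg_mb_per_item(VBR_MP3_ONLY_TB, ITEMS):.0f} MB/item")
print(f"All formats:  ~{avg_mb_per_item(ALL_FORMATS_TB, ITEMS):.0f} MB/item")
print(f"All formats is ~{ALL_FORMATS_TB / VBR_MP3_ONLY_TB:.1f}x the MP3-only size")
```

That works out to roughly 15 MB per item for the MP3-only pass, versus roughly 100 MB per item for everything.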


So my approach at the moment is to save just the VBR MP3s (they seem to be encoded at up to 320kbps VBR) and the JPEG album cover. If I get the chance and have any storage left afterwards, I can make a separate pass to grab the album liner PDFs…
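The idea behind the format selection is simple per-file filtering. Here is a minimal sketch of that idea. It is not the tool's own code. It assumes each file record is a dict with "name", "format" and "size" keys (roughly the shape of the file list in an IA item's metadata), with sizes stored as strings. The sample records below are made up for illustration.

```python
# Sketch of keeping only the wanted formats from an item's file list.
# Assumption: file records look like {"name": ..., "format": ..., "size": "<bytes as string>"}.
from typing import Iterable

WANTED_FORMATS = {"VBR MP3", "JPEG"}


def select_files(files: Iterable[dict], wanted: set[str]) -> list[dict]:
    """Keep only files whose format label is in the wanted set."""
    return [f for f in files if f.get("format") in wanted]


def total_bytes(files: Iterable[dict]) -> int:
    """Sum file sizes, converting the string sizes to int."""
    return sum(int(f.get("size", 0)) for f in files)


# Hypothetical file list for one 78 rpm item.
sample_files = [
    {"name": "side_a.flac", "format": "Flac", "size": "48000000"},
    {"name": "side_a.mp3", "format": "VBR MP3", "size": "7000000"},
    {"name": "side_a_64kb.mp3", "format": "64Kbps MP3", "size": "1500000"},
    {"name": "label.jpg", "format": "JPEG", "size": "900000"},
    {"name": "notes.pdf", "format": "Text PDF", "size": "2000000"},
]

kept = select_files(sample_files, WANTED_FORMATS)
print([f["name"] for f in kept])
print(f"{total_bytes(kept)} of {total_bytes(sample_files)} bytes kept")
```

In this made-up example, the MP3 and label image are about 13% of the item's bytes. The FLAC dominates, which is the whole point of skipping it.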


Tool used: https://github.com/jjjake/internetarchive


Patch to allow setting start and end item indices for downloads: https://github.com/jjjake/internetarchive/pull/605
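The exact behaviour of the start/end options is whatever the pull request defines. As a minimal sketch, assume the indices are 1-based and inclusive, which is how a range like 4001-8000 reads. Under that assumption, picking a window out of a stream of search results is a job for `itertools.islice`. The helper name `index_window` and the identifiers are hypothetical. Also, splitting work by index only holds up if everyone's search returns the items in the same order.

```python
# Hypothetical helper, not the code from the pull request.
# Assumes 1-based, inclusive indices: index_window(xs, 4001, 8000) yields 4000 items.
from itertools import islice
from typing import Iterable, Iterator, Optional


def index_window(items: Iterable[str], start_idx: int = 1,
                 end_idx: Optional[int] = None) -> Iterator[str]:
    """Yield items start_idx..end_idx (1-based, inclusive); end_idx=None means 'to the end'."""
    if start_idx < 1:
        raise ValueError("start_idx is 1-based and must be >= 1")
    # islice takes 0-based, end-exclusive bounds, so only the start needs shifting.
    return islice(items, start_idx - 1, end_idx)


# Usage with made-up identifiers standing in for search results.
identifiers = (f"item{n:06d}" for n in range(1, 10_001))
window = list(index_window(identifiers, 4001, 8000))
print(len(window), window[0], window[-1])  # 4000 item004001 item008000
```

Because `islice` is lazy, a resumed run can skip ahead without holding earlier results in memory.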


Example usage to grab just the VBR MP3 and the record label JPEG for each item (note the --start-idx and --end-idx arguments, which come from the patch):

ia download --start-idx=4001 --end-idx=8000 -a -i --format="VBR MP3" --format="JPEG" --search collection:georgeblood

I’m going to concentrate on the George Blood collection for now, starting at item 1. It would be great if others started at index 50,000, 100,000, 150,000, … and others started at the end and worked backwards in similarly sized chunks. That way every item is sure to be picked up by someone.
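A minimal sketch of that split follows, assuming 1-based inclusive indices and 50,000-item chunks. The chunks start at 50,001, 100,001, … so they don't overlap the previous one. The total is a placeholder (400,000, from the thread's "over 400k"). Unless the real total is a multiple of the chunk size, the back-to-front boundaries won't line up with the front-to-back ones. The function names are hypothetical.

```python
# Hypothetical planning helpers; indices are 1-based and inclusive.

def forward_chunks(total: int, size: int) -> list[tuple[int, int]]:
    """(start, end) ranges counting up from item 1."""
    return [(s, min(s + size - 1, total)) for s in range(1, total + 1, size)]


def backward_chunks(total: int, size: int) -> list[tuple[int, int]]:
    """(start, end) ranges counting down from the last item."""
    return [(max(e - size + 1, 1), e) for e in range(total, 0, -size)]


TOTAL = 400_000  # placeholder; use the real item count from the search
print(forward_chunks(TOTAL, 50_000)[:3])   # front-to-back volunteers
print(backward_chunks(TOTAL, 50_000)[:3])  # back-to-front volunteers
```

Each (start, end) pair maps directly onto one --start-idx/--end-idx invocation.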

    • Arghblarg@lemmy.ca (OP) · 1 year ago

      Yeah, you’re right. Fuck ’em.

      FYI I’m currently on 4001-8000 of the ‘Great 78 Collection’. Looks like I’ll need about 6TB to get it all, yikes! (Just the VBR MP3 files, not the FLACs. Holy Hell.)

      collection:georgeblood

      https://archive.org/details/georgeblood

      If everyone took a block of it, say 4000 items each, we could eventually create a torrent for each block, so it can all be reassembled if/when IA has to take it down.
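      Numbering the 4000-item blocks gives each volunteer (and each eventual torrent) a stable ID. In this sketch, block 2 is items 4001-8000, the range mentioned above. It assumes 1-based inclusive indices. It also assumes the --start-idx/--end-idx flags from the linked pull request are available, since they may not be in a stock install. The helper names are hypothetical.

```python
# Hypothetical block-numbering helpers; indices are 1-based and inclusive.
BLOCK_SIZE = 4000


def block_range(block: int, size: int = BLOCK_SIZE) -> tuple[int, int]:
    """(start, end) item indices for a 1-based block number.

    The last block's end may run past the real item count.
    """
    if block < 1:
        raise ValueError("block numbers start at 1")
    return (block - 1) * size + 1, block * size


def block_count(total_items: int, size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed to cover total_items (integer ceiling division)."""
    return (total_items + size - 1) // size


def ia_command(block: int, query: str = "collection:georgeblood") -> str:
    """Build the download command from the post for one block."""
    start, end = block_range(block)
    return (f'ia download --start-idx={start} --end-idx={end} -a -i '
            f'--format="VBR MP3" --format="JPEG" --search {query}')


print(block_count(400_000))  # blocks needed for ~400k items
print(ia_command(2))
```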

        • Arghblarg@lemmy.ca (OP) · 1 year ago

          I wish IA offered a torrent of the whole collection, but it’s over 400k separate torrents, one for each album. And they contain FLACs, fixed-rate and VBR MP3s, PDF jacket notes, JPGs… it’s just too much for one person (I’m OK with buying an 8TB drive or two, but not a dozen!)

          I’m trying to at least grab the VBR MP3s (these are old scratchy records after all… I don’t know how much FLAC will really preserve). If I can get most of those, maybe I’ll do a second pass for the album cover JPGs, then the liner PDFs… depending on whether, and for how long, the collection stays up.

        • Arghblarg@lemmy.ca (OP) · 1 year ago

          Around 5500… gonna take a while. My ISP says there’s no monthly cap, but I wonder if I really should download this much…
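          "A while" can be put into numbers with a back-of-the-envelope estimate. This sketch assumes the ~6TB VBR MP3 total from this thread, decimal units, and a few hypothetical sustained link speeds. Real throughput from IA will vary and is probably the actual bottleneck.

```python
# Back-of-the-envelope download time in days.
# Assumptions: decimal units (1 TB = 10**12 bytes) and a steady, hypothetical throughput.

def download_days(total_tb: float, mbit_per_s: float) -> float:
    """Days needed to move total_tb terabytes at a constant mbit_per_s."""
    bits = total_tb * 10**12 * 8
    seconds = bits / (mbit_per_s * 10**6)
    return seconds / 86_400


for speed in (50, 100, 500):  # hypothetical link speeds in Mbit/s
    print(f"{speed:>4} Mbit/s: ~{download_days(6, speed):.1f} days")
```

          At a sustained 100 Mbit/s, 6TB is about five and a half days of nonstop downloading.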

        • Arghblarg@lemmy.ca (OP) · 1 year ago

          Normally I would just fetch the torrent, yes, but this particular collection is huge: over 400k separate items, each of which is its own torrent on IA. Is there a way to get an aggregate but filtered torrent with just, say, the album JPG and VBR MP3 files for each item? I don’t think I can afford the entire collection, since every item also includes the FLACs.