Hey folks, I’m at my wit’s end. I’ve been screwing with Proxmox for years now, but I’m at a tipping point. I’ve only ever used consumer SSDs to run my VMs off of, but after a dozen or so crashes over the last week I’ve realized the SSDs are probably the culprit (really, really terrible write speeds leading to kernel crashes, I believe).

I’ve never gotten an enterprise SSD, if that’s even what I need. Any recommendations? New? Used? Brands?

Appreciate it

  • @doeknius_gloek@feddit.de
    7 months ago

    I recently upgraded three of my Proxmox hosts with SSDs to make use of Ceph. While researching I faced the same question - everyone said you need an enterprise SSD or Ceph would eat it alive. The feature that apparently matters most in my case is Power Loss Protection (PLP). It’s not even primarily there to protect against a possible outage - it’s what lets the drive handle sync writes safely instead of relying on a volatile cache for performance.
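
    If you want to see what that actually means in practice, here’s a minimal sketch (Python, with a made-up test path - point it at the filesystem backed by the SSD in question) of the kind of sync-write micro-benchmark people use to compare PLP and non-PLP drives:

    ```python
    # Crude sync-write micro-benchmark. Drives with PLP can safely acknowledge
    # fsync() from their protected cache and stay fast; consumer drives without
    # PLP often collapse to a few hundred sync writes per second or worse.
    import os
    import time

    PATH = "/tank/scratch/synctest.bin"   # hypothetical path on the drive under test
    BLOCK = b"\0" * 4096                  # 4 KiB writes, a typical sync-write size
    COUNT = 1000

    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT, 0o600)
    start = time.monotonic()
    for _ in range(COUNT):
        os.write(fd, BLOCK)
        os.fsync(fd)                      # force every write to stable storage
    elapsed = time.monotonic() - start
    os.close(fd)
    os.unlink(PATH)

    print(f"{COUNT / elapsed:.0f} sync writes/s "
          f"({COUNT * len(BLOCK) / elapsed / 1e6:.1f} MB/s)")
    ```

    Something like fio with --fsync=1 gives a more thorough picture, but even this is enough to see the order-of-magnitude gap.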

    There are SSDs marketed for use in data centers, and these are generally the enterprisey ones. Often they are classified as “Mixed Use” (read and write) or “Read Intensive”. Other interesting metrics are Drive Writes Per Day (DWPD) and obviously TBW and IOPS.

    In the end I went with used Samsung PM883s.

    But before you fall into this rabbit hole, you might check whether you really need an enterprise SSD. If all you’re doing is running a few VMs in a homelab, I would expect consumer SSDs to work just fine.

    • ScrubblesOP
      7 months ago

      Well, I have the exact same use case, and I just checked - yup, 3 out of 4 drives failed in a year. Those were shitty WD Blues though, so I think it’s time to shell out real money.

      • qupada
        7 months ago

        To expand on @doeknius_gloek’s comment, those categories usually directly correlate to a range of DWPD (endurance) figures. I’m most familiar with buying servers from Dell, but other brands are pretty similar.

        Usually, the split is something like this:

        • Read-intensive (RI): 0.8 - 1.2 DWPD (commonly used for file servers and the like, where data is relatively static)
        • Mixed-use (MU): 3 - 5 DWPD (normal for databases or cache servers, where data is changing relatively frequently)
        • Write-intensive (WI): ≥10 DWPD (for massive databases, heavily-used write caches like ZFS ZIL/SLOG devices, that sort of thing)

        (For comparison, consumer SSDs frequently have endurance ratings of only 0.1 - 0.3 DWPD, and I’ve seen as low as 0.05.)

        You’ll also find these tiers roughly line up with the SSDs that expose different capacities while having the same amount of flash inside (the rest is over-provisioning): where a consumer drive would be 512GB, an enterprise RI drive would be 480GB, and a MU/WI drive only 400GB. Similarly 1TB/960GB/800GB, 2TB/1.92TB/1.6TB, etc.

        If you only get a TBW figure, just divide it by the capacity and the length of the warranty in days to get DWPD. For instance, a 1.92TB, 1 DWPD drive with a 5-year warranty might list 3.5PBW.
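
        If you’d rather not do that in your head, the conversion is trivial to script. Here’s a quick sketch (Python; the function names and the consumer-drive figure at the end are just illustrative, not from any datasheet):

        ```python
        # Convert between TBW and DWPD ratings for a given capacity and warranty.
        def tbw_to_dwpd(tbw_tb: float, capacity_tb: float, warranty_years: float) -> float:
            """DWPD implied by a TBW rating."""
            return tbw_tb / (capacity_tb * warranty_years * 365)

        def dwpd_to_tbw(dwpd: float, capacity_tb: float, warranty_years: float) -> float:
            """Total TB written over the warranty period at a given DWPD."""
            return dwpd * capacity_tb * warranty_years * 365

        print(tbw_to_dwpd(3500, 1.92, 5))   # ~1.0 DWPD - the 1.92TB / 3.5PBW / 5y example above
        print(dwpd_to_tbw(0.3, 1.0, 5))     # ~548 TBW - roughly a 0.3 DWPD consumer 1TB drive
        ```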

        • ScrubblesOP
          7 months ago

          Got it. So I’m thinking ZFS is what killed these poor drives, who didn’t sign up for that sort of life. Short term, I think I’ll run over to Best Buy and get a decent 1 or 2 TB drive to migrate things to, just to keep it running (and not use ZFS). From what I’m reading on other forums, yeah, ZFS was the killer here.

          Long term, maybe enterprise drives, or really deciding whether my app server even needs a pool. I set that up last time as an “I don’t want to run out of storage for a while” thing, but I’m seeing 4TB drives now for a few hundred bucks. Not cheap, but much cheaper than the $2k they were just a few years ago. I don’t store anything on the app servers, just containers and VMs.