I get the following errors a lot in my dmesg logs:

[232671.710741] BTRFS warning (device nvme0n1p2): csum failed root 257 ino 2496314 off 946159616 csum 0xb7eb9798 expected csum 0x3803f9f6 mirror 1
[232671.710746] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 19297, gen 0
[232673.984324] BTRFS warning (device nvme0n1p2): csum failed root 257 ino 2496314 off 946159616 csum 0xb7eb9798 expected csum 0x3803f9f6 mirror 1
[232673.984329] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 19298, gen 0
[232673.988851] BTRFS warning (device nvme0n1p2): csum failed root 257 ino 2496314 off 946159616 csum 0xb7eb9798 expected csum 0x3803f9f6 mirror 1
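
Since the same warning repeats, it can help to collapse the log into one entry per affected inode and offset before going further. A minimal sketch, assuming the dmesg output has been saved to a text file and that the warnings follow the format quoted above (the function name and the regular expression are illustrative, not part of any btrfs tooling):

```python
import re
from collections import Counter

# Matches the "csum failed" warning format quoted above (assumed stable).
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<dev>[^)]+)\): csum failed "
    r"root (?P<root>\d+) ino (?P<ino>\d+) off (?P<off>\d+)"
)


def summarize_csum_failures(log_text):
    """Count csum failures per (device, root, inode, offset)."""
    hits = Counter()
    for line in log_text.splitlines():
        m = CSUM_RE.search(line)
        if m:
            key = (m["dev"], int(m["root"]), int(m["ino"]), int(m["off"]))
            hits[key] += 1
    return hits
```

For the three warnings quoted above this yields a single key, `("nvme0n1p2", 257, 2496314, 946159616)`, with a count of 3, which suggests one damaged block being read repeatedly rather than many separate failures.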

I’ve run btrfs scrub start -Bd /home as described here. The report afterwards claims everything is fine.

btrfs scrub status /home
UUID:             145c0d63-05f8-43a2-934b-7583cb5f6100
Scrub started:    Fri Aug  4 11:35:19 2023
Status:           finished
Duration:         0:07:49
Total to scrub:   480.21GiB
Rate:             1.02GiB/s
Error summary:    no errors found
  • 𝒍𝒆𝒎𝒂𝒏𝒏
    11 months ago

    Could you show us the RAID profile and usage info with btrfs fi us /home, and then the errors recorded for each device with btrfs dev sta /home?

    • Agility0971OP
      11 months ago
      root@archiso /mnt/arch # btrfs fi us .
      Overall:
          Device size:		 931.01GiB
          Device allocated:		 526.02GiB
          Device unallocated:		 404.99GiB
          Device missing:		     0.00B
          Device slack:		     0.00B
          Used:			 480.21GiB
          Free (estimated):		 447.51GiB	(min: 245.02GiB)
          Free (statfs, df):		 447.51GiB
          Data ratio:			      1.00
          Metadata ratio:		      2.00
          Global reserve:		 512.00MiB	(used: 0.00B)
          Multiple profiles:		        no
      
      Data,single: Size:520.01GiB, Used:477.49GiB (91.82%)
         /dev/nvme0n1p2	 520.01GiB
      
      Metadata,DUP: Size:3.00GiB, Used:1.36GiB (45.45%)
         /dev/nvme0n1p2	   6.00GiB
      
      System,DUP: Size:8.00MiB, Used:80.00KiB (0.98%)
         /dev/nvme0n1p2	  16.00MiB
      
      Unallocated:
         /dev/nvme0n1p2	 404.99GiB
      
      root@archiso /mnt/arch # btrfs device stats .
      [/dev/nvme0n1p2].write_io_errs    0
      [/dev/nvme0n1p2].read_io_errs     0
      [/dev/nvme0n1p2].flush_io_errs    0
      [/dev/nvme0n1p2].corruption_errs  19317
      [/dev/nvme0n1p2].generation_errs  0
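
To keep an eye on these counters over time, the output can be filtered down to the ones that are not zero. A rough sketch, assuming the `[device].counter value` layout shown above; the function name is made up for illustration:

```python
import re

# One "[device].counter   value" line, as printed by `btrfs device stats`.
STAT_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")


def nonzero_device_stats(stats_text):
    """Return {(device, counter): value} for every counter above zero."""
    found = {}
    for line in stats_text.splitlines():
        m = STAT_RE.match(line.strip())
        if m and int(m["value"]) > 0:
            found[(m["dev"], m["name"])] = int(m["value"])
    return found
```

Fed the output above, this returns only `("/dev/nvme0n1p2", "corruption_errs")` with the value 19317, so the I/O and generation counters are clean and only checksum mismatches are being recorded.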
      
      • 𝒍𝒆𝒎𝒂𝒏𝒏
        11 months ago

        Few possibilities here:

        Could be something wrong with the SSD - is it a Samsung one by any chance? There was a firmware issue that caused the SSD lifespan to degrade at a higher rate than normal… This article only covers the 980 Pro, but I believe a few other models were affected as well:

        https://www.tomshardware.com/news/samsung-980-pro-ssd-failures-firmware-update

        It could also be that whatever files were corrupted have since been deleted (maybe browser cache files etc.), or that the allocated block is corrupted but contains no files. After running a scrub, the names of files within a corrupted block are shown in dmesg - if there are none then I think you’re fine, but strongly consider replacing the SSD, updating its firmware, or checking its SMART diagnostic data to see if it’s OK.

        The error counters can be reset with btrfs dev sta --reset /home, so you can see whether these errors pop up again after trying a fix.

        • Agility0971OP
          11 months ago

          It’s a KINGSTON SA2000M81000G. Here is a “datasheet”.

          I’ve looked up some of the inode numbers from the logs and they point to application state data in /var, so reinstalling the application could bring those files back.
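
For anyone wanting to repeat the inode lookup: btrfs-progs has `btrfs inspect-internal inode-resolve <ino> <path>` for this. A filesystem-agnostic alternative is to walk the tree and compare inode numbers, much like `find -inum`. The sketch below assumes `top` is the mount point of the subvolume named in the log (root 257 there); the function name is hypothetical. Inode numbers are only unique within one subvolume, so entries on a different `st_dev` are skipped.

```python
import os


def find_paths_by_inode(top, inode):
    """Walk `top` and return every path whose inode number equals `inode`."""
    top_dev = os.lstat(top).st_dev
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable
            # Only accept matches on the same device/subvolume as `top`.
            if st.st_ino == inode and st.st_dev == top_dev:
                matches.append(path)
    return sorted(matches)
```

Calling `find_paths_by_inode("/var", 2496314)` would be one way to use it, assuming /var lives in the affected subvolume. An empty list means no file with that inode exists there any more, which fits the idea that the damaged file may already have been deleted.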

          I’d never touched SMART before since I assumed it was an HDD thing. Anyway, I’ve installed smartmontools. NVMe SSDs don’t report the same SMART attributes as HDDs, so this answer suggested looking at Percentage Used instead.

          root@archiso ~ # smartctl -a --test=long /dev/nvme0n1 | grep "Used"
          Percentage Used:                    2%
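
Grepping for "Used" shows only one value; the rest of the report can be turned into a dictionary so other health fields can be checked too. A small sketch that relies only on the `Key:   value` layout seen above. The helper names are made up, and any field names other than Percentage Used (for example "Media and Data Integrity Errors" or "Critical Warning") are assumptions about what a given smartctl version prints for NVMe drives:

```python
def parse_smartctl_fields(report_text):
    """Turn 'Key:   value' lines from a smartctl text report into a dict."""
    fields = {}
    for line in report_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def percent(value):
    """Convert a string such as '2%' to the integer 2."""
    return int(value.rstrip("%").strip())
```

With the line above as input, `percent(parse_smartctl_fields(text)["Percentage Used"])` gives 2. Fields should be looked up with `.get()` where their presence is not guaranteed.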
          

          It could be true that the firmware is not optimal, but I could not find any news about that like the article you linked for the 980. GNOME Software should keep firmware up to date in the background, but just for good measure I ran it in the live environment as well. I will probably get a new SSD at some point and maybe use this old one for non-critical storage.