HDD randomly unmounting

dontblink@feddit.it · 14 days ago

HDD randomly unmounting

seaQueue@lemmy.world · 14 days ago

Don’t just look at sdb hits in the log. Open up that entire session in journalctl kernel mode (journalctl -k -b N, where N is the session number in the session history) and find the context surrounding the drive dropping and reconnecting.

You’ll probably find that something caused a USB bus reset or a similar event before the drive dropped and reconnected. If you find nothing like that, try switching power supplies for the HDD and/or moving the drive to a different USB root port: use lsusb -t and swap ports until the drive is attached beneath a different root port. You might have a neighboring USB device on the same root port that’s causing issues for the other devices attached to it (it happens; USB devices or drivers sometimes behave badly).

Always look at the context of the event when you’re troubleshooting a failure like this; don’t just drill down on the device messages. Most of the time the real cause of the issue precedes the symptom by a bit of time.
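
As a rough sketch of what "looking at the context" can mean in practice, the helper below filters a kernel log for lines that mention the drive or USB bus trouble and keeps the lines leading up to each hit. The function name drive_context, the match pattern, and the default of 20 lines of context are my own picks for illustration, not anything standard, so adjust them to your setup.

```shell
# Print kernel log lines that mention the drive or USB bus trouble,
# plus the lines leading up to each hit. Reads the log from stdin.
# usage: journalctl -k -b -1 --no-pager | drive_context sdb
drive_context() {
    dev="${1:-sdb}"    # block device name to look for (assumed: sdb)
    before="${2:-20}"  # number of preceding lines to keep per hit
    grep -i -E -B "$before" "(${dev}|usb .*(reset|disconnect|cannot enable))"
}
```

The point of the -B context is that whatever triggered the drop usually shows up a few lines above the first sdb error, and a plain grep for sdb would hide it.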

dontblink@feddit.it · 5 days ago

Thank you so much for taking the time to answer!

I’m not sure how to get the N from session history, nor how to check my session history…

But this might be some relevant output I found with journalctl -k -b:

Nov 21 16:08:18 rpi kernel: usb 2-2.1-port2: cannot reset (err = -110)
Nov 21 16:08:19 rpi kernel: usb 2-2.1-port2: cannot reset (err = -110)
Nov 21 16:08:19 rpi kernel: usb 2-2.1-port2: Cannot enable. Maybe the USB cable is bad?

Nov 21 16:41:57 rpi kernel: I/O error, dev sdb, sector 2466347032 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 2
Nov 21 16:41:57 rpi kernel: EXT4-fs warning (device sdb1): ext4_dx_find_entry:1796: inode #75497968: lblock 42: comm apache2: error -5 reading directory block
Nov 21 16:41:57 rpi kernel: EXT4-fs error (device sdb1): ext4_journal_check_start:83: comm apache2: Detected aborted journal
Nov 21 16:41:57 rpi kernel: Buffer I/O error on dev sdb1, logical block 0, lost sync page write
Nov 21 16:41:57 rpi kernel: EXT4-fs (sdb1): I/O error while writing superblock
Nov 21 16:41:57 rpi kernel: EXT4-fs (sdb1): Remounting filesystem read-only

The output is from yesterday, when the device stopped working correctly.

I’m not familiar with the Linux kernel, but I can see there is definitely something wrong…

The HDD (old) is attached to a USB hub (new). I tried switching ports on the hub, but the same issue happened again. If I try to mount it with sudo mount /mnt/2tb, it says it is already mounted:

mount: /mnt/2tb: /dev/sdb1 already mounted on /mnt/2tb.
       dmesg(1) may have more information after failed mount system call.

sudo dmesg | grep sdb gives back:

[147776.801028] I/O error, dev sdb, sector 77904 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 2
[147776.815452] EXT4-fs warning (device sdb1): htree_dirblock_to_tree:1083: inode #2: lblock 0: comm ls: error -5 reading directory block
[147796.731734] sdb1: Can't mount, would change RO state

seaQueue@lemmy.world · 5 hours ago

> I’m not sure how to get the N from session history, nor how to check my session history…

journalctl --list-boots will list all sessions stored in the journal.
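
The first column of that listing is the index you pass to -b (0 is the current boot, -1 the one before it, and so on). Here is a tiny wrapper to show how the two fit together; the function name kernel_log_for_boot is just something I made up for this post.

```shell
# Show the kernel messages of one boot. The index comes from the first
# column of `journalctl --list-boots` (0 = current boot, -1 = previous).
# usage: kernel_log_for_boot -1 | less
kernel_log_for_boot() {
    boot="${1:-0}"  # boot index; defaults to the current boot
    journalctl -k -b "$boot" --no-pager
}
```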

> The output is from yesterday, when the device stopped working correctly.
>
> I’m not familiar with the Linux kernel, but I can see there is definitely something wrong…
>
> The HDD (old) is attached to a USB hub (new). I tried switching ports on the hub, but the same issue happened again. If I try to mount it with sudo mount /mnt/2tb, it says it is already mounted:

Those messages tell you what’s happening: there’s an unrecoverable error on the USB bus connecting the hard drive, which causes filesystem errors when reads and writes fail. To diagnose that, lose the hub first and connect the drive directly to the Pi, then try replacing the cable that attaches the drive if the error still occurs. I’d also check with people in the rpi community in case there are any known issues with USB on your model. There may be some Pi-specific USB firmware things you can do to increase reliability.

You can also try disabling UASP for the drive in case plain BOT (Bulk-Only Transport) transfers somehow stabilize the connection. You’ll lose some performance, but that helps with some USB storage bridges.
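
For reference, the usual way to do that is a usb-storage.quirks kernel parameter with the u flag (ignore UAS) for the bridge’s vendor:product ID, which lsusb prints after "ID". On a Pi that normally gets appended to the single line in cmdline.txt, followed by a reboot. The helper below only builds the parameter string; the function name uas_quirk and the ID in the usage line are placeholders, not your drive’s real values.

```shell
# Build the kernel parameter that tells usb-storage to ignore UAS for one device.
# The argument is the vendor:product ID as printed by lsusb after "ID".
# usage: uas_quirk 1234:abcd    # 1234:abcd is a placeholder, not a real device
uas_quirk() {
    id="$1"
    case "$id" in
        [0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])
            printf 'usb-storage.quirks=%s:u\n' "$id" ;;
        *)
            echo "expected an ID like 1234:abcd" >&2
            return 1 ;;
    esac
}
```

After rebooting, lsusb -t should list the drive with Driver=usb-storage instead of Driver=uas if the quirk took effect.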

Some USB storage bridges are just unreliable under Linux and crash under load. Your last option is to buy another drive enclosure that’s tested and known to work correctly. I went through about 5 USB/NVMe enclosures looking for one that worked properly; that whole space is a compatibility mess.

hendrik@palaver.p3x.de · 14 days ago

Very good answer. I’ve also spent some time chasing red herrings when the real problem was something else, like a bad cable or connector. And by the way, journalctl shows its output in the usual pager (less by default), so the same keys work: hit / and search for ‘unmount’, ‘disconnect’, etc. Then scroll through the log and find out what led to the situation.