I work with a couple of people, and we SSH into a Linux server for work. The server has several SSDs, and one of the non-boot drives, a Samsung 870, crashed the other morning. As a result, everyone whose home directory is on that SSD can no longer SSH into the server. I ran `dmesg` and saw the following errors. What can I do at this point, besides restoring from a backup (I didn't make any recently... oops)?
[Thu Jun 27 09:50:12 2024] RTL8226 2.5Gbps PHY r8169-0-4600:00: attached PHY driver (mii_bus:phy_addr=r8169-0-4600:00, irq=MAC)
[Thu Jun 27 09:50:13 2024] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[Thu Jun 27 09:50:13 2024] ata3.00: irq_stat 0x40000001
[Thu Jun 27 09:50:13 2024] ata3.00: failed command: FLUSH CACHE EXT
[Thu Jun 27 09:50:13 2024] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 2
res 51/04:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
[Thu Jun 27 09:50:13 2024] ata3.00: status: { DRDY ERR }
[Thu Jun 27 09:50:13 2024] ata3.00: error: { ABRT }
[Thu Jun 27 09:50:13 2024] ata3.00: supports DRM functions and may not be fully accessible
[Thu Jun 27 09:50:13 2024] ata3.00: failed to enable AA (error_mask=0x1)
[Thu Jun 27 09:50:13 2024] ata3.00: supports DRM functions and may not be fully accessible
[Thu Jun 27 09:50:13 2024] ata3.00: failed to enable AA (error_mask=0x1)
[Thu Jun 27 09:50:13 2024] ata3.00: configured for UDMA/133 (device error ignored)
[Thu Jun 27 09:50:13 2024] ata3.00: device reported invalid CHS sector 0
[Thu Jun 27 09:50:13 2024] ata3: EH complete
[Thu Jun 27 09:50:13 2024] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[Thu Jun 27 09:50:13 2024] ata3.00: irq_stat 0x40000001
[Thu Jun 27 09:50:13 2024] ata3.00: failed command: FLUSH CACHE EXT
[Thu Jun 27 09:50:13 2024] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 10
res 51/04:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
[Thu Jun 27 09:50:14 2024] ata3: EH complete
[Thu Jun 27 09:50:14 2024] ata3.00: Enabling discard_zeroes_data
[Thu Jun 27 09:50:16 2024] atlantic 0000:44:00.0 enp68s0: atlantic: link change old 0 new 1000
[Thu Jun 27 09:50:16 2024] IPv6: ADDRCONF(NETDEV_CHANGE): enp68s0: link becomes ready
[Thu Jun 27 09:50:23 2024] rfkill: input handler disabled
[Thu Jun 27 09:50:35 2024] Bluetooth: RFCOMM TTY layer initialized
[Thu Jun 27 09:50:35 2024] Bluetooth: RFCOMM socket layer initialized
[Thu Jun 27 09:50:35 2024] Bluetooth: RFCOMM ver 1.11
[Thu Jun 27 09:50:36 2024] rfkill: input handler enabled
[Thu Jun 27 09:50:39 2024] rfkill: input handler disabled
[Thu Jun 27 09:51:06 2024] EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended
[Thu Jun 27 09:51:06 2024] EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[Thu Jun 27 09:51:11 2024] EXT4-fs (sdc): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[Thu Jun 27 09:51:16 2024] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[Thu Jun 27 09:51:16 2024] ata3.00: irq_stat 0x40000001
[Thu Jun 27 09:51:16 2024] ata3.00: failed command: WRITE DMA
[Thu Jun 27 09:51:16 2024] ata3.00: cmd ca/00:08:00:00:00/00:00:00:00:00/e0 tag 9 dma 4096 out
res 51/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 0, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] Sense not available.
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#20 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#20 CDB: Write(16) 8a 00 00 00 00 00 00 00 02 80 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 640 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
...(repeats until sector 800)...
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#11 access beyond end of device
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#25 access beyond end of device
[Thu Jun 27 09:51:28 2024] JBD2: recovery failed
[Thu Jun 27 09:51:28 2024] EXT4-fs (sdb): error loading journal
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 0, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] Sense not available.
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#20 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#20 CDB: Write(16) 8a 00 00 00 00 00 00 00 02 80 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 640 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 80, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#21 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#21 CDB: Write(16) 8a 00 00 00 00 00 00 00 02 90 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 656 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 82, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] 0 512-byte logical blocks: (0 B/0 B)
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#22 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#22 CDB: Write(16) 8a 00 00 00 00 00 00 00 02 b0 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 688 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 86, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#23 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#23 CDB: Write(16) 8a 00 00 00 00 00 00 00 02 e0 00 00 00 10 00 00
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 736 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 92, lost async page write
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 93, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#26 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#26 CDB: Write(16) 8a 00 00 00 00 00 00 00 03 10 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 784 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 98, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#27 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#27 CDB: Write(16) 8a 00 00 00 00 00 00 00 03 20 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] sdb: detected capacity change from 7814037168 to 0
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 800 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#11 access beyond end of device
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#25 access beyond end of device
[Thu Jun 27 09:51:28 2024] JBD2: recovery failed
[Thu Jun 27 09:51:28 2024] EXT4-fs (sdb): error loading journal
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#27 CDB: Write(16) 8a 00 00 00 00 00 00 00 03 20 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] sdb: detected capacity change from 7814037168 to 0
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 800 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#11 access beyond end of device
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#25 access beyond end of device
......
[Thu Jun 27 09:52:19 2024] ata2.00: Enabling discard_zeroes_data
[Thu Jun 27 09:52:19 2024] ata7.00: Enabling discard_zeroes_data
[Thu Jun 27 09:56:11 2024] EXT4-fs (sda): error count since last fsck: 29674
[Thu Jun 27 09:56:11 2024] EXT4-fs (sda): initial error at time 1677717172: __ext4_get_inode_loc_noinmem:4410: inode 179610110: block 718277183
After that, one of these appeared every 10 minutes throughout the day:
[Thu Jun 27 13:30:14 2024] sd 2:0:0:0: [sdb] tag#6 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 13:30:14 2024] sd 2:0:0:0: [sdb] tag#6 CDB: ATA command pass through(16) 85 06 20 00 00 00 00 00 00 00 00 00 00 00 e5 00
As far as I’m aware, the things to avoid are:
- fsck
Things I’m not sure I should avoid (please let me know what the proper safe procedure is if I want to recover data):
- fsck -n
- smartctl
My options (which should I choose?):
- ddrescue with the right options to make it fast and read the good sectors first. Can this damage the SSD further?
- Send the drive to data recovery people
`umount /dev/sdb`
And then you would run something like: `e2fsck -fp /dev/sdb`
Per: linux.org/docs/man8/e2fsck.html
And have you determined that the volume is not full?
Do not run `fsck`. At best you'll be no worse off. At worst you'll completely destroy whatever data might still be accessible. Instead, use `ddrescue` to make an image copy of the drive. Then copy that again and try to fix that second copy. This is well documented.
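The image-then-copy workflow suggested above could be sketched roughly as follows. This is a dry run that only prints the commands rather than executing them. The target directory `/mnt/rescue` and the file names `disk.img`, `disk.map` and `work.img` are made up for illustration, and the `ddrescue` options shown (`-d`, `-n`, `-r3`) are one common choice, not something stated in this thread. The first printed command is there because the log shows the kernel reporting a capacity of 0 for `sdb`; if the drive still reports 0 bytes, imaging is unlikely to read anything.

```shell
#!/bin/sh
# Dry-run sketch: prints a possible rescue sequence, runs nothing.
# All paths and file names below are hypothetical placeholders.

rescue_plan() {
    src=$1    # failing disk, e.g. /dev/sdb (keep it unmounted)
    dest=$2   # directory on a different, healthy disk with enough free space

    # Read-only check: does the kernel still see a non-zero capacity?
    printf '%s\n' "blockdev --getsize64 $src"
    # Pass 1: copy the readable areas first, skip scraping bad areas (-n).
    printf '%s\n' "ddrescue -d -n $src $dest/disk.img $dest/disk.map"
    # Pass 2: revisit the bad areas with up to 3 retry passes (-r3).
    printf '%s\n' "ddrescue -d -r3 $src $dest/disk.img $dest/disk.map"
    # Keep the first image untouched and work on a second copy.
    printf '%s\n' "cp --sparse=always $dest/disk.img $dest/work.img"
    # Check the copy read-only first (-n answers "no" to every repair).
    printf '%s\n' "e2fsck -fn $dest/work.img"
}

rescue_plan /dev/sdb /mnt/rescue
```

Because both `ddrescue` passes share the same map file, the second pass resumes from where the first stopped instead of rereading the whole drive. Any repair attempt (for example `e2fsck` without `-n`) would then go against `work.img` only, so the first image stays as a fallback.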