So I work with a couple of people, and we SSH into a Linux server for work. The server has several SSDs, and one of the non-boot ones, a Samsung 870, crashed the other morning. As a result, anyone whose home directory is on that SSD can no longer SSH into the server. I ran `dmesg` and saw the following errors. What can I do at this point, besides restoring from a backup (I didn't make any recently... oops)?

[Thu Jun 27 09:50:12 2024] RTL8226 2.5Gbps PHY r8169-0-4600:00: attached PHY driver (mii_bus:phy_addr=r8169-0-4600:00, irq=MAC)
[Thu Jun 27 09:50:13 2024] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[Thu Jun 27 09:50:13 2024] ata3.00: irq_stat 0x40000001
[Thu Jun 27 09:50:13 2024] ata3.00: failed command: FLUSH CACHE EXT
[Thu Jun 27 09:50:13 2024] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 2
                                    res 51/04:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
[Thu Jun 27 09:50:13 2024] ata3.00: status: { DRDY ERR }
[Thu Jun 27 09:50:13 2024] ata3.00: error: { ABRT }
[Thu Jun 27 09:50:13 2024] ata3.00: supports DRM functions and may not be fully accessible
[Thu Jun 27 09:50:13 2024] ata3.00: failed to enable AA (error_mask=0x1)
[Thu Jun 27 09:50:13 2024] ata3.00: supports DRM functions and may not be fully accessible
[Thu Jun 27 09:50:13 2024] ata3.00: failed to enable AA (error_mask=0x1)
[Thu Jun 27 09:50:13 2024] ata3.00: configured for UDMA/133 (device error ignored)
[Thu Jun 27 09:50:13 2024] ata3.00: device reported invalid CHS sector 0
[Thu Jun 27 09:50:13 2024] ata3: EH complete
[Thu Jun 27 09:50:13 2024] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[Thu Jun 27 09:50:13 2024] ata3.00: irq_stat 0x40000001
[Thu Jun 27 09:50:13 2024] ata3.00: failed command: FLUSH CACHE EXT
[Thu Jun 27 09:50:13 2024] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 10
                                    res 51/04:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
[Thu Jun 27 09:50:14 2024] ata3: EH complete
[Thu Jun 27 09:50:14 2024] ata3.00: Enabling discard_zeroes_data
[Thu Jun 27 09:50:16 2024] atlantic 0000:44:00.0 enp68s0: atlantic: link change old 0 new 1000
[Thu Jun 27 09:50:16 2024] IPv6: ADDRCONF(NETDEV_CHANGE): enp68s0: link becomes ready
[Thu Jun 27 09:50:23 2024] rfkill: input handler disabled
[Thu Jun 27 09:50:35 2024] Bluetooth: RFCOMM TTY layer initialized
[Thu Jun 27 09:50:35 2024] Bluetooth: RFCOMM socket layer initialized
[Thu Jun 27 09:50:35 2024] Bluetooth: RFCOMM ver 1.11
[Thu Jun 27 09:50:36 2024] rfkill: input handler enabled
[Thu Jun 27 09:50:39 2024] rfkill: input handler disabled
[Thu Jun 27 09:51:06 2024] EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended
[Thu Jun 27 09:51:06 2024] EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[Thu Jun 27 09:51:11 2024] EXT4-fs (sdc): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[Thu Jun 27 09:51:16 2024] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[Thu Jun 27 09:51:16 2024] ata3.00: irq_stat 0x40000001
[Thu Jun 27 09:51:16 2024] ata3.00: failed command: WRITE DMA
[Thu Jun 27 09:51:16 2024] ata3.00: cmd ca/00:08:00:00:00/00:00:00:00:00/e0 tag 9 dma 4096 out
                                    res 51/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)

[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 0, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] Sense not available.
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#20 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#20 CDB: Write(16) 8a 00 00 00 00 00 00 00 02 80 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 640 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
...(repeats until sector 800)...
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#11 access beyond end of device
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#25 access beyond end of device
[Thu Jun 27 09:51:28 2024] JBD2: recovery failed
[Thu Jun 27 09:51:28 2024] EXT4-fs (sdb): error loading journal


[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 0, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] Sense not available.
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#20 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#20 CDB: Write(16) 8a 00 00 00 00 00 00 00 02 80 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 640 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 80, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#21 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#21 CDB: Write(16) 8a 00 00 00 00 00 00 00 02 90 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 656 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 82, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] 0 512-byte logical blocks: (0 B/0 B)
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#22 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#22 CDB: Write(16) 8a 00 00 00 00 00 00 00 02 b0 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 688 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 86, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#23 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#23 CDB: Write(16) 8a 00 00 00 00 00 00 00 02 e0 00 00 00 10 00 00
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 736 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 92, lost async page write
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 93, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#26 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#26 CDB: Write(16) 8a 00 00 00 00 00 00 00 03 10 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 784 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Jun 27 09:51:28 2024] Buffer I/O error on dev sdb, logical block 98, lost async page write
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#27 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#27 CDB: Write(16) 8a 00 00 00 00 00 00 00 03 20 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] sdb: detected capacity change from 7814037168 to 0
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 800 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#11 access beyond end of device
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#25 access beyond end of device
[Thu Jun 27 09:51:28 2024] JBD2: recovery failed
[Thu Jun 27 09:51:28 2024] EXT4-fs (sdb): error loading journal
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#27 CDB: Write(16) 8a 00 00 00 00 00 00 00 03 20 00 00 00 08 00 00
[Thu Jun 27 09:51:28 2024] sdb: detected capacity change from 7814037168 to 0
[Thu Jun 27 09:51:28 2024] blk_update_request: I/O error, dev sdb, sector 800 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#11 access beyond end of device
[Thu Jun 27 09:51:28 2024] sd 2:0:0:0: [sdb] tag#25 access beyond end of device
......
[Thu Jun 27 09:52:19 2024] ata2.00: Enabling discard_zeroes_data
[Thu Jun 27 09:52:19 2024] ata7.00: Enabling discard_zeroes_data
[Thu Jun 27 09:56:11 2024] EXT4-fs (sda): error count since last fsck: 29674
[Thu Jun 27 09:56:11 2024] EXT4-fs (sda): initial error at time 1677717172: __ext4_get_inode_loc_noinmem:4410: inode 179610110: block 718277183

And then there was one of these every 10 minutes throughout the day:

[Thu Jun 27 13:30:14 2024] sd 2:0:0:0: [sdb] tag#6 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[Thu Jun 27 13:30:14 2024] sd 2:0:0:0: [sdb] tag#6 CDB: ATA command pass through(16) 85 06 20 00 00 00 00 00 00 00 00 00 00 00 e5 00

As far as I’m aware, the things to avoid are:

  • fsck

Things I’m not sure I should avoid (please let me know what the proper safe procedure is if I want to recover data):

  • fsck -n
  • smartctl

My options (which should I choose?)

  • ddrescue with the right options to make it fast and read the good sectors first. Can this damage the SSD further?
  • Send to data recovery people
  • Are you able to access the volume and perform a filesystem check? To do this you would unmount the drive, like: umount /dev/sdb. Then you would run something like: e2fsck -fp /dev/sdb (per linux.org/docs/man8/e2fsck.html). Also, have you determined that the volume is not full?
    – Csnap
    Commented Jul 1 at 20:49
  • For recovering the data, I would recommend something like TestDisk: cgsecurity.org/wiki/TestDisk_Step_By_Step
    – Csnap
    Commented Jul 1 at 20:52
  • It actually cannot be mounted; it just throws an error. I read in some places that running e2fsck could damage the SSD further if the hardware is failing. Is that true? Can I access a volume without mounting it to check whether it is full? Thanks for the pointer, Csnap, I will check that out.
    – Derek Xiao
    Commented Jul 1 at 21:09
  • Whatever you do, DO NOT attempt to fix the filesystem with fsck. At best you'll be no worse off. At worst you'll completely destroy whatever data might still be accessible. Instead, use ddrescue to make an image copy of the drive. Then copy that again and try to fix that second copy. This is well documented.
    Commented Jul 1 at 21:32
  • Thanks Chris! By the way, I was meaning to ask: could ddrescue make anything worse? I have been wondering whether my choices are 1) send it to data recovery experts or 2) run ddrescue first, then send it. Could running ddrescue make the job harder for the data recovery experts, assuming there was something salvageable before I ran it? As for fsck, I have read about this. How about fsck with the -n option?
    – Derek Xiao
    Commented Jul 1 at 21:35

1 Answer

You have multiple disk issues.

Your sda disk has filesystem-level errors, and they might be fixable by unmounting the filesystem and running a filesystem check on it. Use e2fsck only if the filesystem type is ext2/ext3/ext4; other filesystem types have their own tools.
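
A minimal sketch of that check, assuming (as the "EXT4-fs (sda)" lines in the log suggest) that the ext4 filesystem sits directly on /dev/sda, and that it is run as root. RUN is a hypothetical dry-run switch, not part of any tool: by default the commands are only printed.

```shell
#!/bin/sh
# Sketch only. DEV is an assumption taken from the "EXT4-fs (sda)" log lines.
DEV=${DEV:-/dev/sda}
RUN=${RUN-echo}   # hypothetical dry-run switch: prints commands; start with RUN= to execute

check_sda() {
    $RUN umount "$DEV"        # the filesystem must not be mounted while it is checked
    $RUN e2fsck -fn "$DEV"    # -f forces a full check, -n answers "no" to every fix (read-only)
}

check_sda
```

If the read-only pass gives a plausible report, running e2fsck -f on the same device without -n would be the actual repair step.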

But your sdb disk seems either completely dead or very close to it. Note the log message:

sdb: detected capacity change from 7814037168 to 0

This is not something e2fsck or testdisk can fix. It means the disk firmware has recognized that the disk is failing its internal self-tests so badly that it "thinks" there is no point in even trying to work as a disk any more. The only thing you can do with this disk is maybe try to clone whatever you can off it using ddrescue or a similar rescue cloning utility, and even that might take many tries or might not work at all.
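
For scale, here is what the first number in that message works out to, assuming it is a count of 512-byte logical blocks (the unit named in the "0 512-byte logical blocks" line of the log). A quick illustration with shell arithmetic, nothing more:

```shell
#!/bin/sh
# Sector count copied from the "detected capacity change" log line.
sectors=7814037168
bytes=$((sectors * 512))            # assumes 512-byte logical blocks
echo "$bytes bytes"                 # prints: 4000787030016 bytes
echo "$((bytes / 1000000000)) GB"   # prints: 4000 GB (integer division)
```

That is roughly 4 TB, so the drop to 0 means the kernel now sees no usable capacity on the drive at all.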

If the disk reports something resembling its normal capacity the next time it is powered on, ddrescue might still be able to get something out of it... but if tools like lsblk report the size of sdb as zero or the disk refuses to work at all, only data recovery experts with tools that can bypass the normal disk firmware functions could get anything out of it.
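
A rough sketch of that decision and a two-pass ddrescue run, under several assumptions: the ext4 filesystem sits directly on /dev/sdb (as the "EXT4-fs (sdb)" log lines suggest), the image and map file names are made up for illustration, and the image is written to a healthy disk with enough free space. RUN is a hypothetical dry-run switch, so by default the commands are only printed.

```shell
#!/bin/sh
# Sketch only. SRC, IMG, MAP and WORK are assumed names, not taken from the post.
SRC=${SRC:-/dev/sdb}
IMG=${IMG:-sdb.img}
MAP=${MAP:-sdb.map}
WORK=${WORK:-sdb-work.img}
RUN=${RUN-echo}   # hypothetical dry-run switch: prints commands; start with RUN= to execute

# $1 is the size in bytes that the kernel reports for the disk.
rescue_plan() {
    if [ "${1:-0}" -le 0 ]; then
        echo "size is 0: nothing for ddrescue to read, stop here"
        return 1
    fi
    $RUN ddrescue -d -n "$SRC" "$IMG" "$MAP"     # pass 1: copy the easy areas, skip scraping
    $RUN ddrescue -d -r3 "$SRC" "$IMG" "$MAP"    # pass 2: retry the bad areas up to 3 times
    $RUN cp --sparse=always "$IMG" "$WORK"       # keep the first image untouched
    $RUN e2fsck -fn "$WORK"                      # read-only check of the working copy
}

# Example value; in real use pass "$(lsblk -bdno SIZE "$SRC")" instead.
rescue_plan 4000787030016
```

Keeping the map file between runs lets ddrescue resume where it left off instead of rereading areas it has already copied, which matters if the disk needs many tries.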
