Setup
I refurbished an ASRock QC5000M motherboard as a home server and installed Debian 12 on it, intending to set up RAID 1 across 2 hard drives.
This motherboard has only 2 SATA ports, so I added a chenyang SA-208-CY PCI Express expansion card to get 2 more. The 2 hard drives are 2 TB Western Digital Blue WDC WD20EZRZ-00Z units that have been in use since 2017 and report 0 SMART errors (tested with smartctl).
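Since the errors shown further down are reported at the link level, one additional SMART value that may help tell a cabling or controller problem apart from a failing disk is the interface CRC counter. The sketch below assumes the drives expose it as attribute 199 (UDMA_CRC_Error_Count), which is common but not guaranteed. The helper name crc_errors and the device names are illustrative only.

```shell
# crc_errors: read `smartctl -A` output on stdin and print the raw value
# of attribute 199 (UDMA_CRC_Error_Count), if the drive reports it.
# Assumption: the raw value is the last column of smartctl's attribute table.
crc_errors() {
    awk '$1 == 199 { print $NF }'
}

# Example (needs root; /dev/sdb and /dev/sdc are placeholder device names):
#   for d in /dev/sdb /dev/sdc; do
#       echo "$d: $(smartctl -A "$d" | crc_errors)"
#   done
```

A counter that keeps growing on only one of the two drives would point towards that drive's cable, port, or controller rather than the platters.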
The setup:
- 1 SSD plugged into port 1 of the motherboard
- 1 HDD plugged into port 2 of the motherboard
- 1 HDD plugged into the PCI Express card
Problem
My issue is that after I enabled RAID 1, the filesystem was broken after every reboot, and I had to run fsck.ext4 -y /dev/md0 every time. At first there were a few errors about inodes, but later it got worse and nearly all the files were removed (thankfully I had backups).
I had to remove the hard drive connected to the PCI Express card from the RAID 1 array in order to have a working filesystem, which leaves the array with only one hard drive. That is not great.
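To confirm which arrays are running with a missing member, the kernel's view in /proc/mdstat can be checked. This is a minimal sketch, assuming the usual mdstat layout in which a missing member shows up as `_` in the `[UU]` status field. The function name degraded_arrays is made up for this example.

```shell
# degraded_arrays: print the name of every md array whose member status
# field (e.g. [UU] or [U_]) contains "_", meaning a member is missing.
# Reads the file given as $1, or /proc/mdstat by default.
degraded_arrays() {
    awk '
        /^md/ { name = $1 }                       # remember the current array
        match($0, /\[[U_]+\]/) {                  # line carrying the status field
            if (index(substr($0, RSTART, RLENGTH), "_")) print name
        }
    ' "${1:-/proc/mdstat}"
}

# Example: degraded_arrays
```

With the second drive removed as described above, md0 would be expected to appear in the output.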
So I investigated and found some errors reported by dmesg (see the logs below (1)):
ata4.00: failed command: WRITE FPDMA QUEUED
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
I replaced the SATA cable with another one and the same errors appeared.
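To see which ports and commands are affected without reading the whole log, the `failed command` lines can be tallied. This is only a sketch that assumes the message format shown in the dmesg output of this post. failed_by_port is an illustrative name, not an existing tool.

```shell
# failed_by_port: read kernel log text on stdin and count the
# "failed command" lines per ATA device and command name.
failed_by_port() {
    awk '
        match($0, /ata[0-9]+\.[0-9]+: failed command: /) {
            rest = substr($0, RSTART)
            dev  = substr(rest, 1, index(rest, ":") - 1)   # e.g. ata4.00
            cmd  = substr($0, RSTART + RLENGTH)            # e.g. WRITE FPDMA QUEUED
            count[dev " " cmd]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -k2
}

# Example: dmesg | failed_by_port
```

Running it before and after swapping a cable or port gives a quick comparison of whether the errors follow the drive, the cable, or the controller port.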
lspci suggests that the expansion card is not in AHCI mode (its entry lacks the "[AHCI mode]" label that the onboard controller has):
lspci -nn | grep -i sata
00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7801] (rev 40)
05:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
Could this be the reason why the RAID 1 array doesn't work properly? Can it be solved by configuring something in Debian 12?
Answers to this question on ubuntu.SE mention issues with the SATA power plug and suggest changing the PSU. I haven't tried that yet since I'm far away from the computer right now.
Update
I just found something interesting by searching the error and the controller name ("WRITE FPDMA QUEUED" ASM1062):
I figured out that the issue appers only when SATA disk connected to the COM4 port of the ASM1062 board while if your try to connect to the other internal connector ( COM3 ) it doesn't report any issue at all.
Source: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1388559/comments/13
I'll try this ASAP.
Update 2
The WRITE FPDMA QUEUED and ncq errors still appear after using the other port, but now on ata3.00.
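Because the log shows both NCQ commands failing and the kernel lowering the link speed, it may be useful to record the negotiated speed and the queue depth for the affected port. The sketch below assumes the kernel exposes /sys/class/ata_link/linkN/sata_spd and /sys/block/<disk>/device/queue_depth, as recent libata kernels do. The function name link_state, the port number, and the disk name are placeholders.

```shell
# link_state: print the negotiated SATA speed of an ATA port and the
# queue depth of a disk.
# Arguments: port number, disk name, optional sysfs root (default /sys).
link_state() {
    port=$1 disk=$2 root=${3:-/sys}
    printf 'ata%s speed: %s\n' "$port" \
        "$(cat "$root/class/ata_link/link$port/sata_spd")"
    printf '%s queue_depth: %s\n' "$disk" \
        "$(cat "$root/block/$disk/device/queue_depth")"
}

# Example: link_state 3 sdc
# A queue depth of 1 would mean NCQ is effectively not in use for that disk.
```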
(1) Output of dmesg:
[ 107.152069] ata4.00: exception Emask 0x10 SAct 0x2 SErr 0x400000 action 0x6 frozen
[ 107.152113] ata4.00: irq_stat 0x08000000, interface fatal error
[ 107.152129] ata4: SError: { Handshk }
[ 107.152148] ata4.00: failed command: WRITE FPDMA QUEUED
[ 107.152162] ata4.00: cmd 61/01:08:08:08:00/00:00:00:00:00/40 tag 1 ncq dma 512 out
res 40/00:0c:08:08:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 107.152207] ata4.00: status: { DRDY }
[ 107.152232] ata4: hard resetting link
[ 107.627952] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 107.629311] ata4.00: configured for UDMA/133
[ 107.629360] ata4: EH complete
[ 107.696032] ata4.00: exception Emask 0x10 SAct 0x81000 SErr 0x400000 action 0x6 frozen
[ 107.696076] ata4.00: irq_stat 0x08000000, interface fatal error
[ 107.696092] ata4: SError: { Handshk }
[ 107.696113] ata4.00: failed command: WRITE FPDMA QUEUED
[ 107.696127] ata4.00: cmd 61/01:60:08:08:00/00:00:00:00:00/40 tag 12 ncq dma 512 out
res 40/00:9c:00:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 107.696173] ata4.00: status: { DRDY }
[ 107.696189] ata4.00: failed command: READ FPDMA QUEUED
[ 107.696201] ata4.00: cmd 60/08:98:00:00:00/00:00:00:00:00/40 tag 19 ncq dma 4096 in
res 40/00:9c:00:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 107.696250] ata4.00: status: { DRDY }
[ 107.696273] ata4: hard resetting link
[ 108.167983] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 108.169348] ata4.00: configured for UDMA/133
[ 108.169417] sd 3:0:0:0: [sdc] tag#19 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[ 108.169435] sd 3:0:0:0: [sdc] tag#19 Sense Key : Illegal Request [current]
[ 108.169447] sd 3:0:0:0: [sdc] tag#19 Add. Sense: Unaligned write command
[ 108.169460] sd 3:0:0:0: [sdc] tag#19 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[ 108.169468] I/O error, dev sdc, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[ 108.169535] ata4: EH complete
[ 108.207911] md: recovery of RAID array md0
[ 108.331923] ata4.00: exception Emask 0x10 SAct 0x400000 SErr 0x400000 action 0x6 frozen
[ 108.331970] ata4.00: irq_stat 0x08000000, interface fatal error
[ 108.331988] ata4: SError: { Handshk }
[ 108.332012] ata4.00: failed command: WRITE FPDMA QUEUED
[ 108.332027] ata4.00: cmd 61/00:b0:00:10:04/0a:00:00:00:00/40 tag 22 ncq dma 1310720 ou
res 40/00:b4:00:10:04/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 108.332079] ata4.00: status: { DRDY }
[ 108.332101] ata4: hard resetting link
[ 108.811925] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 108.813266] ata4.00: configured for UDMA/133
[ 108.813317] ata4: EH complete
[ 108.887995] ata4: limiting SATA link speed to 3.0 Gbps
[ 108.888020] ata4.00: exception Emask 0x10 SAct 0x3c SErr 0x400000 action 0x6 frozen
[ 108.888053] ata4.00: irq_stat 0x08000000, interface fatal error
[ 108.888069] ata4: SError: { Handshk }
[ 108.888090] ata4.00: failed command: WRITE FPDMA QUEUED
[ 108.888105] ata4.00: cmd 61/01:10:08:08:00/00:00:00:00:00/40 tag 2 ncq dma 512 out
res 40/00:2c:00:1a:04/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 108.888156] ata4.00: status: { DRDY }
[ 108.888172] ata4.00: failed command: READ FPDMA QUEUED
[ 108.888186] ata4.00: cmd 60/08:18:08:08:00/00:00:00:00:00/40 tag 3 ncq dma 4096 in
res 40/00:2c:00:1a:04/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 108.888233] ata4.00: status: { DRDY }
[ 108.888248] ata4.00: failed command: WRITE FPDMA QUEUED
[ 108.888262] ata4.00: cmd 61/00:20:00:10:04/0a:00:00:00:00/40 tag 4 ncq dma 1310720 ou
res 40/00:2c:00:1a:04/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 108.888309] ata4.00: status: { DRDY }
[ 108.888324] ata4.00: failed command: WRITE FPDMA QUEUED
[ 108.888338] ata4.00: cmd 61/80:28:00:1a:04/00:00:00:00:00/40 tag 5 ncq dma 65536 out
res 40/00:2c:00:1a:04/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 108.888384] ata4.00: status: { DRDY }
[ 108.888408] ata4: hard resetting link
[ 109.364012] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 109.365269] ata4.00: configured for UDMA/133
[ 109.365334] ata4: EH complete
Output of lspci:
$ lspci -nnk | grep --after-context=3 SATA
00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7801] (rev 40)
Subsystem: ASRock Incorporation QC5000-ITX/PH [1849:7801]
Kernel driver in use: ahci
Kernel modules: ahci
--
05:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
Subsystem: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:1060]
Kernel driver in use: ahci
Kernel modules: ahci
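The bracketed device names in lspci come from the PCI ID database, so a possibly more direct check is which kernel driver is bound to each controller. The following sketch pulls that out of `lspci -nnk` output. It assumes the output format shown above, and sata_drivers is a name invented for this example.

```shell
# sata_drivers: read `lspci -nnk` output on stdin and print, for each
# SATA controller, its PCI address and the kernel driver bound to it.
sata_drivers() {
    awk '
        /^[0-9a-f:.]+ SATA controller/ { addr = $1; next }  # start of a SATA entry
        /^[0-9a-f:.]+ /                { addr = "" }         # some other device
        /Kernel driver in use:/ && addr != "" { print addr, $NF }
    '
}

# Example: lspci -nnk | sata_drivers
```

Applied to the output above, this would list both 00:11.0 and 05:00.0 with the ahci driver.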