Setup

I refurbished an ASRock QC5000M motherboard as a home server and installed Debian 12 on it, intending to set up RAID 1 across 2 hard drives.

This motherboard only has 2 SATA ports, so I added a chenyang SA-208-CY PCI Express expansion card to get 2 more SATA ports. The 2 hard drives are 2 TB Western Digital Blue WDC WD20EZRZ-00Z drives that have been in use since 2017 and report 0 SMART errors (tested with smartctl).

The setup:

  • 1 SSD plugged into port 1 of the motherboard
  • 1 HDD plugged into port 2 of the motherboard
  • 1 HDD plugged into the PCI Express card

Problem

My issue is that after I enabled the RAID 1, the filesystem was broken after every reboot, and I had to run fsck.ext4 -y /dev/md0 every time. At first there were a few errors about inodes, but later it got worse and fsck removed nearly all the files (thankfully I had backups).

I had to remove the hard drive connected to the expansion card from the RAID 1 array in order to have a working RAID 1 filesystem with only one hard drive, which is not great.
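
For anyone checking a similar setup, here is a minimal Python sketch that reports md arrays running with a missing member, based on the text format of /proc/mdstat. The function name degraded_arrays and the sample text are illustrative assumptions; the sample is hypothetical and was not captured on this machine.

```python
import re


def degraded_arrays(mdstat_text: str) -> dict[str, str]:
    """Return {array_name: member_map} for arrays with a missing member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The member map looks like [UU] (healthy) or [U_] (one member missing).
        status = re.search(r"\[([U_]+)\]\s*$", line)
        if current and status and "_" in status.group(1):
            degraded[current] = status.group(1)
    return degraded


# Hypothetical sample text, not taken from the machine in the question.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[0]
      1953382464 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a real system the input would come from reading /proc/mdstat instead of the SAMPLE string.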

So I investigated and found some issues reported by dmesg (see the logs below(1)):

  • ata4.00: failed command: WRITE FPDMA QUEUED
  • ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
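
As a side note on the second message, the SStatus value that libata prints is a hexadecimal register whose second nibble encodes the negotiated link generation. The sketch below decodes it; the function and table names are illustrative assumptions, and the field layout (DET, SPD, IPM) follows the SATA SStatus register definition.

```python
def decode_sstatus(value: int) -> dict[str, int]:
    """Split a SATA SStatus register value into its DET, SPD and IPM fields."""
    return {
        "det": value & 0xF,         # bits 0-3: device detection
        "spd": (value >> 4) & 0xF,  # bits 4-7: negotiated link generation
        "ipm": (value >> 8) & 0xF,  # bits 8-11: interface power state
    }


# SPD field value to human-readable link speed.
LINK_SPEED = {1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}


def link_speed(sstatus_hex: str) -> str:
    """Translate the hex string shown in dmesg (e.g. "123") to a link speed."""
    spd = decode_sstatus(int(sstatus_hex, 16))["spd"]
    return LINK_SPEED.get(spd, "unknown")


if __name__ == "__main__":
    for raw in ("133", "123"):
        print(raw, link_speed(raw))
```

With the values from the log, "133" corresponds to 6.0 Gbps and "123" to 3.0 Gbps, which matches the link speeds printed next to them. The drop to 3.0 Gbps is the kernel limiting the link speed after repeated errors.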

I replaced the SATA cable with another one and the same errors appeared.

lspci shows that the expansion card is not in AHCI mode:

lspci -nn | grep -i sata
00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7801] (rev 40)
05:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)

Could this be the reason why the RAID 1 doesn't work properly? Can this be solved by configuring something in Debian 12?

Answers to this question on ubuntu.SE mention issues with the SATA power plug or suggest changing the PSU. I haven't tried that yet since I'm far away from the computer right now.


Update

I just found something interesting by searching the error and the controller name ("WRITE FPDMA QUEUED" ASM1062):

I figured out that the issue appers only when SATA disk connected to the COM4 port of the ASM1062 board while if your try to connect to the other internal connector ( COM3 ) it doesn't report any issue at all.

Source: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1388559/comments/13

I'll try this ASAP.

Update 2

The WRITE FPDMA QUEUED and ncq errors still appear after switching to the other port, but now on ata3.00.


(1) Output of dmesg:

[  107.152069] ata4.00: exception Emask 0x10 SAct 0x2 SErr 0x400000 action 0x6 frozen
[  107.152113] ata4.00: irq_stat 0x08000000, interface fatal error
[  107.152129] ata4: SError: { Handshk }
[  107.152148] ata4.00: failed command: WRITE FPDMA QUEUED
[  107.152162] ata4.00: cmd 61/01:08:08:08:00/00:00:00:00:00/40 tag 1 ncq dma 512 out
                        res 40/00:0c:08:08:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[  107.152207] ata4.00: status: { DRDY }
[  107.152232] ata4: hard resetting link
[  107.627952] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  107.629311] ata4.00: configured for UDMA/133
[  107.629360] ata4: EH complete
[  107.696032] ata4.00: exception Emask 0x10 SAct 0x81000 SErr 0x400000 action 0x6 frozen
[  107.696076] ata4.00: irq_stat 0x08000000, interface fatal error
[  107.696092] ata4: SError: { Handshk }
[  107.696113] ata4.00: failed command: WRITE FPDMA QUEUED
[  107.696127] ata4.00: cmd 61/01:60:08:08:00/00:00:00:00:00/40 tag 12 ncq dma 512 out
                        res 40/00:9c:00:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[  107.696173] ata4.00: status: { DRDY }
[  107.696189] ata4.00: failed command: READ FPDMA QUEUED
[  107.696201] ata4.00: cmd 60/08:98:00:00:00/00:00:00:00:00/40 tag 19 ncq dma 4096 in
                        res 40/00:9c:00:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[  107.696250] ata4.00: status: { DRDY }
[  107.696273] ata4: hard resetting link
[  108.167983] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  108.169348] ata4.00: configured for UDMA/133
[  108.169417] sd 3:0:0:0: [sdc] tag#19 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[  108.169435] sd 3:0:0:0: [sdc] tag#19 Sense Key : Illegal Request [current] 
[  108.169447] sd 3:0:0:0: [sdc] tag#19 Add. Sense: Unaligned write command
[  108.169460] sd 3:0:0:0: [sdc] tag#19 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[  108.169468] I/O error, dev sdc, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[  108.169535] ata4: EH complete
[  108.207911] md: recovery of RAID array md0
[  108.331923] ata4.00: exception Emask 0x10 SAct 0x400000 SErr 0x400000 action 0x6 frozen
[  108.331970] ata4.00: irq_stat 0x08000000, interface fatal error
[  108.331988] ata4: SError: { Handshk }
[  108.332012] ata4.00: failed command: WRITE FPDMA QUEUED
[  108.332027] ata4.00: cmd 61/00:b0:00:10:04/0a:00:00:00:00/40 tag 22 ncq dma 1310720 ou
                        res 40/00:b4:00:10:04/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[  108.332079] ata4.00: status: { DRDY }
[  108.332101] ata4: hard resetting link
[  108.811925] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  108.813266] ata4.00: configured for UDMA/133
[  108.813317] ata4: EH complete
[  108.887995] ata4: limiting SATA link speed to 3.0 Gbps
[  108.888020] ata4.00: exception Emask 0x10 SAct 0x3c SErr 0x400000 action 0x6 frozen
[  108.888053] ata4.00: irq_stat 0x08000000, interface fatal error
[  108.888069] ata4: SError: { Handshk }
[  108.888090] ata4.00: failed command: WRITE FPDMA QUEUED
[  108.888105] ata4.00: cmd 61/01:10:08:08:00/00:00:00:00:00/40 tag 2 ncq dma 512 out
                        res 40/00:2c:00:1a:04/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[  108.888156] ata4.00: status: { DRDY }
[  108.888172] ata4.00: failed command: READ FPDMA QUEUED
[  108.888186] ata4.00: cmd 60/08:18:08:08:00/00:00:00:00:00/40 tag 3 ncq dma 4096 in
                        res 40/00:2c:00:1a:04/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[  108.888233] ata4.00: status: { DRDY }
[  108.888248] ata4.00: failed command: WRITE FPDMA QUEUED
[  108.888262] ata4.00: cmd 61/00:20:00:10:04/0a:00:00:00:00/40 tag 4 ncq dma 1310720 ou
                        res 40/00:2c:00:1a:04/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[  108.888309] ata4.00: status: { DRDY }
[  108.888324] ata4.00: failed command: WRITE FPDMA QUEUED
[  108.888338] ata4.00: cmd 61/80:28:00:1a:04/00:00:00:00:00/40 tag 5 ncq dma 65536 out
                        res 40/00:2c:00:1a:04/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[  108.888384] ata4.00: status: { DRDY }
[  108.888408] ata4: hard resetting link
[  109.364012] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  109.365269] ata4.00: configured for UDMA/133
[  109.365334] ata4: EH complete
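
To see at a glance which port and which commands are affected, a log like the one above can be summarised with a short Python sketch. The function name count_failed_commands is an illustrative assumption, and the three sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches lines such as "ata4.00: failed command: WRITE FPDMA QUEUED".
FAILED = re.compile(r"\b(ata\d+\.\d+): failed command: (.+)$")


def count_failed_commands(dmesg_text: str) -> Counter:
    """Count failed ATA commands per (device, command) pair."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = FAILED.search(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts


# Three lines taken from the dmesg output above.
SAMPLE = """\
[  107.152148] ata4.00: failed command: WRITE FPDMA QUEUED
[  107.696113] ata4.00: failed command: WRITE FPDMA QUEUED
[  107.696189] ata4.00: failed command: READ FPDMA QUEUED
"""

if __name__ == "__main__":
    for (device, command), n in count_failed_commands(SAMPLE).items():
        print(device, command, n)
```

If every failure lands on the same ataN device, that points at one port, cable, or disk rather than at the array as a whole.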

Output of lspci:

$ lspci -nnk | grep --after-context=3 SATA
00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7801] (rev 40)
    Subsystem: ASRock Incorporation QC5000-ITX/PH [1849:7801]
    Kernel driver in use: ahci
    Kernel modules: ahci
--
05:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
    Subsystem: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:1060]
    Kernel driver in use: ahci
    Kernel modules: ahci
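
The same check can be done programmatically. The following Python sketch extracts the bound kernel driver for each device from lspci -nnk style text; the function name drivers_in_use is an illustrative assumption, and the sample is the output shown above.

```python
def drivers_in_use(lspci_text: str) -> dict[str, str]:
    """Map each PCI address to the kernel driver bound to it."""
    drivers = {}
    address = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace() and line != "--":
            # Unindented lines start a new device; the first token is its address.
            address = line.split()[0]
        elif address and "Kernel driver in use:" in line:
            drivers[address] = line.split(":", 1)[1].strip()
    return drivers


SAMPLE = """\
00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7801] (rev 40)
    Subsystem: ASRock Incorporation QC5000-ITX/PH [1849:7801]
    Kernel driver in use: ahci
    Kernel modules: ahci
--
05:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
    Subsystem: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:1060]
    Kernel driver in use: ahci
    Kernel modules: ahci
"""

if __name__ == "__main__":
    print(drivers_in_use(SAMPLE))
```

Both controllers map to the ahci driver here.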

2
  • If the expansion card WAS in AHCI mode, that would explain the reason RAID mode was not working. The fact it is NOT in AHCI means RAID mode should be working.
    – Ramhound
    Commented Jan 6 at 23:18
  • 1
    @Ramhound: But OP is using software RAID, not RST RAID... Commented Jan 6 at 23:43

2 Answers

2

lspci shows that the expansion card is not in AHCI mode:

Could this be the reason why the RAID 1 doesn't work properly?

It is in AHCI mode. Otherwise it literally wouldn't be a "SATA controller" – this device type is only used by devices that provide an AHCI interface to the OS. (SATA is the interface between disk and controller, AHCI is the interface between controller and OS.)
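
To illustrate, the [0106] that lspci -nn prints is the base class (01, mass storage) and subclass (06, Serial ATA) of the PCI class code. The full 24-bit code also carries a programming-interface byte, where 01 means AHCI 1.0. Here is a minimal Python sketch of the decoding; the function names are made up for illustration, and reading the value from a sysfs file such as /sys/bus/pci/devices/0000:05:00.0/class assumes the usual 0000 PCI domain prefix.

```python
def decode_pci_class(class_code: str) -> tuple[int, int, int]:
    """Split a 24-bit PCI class code into (base class, subclass, prog-if)."""
    value = int(class_code, 16)
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


def is_ahci(class_code: str) -> bool:
    """True if the class code describes an AHCI SATA controller."""
    # 0x01 = mass storage, 0x06 = Serial ATA, prog-if 0x01 = AHCI 1.0
    return decode_pci_class(class_code) == (0x01, 0x06, 0x01)


if __name__ == "__main__":
    # An AHCI SATA controller is expected to report this value in sysfs.
    print(decode_pci_class("0x010601"), is_ahci("0x010601"))
```

A controller in legacy IDE mode would report base class 01 with subclass 01 instead, and lspci would label it "IDE interface" rather than "SATA controller".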

The "[AHCI mode]" label is specific to chipset SATA controllers such as Intel's (and the AMD FCH controller in your output), and it's just part of the model name that's defined in lspci's device database. (That is, these controllers use different PCI device IDs depending on which mode they're in.)

Such a label exists because a chipset controller like Intel RST can be either in "pure" (passthrough) controller mode or in a special "hardware RAID" mode. Your basic add-in SATA controller doesn't have the latter; plain AHCI is all it can do in the first place.

Therefore, it cannot be the reason.

(I'm not sure what the actual reason is, but from the dmesg logs it looks like the OS can talk to the controller, while the controller is having issues talking to the disk. I would guess one of "SATA cable bad", "SATA port bad", or "controller cheap and unreliable". The Ubuntu bug report makes me think it's the last of these.)

1
  • You were right. I added the output of lspci to the question, and I see Kernel driver in use: ahci for both SATA controllers, so AHCI is probably not the issue. Commented Jan 9 at 22:11
0

I fixed the issue by putting the SSD with the OS in a USB enclosure and plugging the 2 hard drives directly into the motherboard.

In other words, I removed the ASM1062 card since it had issues with the system.
