
I am running a NAS with OpenMediaVault 5, and back in the day I set up a RAID5 array (with mdadm) using 3x 5TB WD Red hard disks (see the mdadm output below for details). I am using the whole disks as RAID members. If I had to do it again I would create a partition on each disk before adding it to the RAID, but that's not part of this discussion.

Unfortunately it seems like one of my HDDs (/dev/sdb) is starting to fail: the Current_Pending_Sector and Offline_Uncorrectable SMART values have started increasing (see the smartctl output below), and the smartmontools self-tests (short and long) fail with read errors. I am going to replace the failing disk ASAP, but before doing that I would like to find out whether any files on my RAID5 filesystem are affected.

I have used badblocks -b 4096 -o ~/sdb_badblocks.txt -sv /dev/sdb to find the blocks that cannot be read successfully. The blocks on /dev/sdb that were reported as unreadable (using a 4096-byte block size) are:

984123818
984124149
984124809
984125140
984125470
984125801

I have used dd to confirm that these blocks actually can't be read. For that I used dd if=/dev/sdb bs=4096 count=1 skip=984123818 | hexdump -C, which led to the following output for every block listed above:

dd: error reading '/dev/sdb': Input/output error
0+0 records in
0+0 records out
0 bytes copied, 9,71509 s, 0,0 kB/s

I have used mdadm --examine-badblocks /dev/sdb (and likewise for the other two members of the array, sda and sdd) to check whether mdadm has recorded any bad blocks, but the output is the same for all three HDDs:

Bad-blocks list is empty in /dev/sdb

Now I would like to know which files on the ext4 filesystem on top of the RAID are affected by those bad blocks. If the disk had its own filesystem I would know how to use tune2fs and debugfs to find the inode and the file (instructions from https://www.smartmontools.org/wiki/BadBlockHowto).
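
For reference, on a plain single-disk ext4 filesystem my workflow (following the BadBlockHowto) would look roughly like this; /dev/sdXN and the <...> values are just placeholders:

# get the filesystem block size (typically 4096 for ext4)
tune2fs -l /dev/sdXN | grep "Block size"
# ask ext4 which inode uses the bad filesystem block
debugfs -R "icheck <fs_block>" /dev/sdXN
# translate the inode number into a path
debugfs -R "ncheck <inode>" /dev/sdXN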

But how do I find the files on the RAID5 ext4 filesystem (if there are any at all) that correspond to those bad blocks on one member of the RAID? How can I convert a sector number found on /dev/sdb into the corresponding sector number on /dev/md127?

Thank you in advance for your help!


Here are some important outputs and information about my system:

smartctl -ia /dev/sdb

smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.10.0-0.bpo.12-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD50EFRX-68L0BN1
Serial Number:    WD-WX31D88FVFEK
LU WWN Device Id: 5 0014ee 2bb16b3fc
Firmware Version: 82.00A82
User Capacity:    5.000.981.078.016 bytes [5,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5700 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Oct 12 20:22:11 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (52380) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 524) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   199   051    Pre-fail  Always       -       24
  3 Spin_Up_Time            0x0027   200   196   021    Pre-fail  Always       -       8983
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       32
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   057   057   000    Old_age   Always       -       31998
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       32
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       14
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       5704
194 Temperature_Celsius     0x0022   120   108   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       8
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       9

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     31990         3542399240
# 2  Short offline       Completed: read failure       90%     31966         3481656353
# 3  Extended offline    Completed: read failure       90%     31962         3481656358
# 4  Short offline       Completed without error       00%     31950         -
# 5  Extended offline    Completed: read failure       90%     31937         3481656358
# 6  Extended offline    Completed: read failure       20%     31928         3578023248
# 7  Extended offline    Completed: read failure       90%     31920         3542399240
# 8  Extended offline    Completed: read failure       90%     31920         3481656353
# 9  Short offline       Completed: read failure       90%     31918         3481656358
#10  Short offline       Completed: read failure       90%     31894         3481656354
#11  Short offline       Completed without error       00%     31870         -
#12  Short offline       Completed without error       00%     31846         -
#13  Short offline       Completed without error       00%     31822         -
#14  Short offline       Completed without error       00%     31798         -
#15  Short offline       Completed without error       00%     31774         -
#16  Short offline       Completed without error       00%     31750         -
#17  Short offline       Completed without error       00%     31726         -
#18  Short offline       Completed without error       00%     31702         -
#19  Short offline       Completed without error       00%     31678         -
#20  Short offline       Completed without error       00%     31654         -
#21  Short offline       Completed without error       00%     31630         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

uname -a

Linux HomeNAS 5.10.0-0.bpo.12-amd64 #1 SMP Debian 5.10.103-1~bpo10+1 (2022-03-08) x86_64 GNU/Linux

cat /proc/mdstat

Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md127 : active raid5 sdd[5] sda[3] sdb[4]
      9767278592 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>

blkid

/dev/nvme0n1p1: UUID="649e3fe2-c351-4bd6-9e2f-38243f3f22d4" TYPE="ext4" PARTUUID="48c5342c-01"
/dev/nvme0n1p5: UUID="42901d13-899e-48dd-a53b-fe708a8017dd" TYPE="swap" PARTUUID="48c5342c-05"
/dev/sdd: UUID="bb8b3798-d160-71b4-cc60-bc8fdc8e0761" UUID_SUB="b2fa27d2-2d4d-cf14-cd80-cf18e5b0fcab" LABEL="HomeNAS:NAS" TYPE="linux_raid_member"
/dev/sdb: UUID="bb8b3798-d160-71b4-cc60-bc8fdc8e0761" UUID_SUB="bbc4815f-51af-de9f-2ced-e882c84fc3da" LABEL="HomeNAS:NAS" TYPE="linux_raid_member"
/dev/sdc1: UUID="8bb12818-d02e-40fe-b92e-f20f81377ae1" TYPE="ext4" PARTUUID="4e335cee-ed5e-4c04-8a57-d2b271481310"
/dev/md127: UUID="bd5ef96f-5587-4211-95c0-10219985ff6d" TYPE="ext4"
/dev/sda: UUID="bb8b3798-d160-71b4-cc60-bc8fdc8e0761" UUID_SUB="4624daa6-aa5d-b450-a4fe-3dd5f4e64e52" LABEL="HomeNAS:NAS" TYPE="linux_raid_member"
/dev/nvme0n1: PTUUID="48c5342c" PTTYPE="dos"

fdisk -l

Disk /dev/nvme0n1: 119,2 GiB, 128035676160 bytes, 250069680 sectors
Disk model: WDC PC SN520 SDAPMUW-128G
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x48c5342c

Device         Boot     Start       End   Sectors   Size Id Type
/dev/nvme0n1p1           2048 216754175 216752128 103,4G 83 Linux
/dev/nvme0n1p2      216756222 250068991  33312770  15,9G  5 Extended
/dev/nvme0n1p5      216756224 250068991  33312768  15,9G 82 Linux swap / Solaris


Disk /dev/sdd: 4,6 TiB, 5000981078016 bytes, 9767541168 sectors
Disk model: WDC WD50EFRX-68L
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sdb: 4,6 TiB, 5000981078016 bytes, 9767541168 sectors
Disk model: WDC WD50EFRX-68L
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sdc: 7,3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: WDC WD80EFAX-68K
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 21E78FD8-C64D-4B35-B0DE-81AB66227A51

Device     Start         End     Sectors  Size Type
/dev/sdc1   2048 15628053134 15628051087  7,3T Linux filesystem


Disk /dev/sda: 4,6 TiB, 5000981078016 bytes, 9767541168 sectors
Disk model: WDC WD50EFRX-68L
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/md127: 9,1 TiB, 10001693278208 bytes, 19534557184 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 524288 bytes / 1048576 bytes

mdadm --detail --scan --verbose

ARRAY /dev/md/NAS level=raid5 num-devices=3 metadata=1.2 name=HomeNAS:NAS UUID=bb8b3798:d16071b4:cc60bc8f:dc8e0761
   devices=/dev/sda,/dev/sdb,/dev/sdd

mdadm --detail /dev/md127

/dev/md127:
           Version : 1.2
     Creation Time : Sat Mar 12 17:22:49 2016
        Raid Level : raid5
        Array Size : 9767278592 (9314.80 GiB 10001.69 GB)
     Used Dev Size : 4883639296 (4657.40 GiB 5000.85 GB)
      Raid Devices : 3
     Total Devices : 3
       Persistence : Superblock is persistent

       Update Time : Wed Oct 12 01:02:18 2022
             State : clean
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : resync

              Name : HomeNAS:NAS  (local to host HomeNAS)
              UUID : bb8b3798:d16071b4:cc60bc8f:dc8e0761
            Events : 997

    Number   Major   Minor   RaidDevice State
       4       8       16        0      active sync   /dev/sdb
       5       8       48        1      active sync   /dev/sdd
       3       8        0        2      active sync   /dev/sda

1 Answer


Converting block numbers between the RAID device and the individual drives depends on the RAID 5 layout used. mdadm RAID 5 can use several different layouts, but fortunately they are all very well defined and understood. In your case the array reports that it is using the left-symmetric layout. You can find RAID sector calculators on the internet (for example https://www.runtime.org/raid-calculator.htm) which will do the math for you given information like the RAID level (5), the block/chunk size, the layout and the number of drives.
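
If you prefer to do the arithmetic yourself, here is a minimal shell sketch of the left-symmetric mapping. It assumes 3 devices, 512K chunks and sector numbers in 512-byte units; the data offset and device role are placeholders you have to fill in from mdadm --examine and mdadm --detail (the next paragraph covers where the data offset comes from):

#!/bin/sh
# Sketch only: map a 512-byte sector on one RAID5 member to the corresponding
# sector on the md device, assuming 3 drives, 512K chunks, left-symmetric layout.
DISKS=3                                  # number of RAID devices
CHUNK=1024                               # chunk size in 512-byte sectors (512K)
DATA_OFFSET=262144                       # placeholder: "Data Offset" from mdadm --examine
ROLE=0                                   # placeholder: "RaidDevice" of this member from mdadm --detail
MEMBER_SECTOR=$(( 984123818 * 8 ))       # badblocks used 4096-byte blocks, so multiply by 8

REL=$(( MEMBER_SECTOR - DATA_OFFSET ))   # sector relative to the start of the data area
STRIPE=$(( REL / CHUNK ))                # which stripe (chunk row) we are in
IN_CHUNK=$(( REL % CHUNK ))              # offset inside that chunk
PARITY=$(( DISKS - 1 - STRIPE % DISKS )) # left-symmetric: parity rotates backwards from the last disk
if [ "$ROLE" -eq "$PARITY" ]; then
    echo "member sector $MEMBER_SECTOR holds parity for stripe $STRIPE (no file data stored there)"
else
    # logical data chunk index within the stripe (data follows the parity disk, wrapping around)
    DATA_IDX=$(( (ROLE - PARITY - 1 + DISKS) % DISKS ))
    ARRAY_SECTOR=$(( (STRIPE * (DISKS - 1) + DATA_IDX) * CHUNK + IN_CHUNK ))
    echo "array sector on /dev/md127: $ARRAY_SECTOR"
fi

I would sanity-check the result against one of those calculators before trusting it.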

But you will also need the starting block number of the data area on each physical disk. That depends on the header/superblock/metadata version used by the array. Your array says it's using version 1.2, which means its superblock can be found 4 KiB from the start of each disk. You will need to look at this superblock and use the information found there to calculate where the data actually starts on the drive (see https://raid.wiki.kernel.org/index.php/RAID_superblock_formats).
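
For what it's worth, mdadm will print that offset for you, so you may not have to decode the superblock by hand:

# data offset, device role and chunk size straight from the member's 1.2 superblock
mdadm --examine /dev/sdb | grep -E "Data Offset|Device Role|Chunk Size"

The data offset is reported in 512-byte sectors. Once you have mapped a bad block to an array sector, divide it by 8 (for the usual 4096-byte ext4 block size; check with tune2fs -l /dev/md127) and feed that block number to debugfs icheck/ncheck against /dev/md127, exactly as in the BadBlockHowto you linked.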

Now here is the part where I give you unsolicited advice! (I'm sure you knew this was coming when you asked the question.) As you've already mentioned, you would create the array differently now if you could do it all over again. You really SHOULD do it over again if you care at all about this data, and it sounds like you do. Not only would it have been better to use partitions on the drives, but it is a very, very bad idea to use RAID 5 on cheap 5 TB mechanical hard drives. The reasons for this are well documented and easy to find with a search, so I won't repeat them here. I'm surprised your array lasted this long. Just bite the bullet and build a new array. If you build an SSD array then RAID 5 will be fine, but if you use mechanical drives you really want RAID 1, RAID 10 or RAID 6. (Personally, I wouldn't even use RAID 6 on today's big, cheap mechanical drives; my recommendation for those is RAID 10. And whatever you do, don't use SMR drives.)

  • Thank you very much for your answer! How do I find out the order of my HDDs within the array? Is it the "Number" or the "RaidDevice" column in mdadm --detail? What's the start sector that I need to enter into the RAID calculator in my case? You said the superblock is 4k from the start. Is the start sector then 4096 or 4097? Or is it not that easy? What about the block size? The HDDs have a physical block size of 4096 bytes but fdisk shows a 512-byte sector size. However, the output of tune2fs -l /dev/md127 shows a block size of 4096. Which is the right input?
    – bash0r1988
    Commented Oct 16, 2022 at 18:20
  • Regarding your advice: I appreciate your comments. I am setting up a new RAID using new (CMR) 6TB HDDs (SSDs are too expensive with respect to the total size). I am going for RAID6 or RAID10.
    – bash0r1988
    Commented Oct 16, 2022 at 18:21
  • Order: I don't know, and /proc/mdstat shows yet another (third) order! Start sector: I don't know that for that specific drive in your particular array; you'd have to read the documentation I linked and analyze your header to figure that out. Block size: I believe you'd have to use the 512k chunk size reported by the array (but test it). I've built a lot of large mdadm arrays professionally and personally, so I hope you won't mind my saying you shouldn't bother figuring this out. Your RAID array is working, so just copy the data off. That's why you have RAID! Use RAID 10 for the new array. Commented Oct 16, 2022 at 22:30
  • I say use RAID 10 for the new array because you say you are able to consider both RAID 10 and RAID 6. If you can consider RAID 10 then you should definitely use RAID 10. The only reason to use RAID 6 is if you rule out RAID 10 because it won't fit your current or future storage needs within your chassis constraints (or budget). It's not just about everyday performance; it's about risk (and degraded performance) in a failure scenario. Commented Oct 16, 2022 at 22:40
  • I'll also add that, in my opinion, in pretty much any professional context these days SSD arrays are the way to go despite the large per-drive cost difference between SSDs and mechanical drives. When all the costs, benefits and risks are weighed, SSDs generally come out on top. Commented Oct 16, 2022 at 23:00
