
I am observing strange behavior of smartctl with two WD Red 3TiB drives inserted in a WD My Book Duo drive enclosure and connected to the computer over USB. Namely, running a test on one of the drives also starts a test on the other drive:

$ blkid /dev/sda /dev/sdb
/dev/sda: UUID="7eca647d-ef1b-c354-3ab2-9c9a364a7303" UUID_SUB="5fca7ab9-2343-ca70-5d25-84739858c883" LABEL="wd:0" TYPE="linux_raid_member"
/dev/sdb: UUID="7eca647d-ef1b-c354-3ab2-9c9a364a7303" UUID_SUB="ef3895d7-ff13-0a89-91b9-a3f01a6744dc" LABEL="wd:0" TYPE="linux_raid_member"

# smartctl -d sat -t short /dev/sda
smartctl 6.6 2016-05-31 r4324 [armv6l-linux-4.9.80+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Sun May  6 16:12:07 2018

Use smartctl -X to abort test.

# smartctl -d sat -a /dev/sda      
smartctl 6.6 2016-05-31 r4324 [armv6l-linux-4.9.80+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68N32N0
Serial Number:    WD-WCC7K0HLK0TR
LU WWN Device Id: 5 0014ee 2b9c88d08
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun May  6 16:10:16 2018 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

... snip

Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
... snip

# smartctl -d sat -a /dev/sdb
smartctl 6.6 2016-05-31 r4324 [armv6l-linux-4.9.80+] (local build)

... snip

Self-test execution status:      ( 249) Self-test routine in progress...
                                        70% of test remaining.
... snip

This leads me to believe that the drive enclosure intercepts the SMART commands and reports aggregate results for both drives simultaneously. Is there any known way to circumvent this other than getting rid of the drive enclosure?

EDIT: Adding the content of /etc/fstab, /etc/mdadm/mdadm.conf, and the output of blkid(8) as requested in the comments:

$ cat /etc/fstab 
proc                  /proc  proc    defaults                       0       0
PARTUUID=5cb553c4-01  /boot  vfat    defaults                       0       2
PARTUUID=5cb553c4-02  /      ext4    defaults,noatime               0       1
/dev/md0              /mnt   btrfs   relatime,compress,autodefrag   0       0

$ cat /etc/mdadm/mdadm.conf 
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
#DEVICE partitions containers

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# This configuration was auto-generated on Sat, 21 Apr 2018 13:55:00 +0200 by mkconf
ARRAY /dev/md0 metadata=1.2 name=inspiron:0 UUID=7eca647d:ef1bc354:3ab29c9a:364a7303

$ blkid
/dev/mmcblk0p1: LABEL="boot" UUID="5DB0-971B" TYPE="vfat" PARTUUID="5cb553c4-01"
/dev/mmcblk0p2: LABEL="rootfs" UUID="060b57a8-62bd-4d48-a471-0d28466d1fbb" TYPE="ext4" PARTUUID="5cb553c4-02"
/dev/sda: UUID="7eca647d-ef1b-c354-3ab2-9c9a364a7303" UUID_SUB="5fca7ab9-2343-ca70-5d25-84739858c883" LABEL="inspiron:0" TYPE="linux_raid_member"
/dev/sdb: UUID="7eca647d-ef1b-c354-3ab2-9c9a364a7303" UUID_SUB="ef3895d7-ff13-0a89-91b9-a3f01a6744dc" LABEL="inspiron:0" TYPE="linux_raid_member"

EDIT2: Adding the content of /proc/devices as requested in the comments of @harrymc's answer:

$ cat /proc/devices
Character devices:
  1 mem
  4 /dev/vc/0
  4 tty
  5 /dev/tty
  5 /dev/console
  5 /dev/ptmx
  5 ttyprintk
  7 vcs
 10 misc
 13 input
 14 sound
 21 sg
 29 fb
116 alsa
128 ptm
136 pts
162 raw
180 usb
189 usb_device
204 ttyAMA
244 bcm2835-gpiomem
245 uio
246 vcsm
247 vchiq
248 hidraw
249 vcio
250 vc-mem
251 bsg
252 watchdog
253 rtc
254 gpiochip

Block devices:
  1 ramdisk
259 blkext
  7 loop
  8 sd
  9 md
 65 sd
 66 sd
 67 sd
 68 sd
 69 sd
 70 sd
 71 sd
128 sd
129 sd
130 sd
131 sd
132 sd
133 sd
134 sd
135 sd
179 mmc
253 device-mapper
254 mdp
  • Could you include your /etc/fstab? Are the disks in a RAID? I wonder if the problem arises because both disks have the same UUID, differing only by UUID_SUB, thus confusing smartctl. I know that one can use tune2fs to assign a new UUID, but I don't know if that is advisable here.
    – harrymc
    Commented May 11, 2018 at 20:09
  • I included /etc/fstab as requested as well as /etc/mdadm/mdadm.conf. The disks are in software RAID 1, available as /dev/md0. The /dev/sda, and /dev/sdb devices should correspond to the two physical WD Red drives.
    – Witiko
    Commented May 11, 2018 at 21:34
  • I think you are onto something with your UUID, and UUID_SUB remarks, see the output of blkid that I added. Could this be the issue?
    – Witiko
    Commented May 11, 2018 at 21:38
  • The RAID creates in effect one virtual disk. Try the syntax of -d sat,1 for slot 1. I'm a bit unclear whether slot numbers start from 0 or 1.
    – harrymc
    Commented May 12, 2018 at 6:07
  • I can confirm trouble with SMART commands in enclosures with USB-to-SATA bridges: the ATA passthrough command sometimes just doesn't work as expected. The obvious way to find out whether it is a RAID artifact or a USB-to-SATA bridge problem is to temporarily disable the RAID and see if submitting SMART commands then works. The sg3-utils package may be helpful for testing the ATA passthrough command.
    – dirkt
    Commented May 14, 2018 at 11:52

2 Answers

3
+250

Maybe I have found the explanation. It's not encouraging for you. Originally I was going to counter the -d sat,0 approach (because I think it cannot solve the problem) by writing a few long comments on harrymc's answer. After studying the source code of smartmontools, I decided to post my conclusions as a separate answer.

I downloaded smartmontools-6.7-0-20180419-r4731.src.tar.gz. I'm not good at reading code, but what I read (mostly in scsiata.cpp) indicates that -d sat,N, where N is a number, takes effect only when N is 12 or 16. For other values the effective value is 16 by default. N only chooses a variant of the SCSI commands: 12-byte or 16-byte. This makes the attempts with -d sat,0 futile.

The code corresponds with man 8 smartctl which says:

-d TYPE, --device=TYPE

sat[,auto][,N] - the device type is SCSI to ATA Translation (SAT). This is for ATA disks that have a SCSI to ATA Translation Layer (SATL) between the disk and the operating system. SAT defines two ATA PASS THROUGH SCSI commands, one 12 bytes long and the other 16 bytes long. The default is the 16 byte variant which can be overridden with either -d sat,12 or -d sat,16.
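
The two variants named in the man page excerpt can be tried one after the other. This is only a minimal sketch: it assumes smartctl is installed and that the function is called as root, and try_sat_variants is a hypothetical helper name, not part of smartmontools. It only switches the length of the pass-through command and does nothing about which disk the enclosure addresses.

```shell
# Hypothetical helper: query a device with both ATA PASS THROUGH variants.
# Only the documented values 12 and 16 are tried.
try_sat_variants() {
    for n in 12 16; do
        echo "== -d sat,$n $1 =="
        # -i prints the identity section; report a non-zero exit status.
        smartctl -d "sat,$n" -i "$1" || echo "sat,$n failed with status $?"
    done
}
```

Calling try_sat_variants /dev/sda and then try_sat_variants /dev/sdb would show whether the bridge answers either variant differently.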

However the most interesting part is this comment (it's at the end of the initial comment block in scsiata.cpp, emphasis mine):

With more transports "hiding" SATA disks (and other S-ATAPI devices) behind a SCSI command set, accessing special features like SMART information becomes a challenge. The SAT standard offers ATA PASS THROUGH commands for special usages. Note that the SAT layer may be inside a generic OS layer (e.g. libata in linux), in a host adapter (HA or HBA) firmware, or somewhere on the interconnect between the host computer and the SATA devices (e.g. a RAID made of SATA disks and the RAID talks "SCSI" to the host computer). Note that in the latter case, this code does not solve the addressing issue (i.e. which SATA disk to address behind the logical SCSI (RAID) interface).

I think technically your WD My Book Duo is a RAID made of SATA disks that talks "SCSI" to the host computer, even if you use JBOD mode and allow your computer to see two separate disks. The above comment kinda explains your experience.
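
One way to probe this is to compare the serial numbers reported through the two device nodes. The following is a rough sketch, not a definitive test: smart_field and compare_serials are hypothetical helper names, and it assumes the "Serial Number:" line format shown in the question's smartctl output. Distinct serials would only suggest that identity queries reach separate disks; they say nothing about how the enclosure routes self-test commands.

```shell
# Hypothetical helper: print the value of a "Name: value" field
# from smartctl output read on stdin (first match only).
smart_field() {
    sed -n "s/^$1:[[:space:]]*//p" | head -n 1
}

# Hypothetical helper: compare the serial numbers seen via two device nodes.
compare_serials() {
    a=$(smartctl -d sat -i "$1" | smart_field 'Serial Number')
    b=$(smartctl -d sat -i "$2" | smart_field 'Serial Number')
    if [ -n "$a" ] && [ "$a" != "$b" ]; then
        echo "distinct serials: $a / $b"
    else
        echo "same or missing serial: '$a' / '$b'"
    fi
}
```

Run as root, for example: compare_serials /dev/sda /dev/sdb.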

I hope someone finds a working approach for you. For now I wouldn't expect much from smartctl, though (but maybe someone will prove me wrong). As a last resort you can physically detach one of the disks from the WD My Book Duo and run SMART test(s) on the other disk. Since your setup is software RAID, I think you can temporarily move one disk to another enclosure to keep it operational without interfering with the RAID.

  • Sadly, this seems to be the case.
    – Witiko
    Commented May 14, 2018 at 9:31
2

Smartctl works as expected as long as the device /dev/sda corresponds to one physical hard disk. However, RAID joins multiple physical disks logically together into one virtual disk. This can be seen by the fact that blkid shows the two disks as sharing one UUID and differing only by UUID_SUB.

For RAID, smartctl handles /dev/sda as a shorthand for the entire virtual disk which is the RAID array. It can still give the details for one disk, but it needs to be told about the RAID setup – the technology, the slot which houses the physical disk, and the Linux device corresponding to the virtual disk.

The syntax for referring to the first slot/disk:

smartctl -d <controller-type>,0 -t short /dev/sda

And similarly for the second slot/disk:

smartctl -d <controller-type>,1 -t short /dev/sda

Regarding controller-type, the smartmontools FAQ says:

Can I monitor disks behind RAID controllers?

Support for disks behind RAID controllers is highly dependent on both platform and controller type. See our page about smartmontools RAID controller support for the details.

The supported types from the smartmontools wiki are:

[image: table of supported RAID controller types from the smartmontools wiki]

From your /proc/devices file, it seems that your controller is the Metadisk (RAID) device (md), which is not supported by smartctl. So smartctl cannot be used on your computer to monitor the disks behind your RAID controller.

  • @KamilMaciorowski The WD My Book Duo drive enclosure hosts two SATA WD Red drives, but the enclosure itself is connected to the host via SCSI over USB. Therefore, the command set is SCSI to ATA Translation (SAT).
    – Witiko
    Commented May 12, 2018 at 20:30
  • @harrymc Your link discusses hardware RAIDs. My two drives are in a software RAID, i.e. /dev/sda corresponds to a single physical device connected over SAT. The filesystem UUID of /dev/sda is identical to the filesystem UUID of /dev/sdb, because the filesystem superblock contains information for Linux to be able to assemble the two drives into an array automatically. Unlike with hardware RAID (MegaRAID), there should be no need to alter the command set (-d) with software RAID and, apart from the filesystem UUIDs, smartctl should not really care about it.
    – Witiko
    Commented May 12, 2018 at 20:44
  • Actually, is there a reason for smartctl to care about UUIDs at all? UUIDs are assigned to filesystems, I assumed smartctl worked at the block device level.
    – Witiko
    Commented May 12, 2018 at 20:49
  • I only used sat because you did; megaraid seems more correct, but other values may still be possible. Syntax differences between smartctl versions are possible, but the ability to specify RAID slots must still be there. I also found out that slot numbers start from 0. I think that identical UUIDs are only a mark of RAID membership, which smartctl does care about (of course this behavior could also be a bug).
    – harrymc
    Commented May 13, 2018 at 6:51
  • sat should be the correct value, since that is how the drive enclosure is connected to the host.
    – Witiko
    Commented May 13, 2018 at 11:59
