
I have configured a Linux system to mount a volume provided through iSCSI, using a systemd mount unit that declares the Open-iSCSI service as both a requirement and an ordering dependency (i.e. Requires=iscsi.service and After=iscsi.service under section [Unit]).

The volume spans several logical units, and is identified for mounting by UUID.

Unfortunately, the mount operation is inconsistently successful at boot.

I have determined that during some boot sequences, at least one device is not yet attached at the time of the attempted mount operation. Thus emerges a race condition.

Various documentation and reports strongly suggest that this dependency should make systemd wait until all devices are attached before the mount operation is attempted.

Is a further target available that would ensure that all devices have been attached, before the mount operation is attempted?

The system runs Linux Mint 21.2 with kernel 6.2.0.

3 Answers

1

First of all, the upstream open-iscsi client/initiator systemd service does not call iscsiadm such that it waits until all login requests are successful before it exits (Ref.: [1] [2]).

So you can indeed try to reduce the chance of hitting the race condition (without really eliminating the race itself) by overriding the ExecStart= command with a drop-in snippet (see systemctl edit) so that iscsiadm is no longer called with -W.

The thing is, even when iscsiadm does wait until all logins have completed, AFAICT that does not mean it will (or even can) wait until the generic SCSI disk driver has finished probing all the drives that were just populated / virtualized in the system. (It's like a USB drive: the moment you have fully plugged it in isn't the moment the kernel has finished probing it.)

So the "real" way to make sure that all the devices of a multi-device BTRFS volume are present and ready, such that the volume can be mounted, is to check whether the btrfs filesystem driver has finished scanning / registering them.

One way is to have a script like this and a service that is pulled by (or, perhaps even better, pulls) the corresponding mount unit and runs the script:

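#!/bin/sh
# Wait until the btrfs volume whose UUID is given as $1 is fully assembled.
while true; do
  # The by-uuid symlink must exist and btrfs must report all member devices as scanned.
  if [ -L /dev/disk/by-uuid/"$1" ] &&
     btrfs device ready /dev/disk/by-uuid/"$1"; then
    break
  fi
  sleep 1  # pause between checks instead of busy-waiting
done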

($1 should be set to a UUID in lowercase letters. You could even write a systemd service template and pass the instance name of each enabled instance as the script argument, although you will probably need to check the escaping rules for hyphens.)
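The loop above waits forever if the volume never shows up. A bounded variant could look like the following. It is only a sketch: the function name wait_for_volume and the 60-second default are my own choices, not something btrfs or systemd prescribes.

```shell
#!/bin/sh
# wait_for_volume UUID [TIMEOUT_SECONDS]
# Hypothetical helper: returns 0 once the volume is ready, 1 on timeout.
wait_for_volume() {
  uuid=$1
  timeout=${2:-60}            # assumed default, adjust to taste
  dev=/dev/disk/by-uuid/$uuid
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Same two checks as the original loop.
    if [ -L "$dev" ] && btrfs device ready "$dev"; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "timed out waiting for $dev" >&2
  return 1
}

# Only run when the script is given a UUID (and optionally a timeout).
if [ "$#" -gt 0 ]; then
  wait_for_volume "$@"
fi
```

A non-zero exit status makes the service fail, so a mount unit that requires it is not attempted on a half-assembled volume.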

Certainly you will need to make sure the ready-script service is ordered before the mount unit. Also make sure you use the correct Type=. (I think oneshot should be correct / fine.)
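For illustration, such a service template might look roughly like this. All names here are assumptions of mine: the unit name btrfs-ready@.service, the script path /usr/local/sbin/btrfs-wait-ready and the mount unit mnt-data.mount (for a mount point /mnt/data). It also assumes the mount carries the _netdev option; for a plain local mount the default dependencies of the service would likely produce an ordering cycle.

```ini
# /etc/systemd/system/btrfs-ready@.service  (hypothetical name)
[Unit]
Description=Wait for btrfs volume %i to be fully assembled
# Hypothetical mount unit; its name must match the mount point path.
Before=mnt-data.mount

[Service]
Type=oneshot
RemainAfterExit=yes
# %i is the instance name exactly as written after the "@",
# so a UUID with plain hyphens is passed through unchanged.
ExecStart=/usr/local/sbin/btrfs-wait-ready %i

[Install]
# Enabling an instance makes the mount unit pull it in.
RequiredBy=mnt-data.mount
```

You would then enable one instance per volume, e.g. systemctl enable btrfs-ready@<uuid>.service, with the UUID in lowercase.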

Also, you may want to confirm that your system creates symlinks under /dev/disk/by-uuid/ first.

If you want to pull in the mount unit with the ready-script service, I suppose you can in turn pull that in with the iscsi service.

P.S. btrfs device ready will not return 1 if a device of a multi-device volume was scanned but is now gone (unplugged or whatever) until you run btrfs device scan -u. I think it's worth mentioning because this behavior could make you think that the ready subcommand is broken / useless (like I did) when you first test it on a volume that was once ready.

  • Normally 64-btrfs.rules sets SYSTEMD_READY=0 when it detects an incomplete multidevice array, so it should in theory be enough for systemd to wait on dev-foo.device – the device unit won't show up as active until a udev rule marks it as SYSTEMD_READY=1 again.
    Commented Aug 28, 2023 at 9:52
  • @u1686_grawity yes, just realized it and have written another answer
    – Tom Yan
    Commented Aug 28, 2023 at 10:07
1

While, logically speaking, there is nothing exactly wrong in my other answer or my comments, I did miss the obvious (well, not so obvious).

By default systemd waits until the corresponding device unit is up and ready before it starts the mount unit, and there is a udev rule that makes sure this is the case for a btrfs volume (multi-device or not) as well.

Unfortunately, there's an exception. When you use e.g. UUID=... for the first field in an fstab entry, the fstab generator (which "translates" fstab entries to mount units) turns UUID=... into a /dev/disk/by-uuid/... device path. If you use UUID=... in a manually written mount unit, however, systemd does NOT do that when it loads the unit file. (Yet that mount unit is NOT considered invalid either. systemd will still start it as long as it is pulled in.)

The consequence of What= being e.g. UUID=... instead of a device path (e.g. /dev/disk/by-uuid/...) is that the mount unit has no dependency on any device unit, and hence is not ordered after any. Therefore, it will be started "ASAP". The result is that the mount could be attempted before the UUID is known to the system, before the btrfs volume is assembled, or after the volume is ready.
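One way to see whether a given mount unit actually got ordered after a device unit is to look at its After= property. A small sketch, assuming a hypothetical helper named has_device_dep and a hypothetical mount unit mnt-data.mount:

```shell
#!/bin/sh
# has_device_dep: reads "systemctl show" output on stdin and succeeds
# if the After= line lists at least one *.device unit.
# (Hypothetical helper name, not part of systemd.)
has_device_dep() {
  grep '^After=' | grep -q '\.device'
}

# Usage on a running system (mnt-data.mount is a placeholder name):
#   systemctl show -p After mnt-data.mount | has_device_dep \
#     || echo "mount unit is not ordered after any device unit"
```

With What=UUID=... in a handwritten unit I would expect the check to fail, and with What=/dev/disk/by-uuid/... to succeed.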

So there are two simple solutions:

  • use an fstab entry
  • use /dev/disk/by-uuid/... for What= in a "handwritten" mount unit.
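To make the two options concrete, here is a rough sketch of each. The UUID, the mount point /mnt/data and the option list are placeholders of mine, not values from the question. The fstab variant:

```ini
# /etc/fstab (placeholder UUID and mount point)
UUID=01234567-89ab-cdef-0123-456789abcdef  /mnt/data  btrfs  defaults,_netdev  0  0
```

And the handwritten mount unit, whose file name has to match the mount point (mnt-data.mount for /mnt/data):

```ini
# /etc/systemd/system/mnt-data.mount (placeholder names throughout)
[Unit]
Description=Multi-device btrfs volume on iSCSI

[Mount]
# A device path, not UUID=..., so that systemd orders the mount
# after the corresponding device unit.
What=/dev/disk/by-uuid/01234567-89ab-cdef-0123-456789abcdef
Where=/mnt/data
Type=btrfs
# _netdev marks the mount as network-backed.
Options=_netdev

[Install]
WantedBy=remote-fs.target
```

In both cases _netdev is there because the block devices come over the network; I have not verified the exact option set on Linux Mint.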

There is no need for, and no point in, adding a dependency / ordering on the iscsi service to the mount unit. (Regardless of how iscsiadm works / is asked to work, it's a bad idea anyway, since it could well be incompatible with the default/implicit dependencies and ordering against target units and the like. Even if it doesn't "mess up the head" of systemd, it could give you false expectations / assumptions. Let alone the races at other levels of probing that at least technically exist.)

P.S. I've filed an issue upstream. Let's see how things will go.

0

First of all, Requires= does not imply After=. You usually want both.

Second, it depends on each individual service when that service reports "ready" to systemd. You need to verify whether iscsid implements readiness to mean it has started up and established the connections, or whether it has only started up (and is now ready to accept control commands).

Since iscsid creates virtual /dev/sd devices, it may make more sense to depend on the device and not just on the service. (In other words, depend on the actual effects and not on the mechanism.) Mount units normally do that already, but for services you'll want to depend on "dev-disk-bysomething.device".
