
I can netboot Ubuntu 20.04 on a Raspberry Pi 4b over NFS, using a Synology DS 1618+ as a TFTP and NFS server. But I would like to protect the root file system with overlayroot so several machines can run simultaneously from the same root. This just will not work for me. I asked a question a few days ago on Ask Ubuntu but haven't really received any useful insights. I also understand more about the problem now and would like to rephrase the question here for a broader audience.

The Problem

Although I can netboot Ubuntu 20.04 on an RPi4 using NFS, as soon as I enable overlayroot (overlayroot="tmpfs:recurse=0") the system starts up in a degraded state (systemctl is-system-running). It seems clear that this has something to do with overlayroot.

In this state, only root can log in. No other users get past the login/password prompt.

Examination of syslog reveals that the first thing to go wrong during boot is the startup of systemd-networkd, which fails with the message "Operation not supported". Closer examination reveals that systemd-networkd tries to run as a user (systemd-network). Since systemd-networkd doesn't start, neither do a number of other services:

  UNIT                      LOAD   ACTIVE SUB    DESCRIPTION                   
● atd.service               loaded failed failed Deferred execution scheduler  
● avahi-daemon.service      loaded failed failed Avahi mDNS/DNS-SD Stack       
● systemd-networkd.service  loaded failed failed Network Service               
● systemd-resolved.service  loaded failed failed Network Name Resolution       
● systemd-timesyncd.service loaded failed failed Network Time Synchronization  
● systemd-networkd.socket   loaded failed failed Network Service Netlink Socket

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

6 loaded units listed.
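
For anyone trying to reproduce this, the failed units and their logs for the current boot can be collected roughly as follows. This is only a sketch: the function names are made up for illustration, and show_failed_units has to be run on the affected machine.

```shell
#!/bin/sh
# Print the unit name from each line of
# `systemctl list-units --state=failed --plain --no-legend` (read on stdin).
# A leading bullet is skipped in case --plain was not used.
failed_unit_names() {
    awk 'NF { if ($1 == "●") print $2; else print $1 }'
}

# Show the overall state, then the last log lines of every failed unit.
show_failed_units() {
    systemctl is-system-running
    systemctl list-units --state=failed --plain --no-legend |
        failed_unit_names |
        while read -r unit; do
            echo "== $unit =="
            journalctl -b -u "$unit" --no-pager | tail -n 20
        done
}
```

Calling show_failed_units as root on the Pi should print the "Operation not supported" lines next to the unit that produced them.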

Configuration

DHCPd

I have a pair of ISC DHCP servers which have been running well for over a year. I made the following alterations to their configuration to support network booting:

# DHCP configuration for PXE boot of RPI4

option tftp-server-name "192.168.8.20"; #option 66
option bootfile-name "bootcode.bin"; #option 67

option vendor-class-identifier code 60 = string;
option vendor-encapsulated-options code 43 = string;
option space RPi code width 1 length width 1;
option RPi.discovery code 6 = unsigned integer 8;
option RPi.menu-prompt code 10 = text;
option RPi.menu-item code 9 = text;

option vendor-class-identifier "PXEClient";
option vendor-encapsulated-options "Raspberry Pi Boot";
vendor-option-space RPi;
option RPi.discovery 3;
option RPi.menu-prompt "PXE";
option RPi.menu-item "Raspberry Pi Boot";

filename "pxelinux.0";

next-server 192.168.8.20;
option tftp-server-address 192.168.8.20;

I will admit that I don't entirely understand this, but it works :-) I'm sure there's stuff in there that the RPi4 doesn't use.

Raspberry Pi 4b

I have set up the RPi4 boot loader as follows:

$ sudo rpi-eeprom-config 
[all]
BOOT_UART=0
WAKE_ON_GPIO=1
POWER_OFF_ON_HALT=1
BOOT_ORDER=0xf21
TFTP_PREFIX=1
TFTP_PREFIX_STR=RPi4-Ubuntu/
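
For reference, BOOT_ORDER is read one hex digit at a time from right to left, and with TFTP_PREFIX=1 the boot loader prepends TFTP_PREFIX_STR to the file names it requests. The small decoder below is an illustration only: decode_boot_order is a made-up helper, and the digit meanings are the ones documented for the Raspberry Pi boot loader configuration (only a subset is listed here).

```shell
#!/bin/sh
# Decode an RPi4 BOOT_ORDER value into the sequence of boot modes tried.
# The digits are consumed from right to left.
decode_boot_order() {
    digits=$(printf '%s' "${1#0x}" | tr 'A-F' 'a-f')
    result=""
    while [ -n "$digits" ]; do
        last=${digits#"${digits%?}"}   # last remaining hex digit
        digits=${digits%?}
        case "$last" in
            1) mode="SD card" ;;
            2) mode="network" ;;
            3) mode="RPIBOOT" ;;
            4) mode="USB mass storage" ;;
            e) mode="stop" ;;
            f) mode="restart" ;;
            *) mode="unknown($last)" ;;
        esac
        result="${result:+$result -> }$mode"
    done
    printf '%s\n' "$result"
}

decode_boot_order 0xf21   # prints: SD card -> network -> restart
```

So the setting above means: try the SD card, then the network, then start over.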

Synology DS 1618+ NAS

TFTP Server

At startup the RPi4 sends the TFTP_PREFIX_STR above and receives the following files back from the TFTP server on the Synology NAS:

bcm2710-rpi-2-b.dtb       bootcode.bin  initrd.img
bcm2710-rpi-3-b.dtb       boot.scr      overlay_map.dtb
bcm2710-rpi-3-b-plus.dtb  cmdline.txt   overlays
bcm2710-rpi-cm3.dtb       config.txt    start4cd.elf
bcm2710-rpi-zero-2.dtb    fixup4cd.dat  start4db.elf
bcm2711-rpi-400.dtb       fixup4.dat    start4.elf
bcm2711-rpi-4-b.dtb       fixup4db.dat  start4x.elf
bcm2711-rpi-cm4.dtb       fixup4x.dat   start_cd.elf
bcm2837-rpi-3-a-plus.dtb  fixup_cd.dat  start_db.elf
bcm2837-rpi-3-b.dtb       fixup.dat     start.elf
bcm2837-rpi-3-b-plus.dtb  fixup_db.dat  start_x.elf
bcm2837-rpi-cm3-io3.dtb   fixup_x.dat   vmlinuz

These are from the Ubuntu 20.04 installation image for RPi (partition boot). I have made the following modifications to config.txt:

[pi4]
# Run as fast as firmware / board allows
arm_boost=1

[all]
arm_64bit=1
device_tree_address=0x03000000
enable_uart=1
cmdline=cmdline.txt
kernel=vmlinuz
initramfs initrd.img followkernel

include syscfg.txt
include usercfg.txt

and cmdline.txt:

dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=/dev/nfs nfsroot=192.168.8.20:/volume3/pxe/nfs/RPi4-Ubuntu/OS,tcp,rw ip=dhcp rootfstype=nfs elevator=deadline rootwait
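
The nfsroot value follows the kernel's nfsroot=[server:]path[,options] format. A small helper (parse_nfsroot is a hypothetical name) makes it easy to see which NFS options are actually requested at boot. Note that the command line above does not ask for a particular NFS version.

```shell
#!/bin/sh
# Split the nfsroot= parameter of a kernel command line into its parts.
# $1 is the complete command line as a single string.
parse_nfsroot() {
    value=$(printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^nfsroot=//p' | head -n 1)
    [ -n "$value" ] || return 1
    target=${value%%,*}               # everything before the first comma
    case "$value" in
        *,*) opts=${value#*,} ;;      # everything after the first comma
        *)   opts="" ;;
    esac
    case "$target" in
        *:*) server=${target%%:*}; path=${target#*:} ;;
        *)   server=""; path=$target ;;
    esac
    printf 'server=%s\npath=%s\noptions=%s\n' "$server" "$path" "$opts"
}
```

On a running system it can be fed the live command line with parse_nfsroot "$(cat /proc/cmdline)".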

NFS Server

The NFS server serves up the nfsroot mentioned in cmdline.txt as the root (/). This is basically the rootfs partition from the Ubuntu distro. It also serves up the files from the Ubuntu distro's boot partition, which are mounted at /boot.

exportfs -v shows the following:

/volume3/pxe    192.168.0.0/21(rw,async,no_wdelay,hide,crossmnt,no_subtree_check,insecure_locks,anonuid=1024,anongid=100,sec=sys,insecure,root_squash,no_all_squash)
/volume3/pxe    192.168.8.0/21(rw,async,no_wdelay,hide,crossmnt,no_subtree_check,insecure_locks,anonuid=1024,anongid=100,sec=sys,insecure,root_squash,no_all_squash)
/volume3/pxe    192.168.32.0/21(rw,async,no_wdelay,hide,crossmnt,no_subtree_check,insecure_locks,anonuid=1024,anongid=100,sec=sys,insecure,root_squash,no_all_squash)
/volume3/pxe    192.168.72.0/21(rw,async,no_wdelay,hide,crossmnt,no_subtree_check,insecure_locks,anonuid=1024,anongid=100,sec=sys,insecure,root_squash,no_all_squash)

During the boot process, it seems that Synology NFS and the RPi4 will only use NFSv3. In /etc/fstab, however, they both seem happy to use NFSv4/NFSv4.1.
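
The NFS version actually negotiated for each mount can be read from the vers= option in /proc/mounts. A sketch, with nfs_versions as a made-up helper name:

```shell
#!/bin/sh
# For every NFS mount in /proc/mounts-style input (stdin), print
# "<mountpoint> <fstype> <version>". The version comes from the vers= option.
nfs_versions() {
    awk '$3 ~ /^nfs4?$/ {
        vers = "unknown"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^vers=/) vers = substr(opts[i], 6)
        print $2, $3, vers
    }'
}
```

Run it as nfs_versions < /proc/mounts. With overlayroot active, the root itself shows up as an overlay mount, so the line to look at is the NFS mount that serves as its lower layer.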

Back to the Pi

With this setup the RPi boots and runs well over the network with an NFS file system as root. At this point, to protect the root file system from being written to, I modify the overlayroot configuration file at /etc/overlayroot.conf so that it looks like the following:

overlayroot_cfgdisk="disabled"
#overlayroot=""
overlayroot="tmpfs:recurse=0,debug=1"

(I have omitted the 160+ lines of comment before this.)

When I reboot the problem described above ensues.

Help! Please?

I have tried the above with 32-bit and 64-bit Ubuntu 20.04 with exactly the same result. I tried Raspberry Pi OS 11 (64-bit), but it doesn't offer overlayroot as an option. Its alternative, built into raspi-config, doesn't work with NFS. So I didn't pursue that.

I greatly simplified the TFTP configuration for Ubuntu, however, based on what Raspberry Pi OS does.

I wouldn't be surprised if the problem had something to do with the Synology NFS server, but I don't really know where to start looking.

I certainly don't rule out that I have made a mistake somewhere, so I'd be grateful if anyone could point it out.

As soon as Ubuntu 22.04 comes out, I'll try that too.

There are lots of people out in the Net who have talked about doing this. I've read a lot of articles and tried to take their messages onboard. I've seen one or two allusions to possible problems with NFS root file systems and overlayroot. But I've not seen anything that describes exactly what I'm trying to do.

  • I’m voting to close this question because question was cross-posted on Ask Ubuntu. Commented Nov 20, 2023 at 18:53
  • It might be a better idea to close the original post on Ask Ubuntu, because there seems to be more action here. Commented Nov 21, 2023 at 9:17
  • As the person who posted the questions, you're able to close/delete any for reasons you choose. The community can only vote/suggest, and it requires several of us to agree before an action is taken. Commented Nov 21, 2023 at 15:32
  • Thanks for the clarification. I cross-posted here because Ask Ubuntu produced no useful reaction. I'm not sure what the etiquette is in such situations. Perhaps I should go and close the Ask Ubuntu thread? Commented Nov 22, 2023 at 16:30

2 Answers

2

The answer is relatively simple, though it took a long time to find it.

If you netboot Ubuntu with NFS on an RPi4, the nfsmount included with klibc (as delivered by Ubuntu) only supports NFSv2 and NFSv3. If you replace that nfsmount with one that supports NFSv4 and your root filesystem is successfully mounted by NFSv4, overlayroot works as it is supposed to.

Spoiler: it is an upstream (i.e. Debian) bug that has been around since 2007. Repeat: 2007.
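
One way to build such a replacement is a thin wrapper that accepts nfsmount-style arguments and hands them to mount -t nfs. The following is only a sketch under several assumptions: that the initramfs calls nfsmount with one or more -o option groups followed by the source and the mount point, that a mount with NFS support (including its mount.nfs helper) is available inside the initramfs, and that NFSv4 is requested through the options passed in. The function name is made up, and it prints the resulting command instead of running it so that it can be inspected first.

```shell
#!/bin/sh
# Translate nfsmount-style arguments into an equivalent mount(8) command.
# Several -o groups are merged into one comma-separated option list.
nfsmount_to_mount() {
    opts=""
    while [ $# -gt 0 ]; do
        case "$1" in
            -o)
                [ $# -ge 2 ] || { echo "nfsmount: -o needs an argument" >&2; return 1; }
                opts="${opts:+$opts,}$2"
                shift 2
                ;;
            *)
                break
                ;;
        esac
    done
    # Exactly two positional arguments must remain: source and mount point.
    [ $# -eq 2 ] || { echo "usage: nfsmount [-o opts] server:/path dir" >&2; return 1; }
    if [ -n "$opts" ]; then
        printf 'mount -t nfs -o %s %s %s\n' "$opts" "$1" "$2"
    else
        printf 'mount -t nfs %s %s\n' "$1" "$2"
    fi
}
```

In a real wrapper the final printf would be replaced by the mount call itself, and the script would be installed in place of nfsmount inside the initramfs.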

  • This is a copy of the answer I gave to a similar question on Ask Ubuntu. Commented Aug 14, 2022 at 17:19
  • So I ended up here with very similar symptoms. Thanks @Stephen Winnall for such a nice writeup. Just like what you were seeing, disabling overlayroot fixes things, and with overlayroot only root can login, etc. The extra interesting wrinkle is that this issue only happens when the NFS share is on a TrueNAS machine. With the NFS share on an Ubuntu box, overlayroot works just fine. Do you have any insight into why overlayroot has issues when NFS v2/v3 are used? Commented Nov 16, 2023 at 21:50
  • @aggieNick02 Sorry, I was fixated on making NFSv4 work. I remember reading something about ACLs, but I couldn’t get my head round it :-) Commented Nov 18, 2023 at 11:35
  • Totally get that, I still don't have a full understanding of it either; the whole thing definitely seems scarily brittle. And in different directions too. You fixed your NFSv3 problems by moving to NFSv4. The other person I linked to had no problems with NFSv3, and hit problems when migrating to NFSv4, with the solution being to disable ACL stuff. Yet what I'm seeing is problems on NFSv3 precisely when ACL stuff is not available. And then NFSv2, which has no optional ACL stuff at all, worked fine for me. It definitely doesn't inspire confidence. ;-) Commented Nov 20, 2023 at 5:50
  • I did look at the source of the nfsmount that was delivered with Ubuntu. It was a jumble of alternative actions depending on the version of NFS (<=3) in use. I just replaced it with a shell script that took the same arguments and then called them with mount -t nfs. The choice of NFSv4 is dictated by the arguments passed to my nfsmount. If it were passed a request for NFSv2 or NFSv3 it would at least try to respect that, but I have never tried it. Your symptoms might be down to nfsmount doing something weird. Why would nfsmount need to do anything different depending on NFS version? Commented Nov 22, 2023 at 16:40
0

Copied from the same question over at askubuntu:

The root issue appears to be that overlayfs behaves incorrectly in certain NFS server configurations. Namely, something goes awry when overlayfs is used on top of an NFSv3 export on a server that does not implement the optional NFSACL protocol extensions for v3. It looks like this isn't the only case of overlayfs being very brittle with respect to ACLs. Apparently there have been (still are?) issues with both v3 and v4 NFS.

I was able to simplify the problem to one not involving boot, but instead just overlayfs and NFS exports. With an NFS export of an Ubuntu image served from an Ubuntu machine (which does implement NFSACLv3) and a directory named /tmp/overlaygames/ with empty upper, workdir, and overlay directories within, the following script would run without error:

#!/bin/bash
sudo mount -t nfs -o ro,vers=3 10.99.0.1:/srv/nfs/ubuntu-20.04.3 /media/nfs/
sudo mount -t overlay -o lowerdir=/media/nfs,upperdir=/tmp/overlaygames/upper,workdir=/tmp/overlaygames/workdir/ overlay /tmp/overlaygames/overlay
ls /tmp/overlaygames/overlay/home

After running that, run this script to unmount and clean up:

#!/bin/bash
pushd /tmp/overlaygames
sudo umount overlay
rm -rf workdir
mkdir workdir
sudo umount /media/nfs
popd

Now running the exact same script, but disabling NFSACLv3 client side with the noacl option:

#!/bin/bash
sudo mount -t nfs -o ro,noacl,vers=3 10.99.0.1:/srv/nfs/ubuntu-20.04.3 /media/nfs/
sudo mount -t overlay -o lowerdir=/media/nfs,upperdir=/tmp/overlaygames/upper,workdir=/tmp/overlaygames/workdir/ overlay /tmp/overlaygames/overlay
ls /tmp/overlaygames/overlay/home

will return the familiar

ls: cannot open directory '/tmp/overlaygames/overlay/home': Operation not supported

Likewise, starting with the first script but putting the NFS export on a FreeNAS/TrueNAS (FreeBSD) machine, will also return Operation not supported because FreeBSD does not implement NFSACLv3 (verified by capturing packets).

Interestingly, specifying vers=2 when working with an NFS share on FreeBSD appears to work just fine. Granted, NFSv2 has some limitations compared to NFSv3.
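
Besides capturing packets, a quicker hint about whether a server offers NFSACL for v3 is to look for RPC program 100227 in the output of rpcinfo -p. This is a sketch under the assumption that the server registers NFSACL with its portmapper (Linux servers usually list it as nfs_acl), so a missing entry is a hint rather than proof. The function name is made up.

```shell
#!/bin/sh
# Succeed if `rpcinfo -p <server>` output on stdin lists the NFSACL
# program (100227) in version 3; fail otherwise.
advertises_nfsacl_v3() {
    awk '$1 == 100227 && $2 == 3 { found = 1 } END { exit !found }'
}
```

For example: rpcinfo -p 10.99.0.1 | advertises_nfsacl_v3 && echo "NFSACLv3 advertised".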
