
I have two servers with 6 identical 4TB disks each. Now I'd like to set up a highly available NFS service using them.

The setup I'd have in mind:

  • Both servers have an mdadm RAID6 with LVM on top => each server ends up with a /dev/mapper/vg0-drbd1 logical volume of about 16TB
  • Mirror /dev/mapper/vg0-drbd1 on server1 (active) to /dev/mapper/vg0-drbd1 on server2 (passive) using DRBD (a rough command sketch follows after this list)
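
A rough sketch of that block stack on each server, assuming the six data disks are /dev/sdb through /dev/sdg and the DRBD resource will be called r0 (all names are placeholders):

    # RAID6 across the six 4TB disks (4 data + 2 parity => ~16TB usable)
    mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[b-g]

    # LVM on top of the array; one big LV to back DRBD
    pvcreate /dev/md0
    vgcreate vg0 /dev/md0
    lvcreate -n drbd1 -l 100%FREE vg0        # appears as /dev/mapper/vg0-drbd1

    # once a DRBD resource r0 is defined on top of this LV:
    drbdadm create-md r0
    drbdadm up r0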

This way I can ensure that:

  1. Removing any disk from one server keeps the RAID available (although degraded) due to RAID6
  2. When shutting down the active server, the passive server is still accessible

Please correct me if I'm wrong or if there is a better solution (e.g. something integrated into ZFS?).


My question now is, how do I configure NFS to switch to the passive server when shutting down the active one?

1 Answer


Assuming you will use pacemaker to fail over the DRBD active role and file system mount, all you need to add are a few cluster resources:

  • another (tiny) DRBD resource carrying /var/lib/nfs (a sample resource definition follows after this list)
  • a floating IP address
  • colocation constraints to make sure that both DRBD active roles, both mounts, the IP address and the NFS service live on the same node
  • serialization constraints to make sure that the NFS service starts up only after all other resources are up
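
The extra DRBD for /var/lib/nfs is what lets the NFS client state (rmtab, statd/sm data, NFSv4 recovery information) follow the service to the other node. A minimal /etc/drbd.d/r1.res sketch, assuming a small dedicated LV and a back-to-back replication network (host names, addresses and the LV name are placeholders):

    # /etc/drbd.d/r1.res (sketch)
    resource r1 {
        device    /dev/drbd1;
        disk      /dev/mapper/vg0-varlibnfs;   # small backing LV, a few GB is plenty
        meta-disk internal;
        on server1 {
            address 10.0.0.1:7790;             # dedicated replication link
        }
        on server2 {
            address 10.0.0.2:7790;
        }
    }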

You now let all clients do their NFS mounts against the floating IP address. If the primary server fails, NFS operations will just stall until the failover is done (which should be rather fast) and then continue. I use just such a setup to service a broadcast industry deployment center; each pair has 2x12x8TB disks and it just works. I routinely do rolling upgrades on each pair and the clients never miss a beat.
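
On the client side that boils down to an fstab entry pointing at the floating IP, for example (the IP and export path match the snippets further down; the mount point and timeout values are assumptions, hard mounts are what makes operations stall instead of fail during the switch):

    # /etc/fstab on the clients - always mount the floating IP, never a node address
    192.168.10.103:/srv/nfs  /mnt/nfs  nfs  hard,timeo=600,retrans=5,_netdev  0  0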

Some random hints:

  • Make sure you do NOT enable async writes on the servers or you WILL lose data on failover
  • Make sure all client timeout mount options are in line with the failover time
  • Test before putting load on it - this includes a hard-power-off failover test
  • Definitely use a multi-ring corosync setup, you do NOT want split-brain (a minimal two-ring sketch follows after this list)
  • Definitely use dedicated bonded interfaces for DRBD, again you do NOT want split-brain or random disconnects
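
For the multi-ring corosync point, a minimal sketch assuming corosync 3 with knet, ring0 on a dedicated cluster link and ring1 on the regular LAN (node names and addresses are placeholders):

    # /etc/corosync/corosync.conf (sketch)
    totem {
        version: 2
        cluster_name: nfs-ha
        transport: knet
        link_mode: passive              # second ring is used only if the first fails
    }
    nodelist {
        node {
            name: server1
            nodeid: 1
            ring0_addr: 10.0.1.1        # dedicated cluster link
            ring1_addr: 192.168.10.101  # regular LAN as backup ring
        }
        node {
            name: server2
            nodeid: 2
            ring0_addr: 10.0.1.2
            ring1_addr: 192.168.10.102
        }
    }
    quorum {
        provider: corosync_votequorum
        two_node: 1                     # two-node mode (implies wait_for_all)
    }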

Some snippets from `crm configure show`:

primitive drbd0_res ocf:linbit:drbd \
        params drbd_resource=r0 \
        op monitor interval=103s timeout=120s role=Master \
        op monitor interval=105s timeout=120s role=Slave \
        op start timeout=240s interval=0 \
        op stop timeout=180s interval=0 \
        op notify timeout=120s interval=0
primitive drbd1_res ocf:linbit:drbd \
        params drbd_resource=r1 \
        op monitor interval=103s timeout=120s role=Master \
        op monitor interval=105s timeout=120s role=Slave \
        op start timeout=240s interval=0 \
        op stop timeout=180s interval=0 \
        op notify timeout=120s interval=0
primitive ip_nfs_res IPaddr2 \
        params ip=192.168.10.103 cidr_netmask=24 nic=eno1 \
        meta target-role=Started \
        op start timeout=180s interval=0 \
        op stop timeout=180s interval=0
primitive nfs_res service:nfs-kernel-server \
        meta target-role=Started \
        op start timeout=180s interval=0 \
        op stop timeout=180s interval=0
primitive nfs_fs_res Filesystem \
        params device="/dev/drbd0" directory="/srv/nfs" fstype=ext4 \
        meta target-role=Started \
        op start timeout=180s interval=0 \
        op stop timeout=180s interval=0
primitive varlibnfs_fs_res Filesystem \
        params device="/dev/drbd1" directory="/var/lib/nfs" fstype=ext4 \
        meta target-role=Started \
        op start timeout=180s interval=0 \
        op stop timeout=180s interval=0
ms drbd0_ms drbd0_res \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Started
ms drbd1_ms drbd1_res \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Started
colocation drbd_colo inf: drbd0_ms:Master drbd1_ms:Master
colocation drbd_nfs_fs_colo inf: nfs_fs_res drbd0_ms:Master
colocation drbd_varlibnfs_fs_colo inf: varlibnfs_fs_res drbd1_ms:Master
colocation nfs_ip_fs_colo inf: ip_nfs_res nfs_fs_res
colocation nfs_ip_service_colo inf: ip_nfs_res nfs_res
order drbd_nfs_fs_order Mandatory: drbd0_ms:promote nfs_fs_res:start
order drbd_nfs_fs_serialize Serialize: drbd0_ms:promote nfs_fs_res:start
order drbd_varlibnfs_fs_order Mandatory: drbd1_ms:promote varlibnfs_fs_res:start
order drbd_varlibnfs_fs_serialize Serialize: drbd1_ms:promote varlibnfs_fs_res:start
order nfs_fs_ip_order Mandatory: nfs_fs_res:start ip_nfs_res:start
order nfs_fs_ip_serialize Serialize: nfs_fs_res:start ip_nfs_res:start
order varlibnfs_fs_ip_order Mandatory: varlibnfs_fs_res:start ip_nfs_res:start
order varlibnfs_fs_ip_serialize Serialize: varlibnfs_fs_res:start ip_nfs_res:start
order ip_service_order Mandatory: ip_nfs_res:start nfs_res:start
order ip_service_serialize Serialize: ip_nfs_res:start nfs_res:start
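
For completeness, the export itself lives on the DRBD-backed mount from the snippets above; a minimal /etc/exports sketch (the client subnet is an assumption, sync is the important part, matching the no-async-writes hint):

    # /etc/exports - sync is essential; async would drop acknowledged writes on failover
    /srv/nfs  192.168.10.0/24(rw,sync,no_subtree_check)
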
  • Thanks a lot for your answer, it contains a lot of valuable information. May I ask whether you could share some notes on how you set up this cluster? Also, would you use a dedicated direct LAN connection between both servers for DRBD?
    – Hoeze
    Commented Jan 13, 2021 at 16:34
  • Dedicated LAN: YES!!!! (I tend to use a 10G and a 1G connection in active-passive bonding, 10G back-to-back is very cheap) Commented Jan 13, 2021 at 18:02
  • Notes: I wrote basically everything: Master/Slave set for DRBD, Resources for File system mount, IP and NFS service, Colocation (infinity) for all, Order (serialize) for DRBD-master->FS-mount(2x)->IP->NFS Commented Jan 13, 2021 at 18:05
  • I pasted a few snippets into an edit. Commented Jan 13, 2021 at 18:18
  • Don't forget: Use a dedicated link for corosync: I tend to use a dedicated 1G link as primary and the DRBD link as secondary Commented Jan 13, 2021 at 18:19
