Assuming you will use pacemaker to fail over the DRBD active role and file system mount, all you need to add are a few cluster resources:
- another (tiny) DRBD carrying /var/lib/nfs
- a floating IP address
- colocation constraints to make sure that both DRBD active roles, both mounts, the IP address, and the NFS service live on the same node
- ordering constraints to make sure that the NFS service starts up only after all other resources are up
You now let all clients do their NFS mounts against the floating IP address. If the primary server fails, NFS operations will just stall until the failover is done (which should be rather fast) and then continue. I use just such a setup to service a broadcast industry deployment center; each pair has 2x12x8TB disks and it just works. I routinely do rolling upgrades on each pair and the clients never miss a beat.
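For illustration, a client-side mount against the floating IP might look like the fstab line below. The export path and the exact timeo/retrans values are assumptions, not taken from the setup above; the point is to use a hard mount so I/O stalls and resumes across the failover instead of erroring out:

```
# /etc/fstab on an NFS client (sketch; path and timeouts are examples)
# "hard" makes the client retry forever, so a failover looks like a pause.
# timeo is in tenths of a second -- tune it against your measured failover time.
192.168.10.103:/srv/nfs  /mnt/nfs  nfs  hard,timeo=600,retrans=2,_netdev  0  0
```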
Some random hints:
- Make sure you do NOT enable async write on the servers or you WILL lose data on failover
- Make sure all timeout mount options are in line with failover time
- Test before putting load on it - this includes a hard-power-off failover test
- Definitely use a multi-ring corosync setup, you do NOT want split-brain
- Definitely use dedicated bonded interfaces for DRBD, again you do NOT want split-brain or random disconnects
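On the first hint: synchronous export behavior is controlled in /etc/exports. A sketch, with an assumed export path and client network (sync is the default on modern nfs-kernel-server, but it is worth stating explicitly):

```
# /etc/exports (sketch; path and network are placeholders)
# "sync" forces the server to commit writes before replying -- with "async",
# acknowledged-but-unwritten data is lost when the active node dies.
/srv/nfs  192.168.10.0/24(rw,sync,no_subtree_check)
```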
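On the multi-ring hint, a corosync 2.x-style fragment with two rings could look like the sketch below. All addresses are placeholders, and note that corosync 3.x does this differently (knet links with ring0_addr/ring1_addr in the nodelist instead of rrp_mode):

```
# /etc/corosync/corosync.conf fragment (corosync 2.x sketch; addresses are examples)
totem {
    version: 2
    rrp_mode: passive        # redundant ring protocol: use ring 1 if ring 0 fails
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.10.0   # client-facing LAN
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 10.0.0.0       # dedicated back-to-back link
        mcastport: 5407
    }
}
```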
Some snippets from `crm configure show`:
primitive drbd0_res ocf:linbit:drbd \
params drbd_resource=r0 \
op monitor interval=103s timeout=120s role=Master \
op monitor interval=105s timeout=120s role=Slave \
op start timeout=240s interval=0 \
op stop timeout=180s interval=0 \
op notify timeout=120s interval=0
primitive drbd1_res ocf:linbit:drbd \
params drbd_resource=r1 \
op monitor interval=103s timeout=120s role=Master \
op monitor interval=105s timeout=120s role=Slave \
op start timeout=240s interval=0 \
op stop timeout=180s interval=0 \
op notify timeout=120s interval=0
primitive ip_nfs_res IPaddr2 \
params ip=192.168.10.103 cidr_netmask=24 nic=eno1 \
meta target-role=Started \
op start timeout=180s interval=0 \
op stop timeout=180s interval=0
primitive nfs_res service:nfs-kernel-server \
meta target-role=Started \
op start timeout=180s interval=0 \
op stop timeout=180s interval=0
primitive nfs_fs_res Filesystem \
params device="/dev/drbd0" directory="/srv/nfs" fstype=ext4 \
meta target-role=Started \
op start timeout=180s interval=0 \
op stop timeout=180s interval=0
primitive varlibnfs_fs_res Filesystem \
params device="/dev/drbd1" directory="/var/lib/nfs" fstype=ext4 \
meta target-role=Started \
op start timeout=180s interval=0 \
op stop timeout=180s interval=0
ms drbd0_ms drbd0_res \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Started
ms drbd1_ms drbd1_res \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Started
colocation drbd_colo inf: drbd0_ms:Master drbd1_ms:Master
colocation drbd_nfs_fs_colo inf: nfs_fs_res drbd0_ms:Master
colocation drbd_varlibnfs_fs_colo inf: varlibnfs_fs_res drbd1_ms:Master
colocation nfs_ip_fs_colo inf: ip_nfs_res nfs_fs_res
colocation nfs_ip_service_colo inf: ip_nfs_res nfs_res
order drbd_nfs_fs_order Mandatory: drbd0_ms:promote nfs_fs_res:start
order drbd_nfs_fs_serialize Serialize: drbd0_ms:promote nfs_fs_res:start
order drbd_varlibnfs_fs_order Mandatory: drbd1_ms:promote varlibnfs_fs_res:start
order drbd_varlibnfs_fs_serialize Serialize: drbd1_ms:promote varlibnfs_fs_res:start
order nfs_fs_ip_order Mandatory: nfs_fs_res:start ip_nfs_res:start
order nfs_fs_ip_serialize Serialize: nfs_fs_res:start ip_nfs_res:start
order varlibnfs_fs_ip_order Mandatory: varlibnfs_fs_res:start ip_nfs_res:start
order varlibnfs_fs_ip_serialize Serialize: varlibnfs_fs_res:start ip_nfs_res:start
order ip_service_order Mandatory: ip_nfs_res:start nfs_res:start
order ip_service_serialize Serialize: ip_nfs_res:start nfs_res:start
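The r0/r1 resources the drbd primitives reference are defined outside pacemaker, in DRBD's own config. A sketch of r0 on a dedicated replication link (hostnames, disk paths, and addresses are all assumptions; r1 for /var/lib/nfs looks the same with drbd1):

```
# /etc/drbd.d/r0.res (DRBD 8.4-style sketch; names and paths are placeholders)
resource r0 {
    device    /dev/drbd0;
    disk      /dev/vg0/nfs;     # backing device, e.g. an LVM volume
    meta-disk internal;
    on nodea {
        address 10.0.1.1:7788;  # dedicated bonded link, NOT the client LAN
    }
    on nodeb {
        address 10.0.1.2:7788;
    }
}
```

Keeping replication on its own bonded link is what makes the "no random disconnects" hint above achievable: client traffic bursts can never starve the DRBD replication stream.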