
I set up three machines as high-availability storage. On each machine I use two 10Gb Ethernet interfaces to build a fully connected network between the three of them, without any switch. The physical topology, including interface names, resembles the one shown below:

    |-------------|        |-------------|    
    |             |        |             |    
|----eno4  A  eno5----------eno4  C  eno5----|
|   |             |        |             |   |
|   |-------------|        |-------------|   |
|                                            |
|--------------------------------|           |
                                 |           |
               |-------------|   |           |
               |             |   |           |
           |----eno4  B  eno5----|           |
           |   |             |               |
           |   |-------------|               |
           |                                 |
           |---------------------------------|
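
For reference, the cabling in the diagram can be written down as data. The following Python sketch uses hypothetical helpers that are not part of the actual setup. It lists which peer sits behind each interface and checks that every pair of machines shares exactly one direct cable, which is what makes the topology a full mesh:

```python
from itertools import combinations

# Cabling as read off the diagram: each entry joins (node, interface) pairs.
LINKS = [
    (("A", "eno4"), ("B", "eno5")),
    (("A", "eno5"), ("C", "eno4")),
    (("B", "eno4"), ("C", "eno5")),
]


def port_to_peer(links):
    """Map each (node, interface) to the node at the other end of its cable."""
    table = {}
    for (n1, if1), (n2, if2) in links:
        table[(n1, if1)] = n2
        table[(n2, if2)] = n1
    return table


def is_full_mesh(links, nodes):
    """True if every pair of nodes shares exactly one direct cable."""
    pairs = [frozenset((a[0], b[0])) for a, b in links]
    return all(pairs.count(frozenset(p)) == 1 for p in combinations(nodes, 2))


if __name__ == "__main__":
    for (node, iface), peer in sorted(port_to_peer(LINKS).items()):
        print(f"{node}.{iface} -> {peer}")
    print("full mesh:", is_full_mesh(LINKS, "ABC"))
```

The same table tells which local port should carry a static bridge FDB entry for each peer's MAC address.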

On all machines A, B and C, VLAN interfaces with VLAN IDs 4 and 6 are created on top of eno4 and eno5. These VLAN interfaces are members of two Linux bridges, br0 and br1, which are configured as follows:

/etc/network/interfaces:

auto eno4.4
iface eno4.4 inet manual
        post-up sysctl -w net.ipv6.conf.eno4/4.autoconf=0
        post-up sysctl -w net.ipv6.conf.all.autoconf=0
        post-up sysctl -w net.ipv6.conf.eno4/4.disable_ipv6=1
        post-up sysctl -w net.ipv6.conf.all.disable_ipv6=1
#vlan sync

auto eno4.6
iface eno4.6 inet manual
        post-up ip link set dev eno4.6 mtu 9710
        post-up sysctl -w net.ipv6.conf.eno4/6.autoconf=0
        post-up sysctl -w net.ipv6.conf.all.autoconf=0
        post-up sysctl -w net.ipv4.conf.eno4/6.send_redirects=0
        post-up sysctl -w net.ipv4.conf.all.send_redirects=0
        post-up sysctl -w net.ipv6.conf.eno4/6.disable_ipv6=1
        post-up sysctl -w net.ipv6.conf.all.disable_ipv6=1
#vlan sync

#auto eno4.7
#iface eno4.7 inet manual
##vlan sync

auto eno5.4
iface eno5.4 inet manual
        post-up sysctl -w net.ipv6.conf.eno5/4.autoconf=0
        post-up sysctl -w net.ipv6.conf.all.autoconf=0
#vlan sync

auto eno5.6
iface eno5.6 inet manual
        post-up ip link set dev eno5.6 mtu 9710
        post-up sysctl -w net.ipv6.conf.eno5/6.autoconf=0
        post-up sysctl -w net.ipv6.conf.all.autoconf=0
        post-up sysctl -w net.ipv4.conf.eno5/6.send_redirects=0
        post-up sysctl -w net.ipv4.conf.all.send_redirects=0
        post-up sysctl -w net.ipv6.conf.eno5/6.disable_ipv6=1
        post-up sysctl -w net.ipv6.conf.all.disable_ipv6=1
#vlan sync


auto br0
iface br0 inet static
        address 192.168.88.1/21
        bridge-ports eno4.4 eno5.4
        bridge-stp on
        bridge-fd 0

auto br1
iface br1 inet manual
        address 192.168.64.1/32
        address 192.168.68.1/32
        bridge-ports eno4.6 eno5.6
        hwaddress aa:bb:cc:dd:ee:ff
        post-up ip link set dev br1 mtu 9710
        post-up sysctl -w net.ipv6.conf.br1.disable_ipv6=1
        post-up sysctl -w net.ipv6.conf.all.disable_ipv6=1
        post-up sysctl -w net.ipv6.conf.br1.autoconf=0
        post-up sysctl -w net.ipv6.conf.all.autoconf=0
        post-up ip link set dev br1 arp off
        post-up arp -i br1 -s 192.168.64.2 ff:ee:dd:cc:bb:aa
        post-up arp -i br1 -s 192.168.68.2 ff:ee:dd:cc:bb:aa
        post-up arp -i br1 -s 192.168.64.3 ff:aa:dd:bb:ee:cc
        post-up arp -i br1 -s 192.168.68.3 ff:aa:dd:bb:ee:cc
        post-up bridge fdb add ac:1f:6c:6f:e9:e4 dev eno4.6 master static
        post-up bridge fdb add ac:1f:6c:6f:e8:e0 dev eno5.6 master static
        post-up bridge link set dev eno4.6 flood off learning off
        post-up bridge link set dev eno5.6 flood off learning off
        post-up sysctl -w net.ipv4.conf.br1.send_redirects=0
        post-up sysctl -w net.ipv4.conf.all.send_redirects=0
        post-up ip route add 192.168.64.2 dev br1
        post-up ip route add 192.168.64.3 dev br1
        post-up ip route add 192.168.68.2 via 192.168.64.3 dev br1
        post-up ip route add 192.168.68.3 via 192.168.64.2 dev br1
        post-up systemctl restart nftables
        bridge-stp off
        bridge-fd 0
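
The br1 stanza above is for the node whose br1 addresses end in .1. As a minimal sketch of how the same static ARP and route entries carry over to the other two nodes, the hypothetical Python helper below regenerates them. It assumes the nodes are numbered 1 to 3 and reuses the placeholder MAC addresses from the stanza (node 1's `hwaddress` and the MACs its `arp` lines give to nodes 2 and 3). It covers only the `arp` and `ip route` lines, not the bridge FDB entries, and uses br1 as the device name throughout:

```python
# Placeholder br1 MAC addresses, as used in the stanza above
# (node 1's own hwaddress and the MACs given to its peers in the arp lines).
# Substitute the real addresses of each node's br1.
BR1_MAC = {
    1: "aa:bb:cc:dd:ee:ff",
    2: "ff:ee:dd:cc:bb:aa",
    3: "ff:aa:dd:bb:ee:cc",
}
DIRECT = "192.168.64"    # one-hop network, 192.168.64.0/22
INDIRECT = "192.168.68"  # two-hop network via the third node, 192.168.68.0/22
NODES = (1, 2, 3)


def br1_post_up(node, bridge="br1"):
    """Return the static ARP and route commands for `node` (hypothetical helper)."""
    peers = [p for p in NODES if p != node]
    cmds = []
    # Both addresses of a peer resolve to that peer's br1 MAC, since ARP is off on br1.
    for peer in peers:
        for net in (DIRECT, INDIRECT):
            cmds.append(f"arp -i {bridge} -s {net}.{peer} {BR1_MAC[peer]}")
    # A peer's direct address is on-link.
    for peer in peers:
        cmds.append(f"ip route add {DIRECT}.{peer} dev {bridge}")
    # A peer's indirect address is reached through the direct address of the third node.
    for peer in peers:
        third = next(p for p in peers if p != peer)
        cmds.append(f"ip route add {INDIRECT}.{peer} via {DIRECT}.{third} dev {bridge}")
    return cmds


if __name__ == "__main__":
    for node in NODES:
        print(f"# node {node}")
        for cmd in br1_post_up(node):
            print(f"        post-up {cmd}")
```

For node 1 the output matches the `arp` and `ip route` lines of the stanza, in the same order. For node 2, for example, it routes 192.168.68.1 via 192.168.64.3 and 192.168.68.3 via 192.168.64.1.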

To sum up: IPv6 autoconfiguration is disabled, and br0 runs over the physical topology shown at the top of this question, with STP enabled, on IP network 192.168.88.0/21. On bridge br1, the ports are set to neither flood nor learn from incoming frames, and both STP and ARP are disabled. Instead, the forwarding (FDB), ARP and routing tables needed by the bridge and by IP are filled in statically when the bridge comes up, so no broadcast is needed to use the network (a broadcast would cause an L2 storm). This allows two IP networks on br1: the first (192.168.64.0/22) sends packets directly from one host to another over a single Ethernet link, and the second (192.168.68.0/22) sends packets indirectly, i.e. via the third host. As a result, if one Ethernet link is down, every pair of hosts can still reach each other (high availability), and when all links are up, there are two 10Gb paths between every pair of hosts (aggregation). For the indirect IP network to work, all nodes also need the nft rules below:
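
The high-availability argument can be checked mechanically. The sketch below uses hypothetical helper names and numbers the nodes 1 to 3. It models which cables each network uses for a given pair of hosts and shows that the direct and indirect paths never share a cable, so any single cable failure leaves at least one of the two networks usable for every pair. It does not model bandwidth or how traffic is actually spread over both paths:

```python
from itertools import combinations

NODES = (1, 2, 3)


def paths(src, dst):
    """Cables used by each network for traffic src -> dst.

    direct: the single cable between src and dst.
    indirect: two hops through the third node.
    """
    third = next(n for n in NODES if n not in (src, dst))
    return {
        "direct": [frozenset((src, dst))],
        "indirect": [frozenset((src, third)), frozenset((third, dst))],
    }


def usable(src, dst, down=frozenset()):
    """Names of the networks whose path avoids every failed cable in `down`."""
    return sorted(
        net for net, links in paths(src, dst).items()
        if not any(link in down for link in links)
    )


if __name__ == "__main__":
    all_links = [frozenset(p) for p in combinations(NODES, 2)]
    for failed in all_links:
        for s, d in combinations(NODES, 2):
            print("cable", sorted(failed), "down:", (s, d), usable(s, d, {failed}))
```

With no failure every pair has both networks available; with one cable down, the pair that lost its direct cable falls back to the indirect network and the other pairs keep their direct one.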

/etc/nftables.conf:

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
        chain input {
                type filter hook input priority filter;
        }
        chain forward {
                type filter hook forward priority filter;
        }
        chain output {
                type filter hook output priority filter;
        }
}

table ip iptable {
        chain forward {
                type filter hook forward priority -300; policy accept;
                iif != br drop
        }

        chain prerouting {
                type filter hook prerouting priority -300; policy accept;
                ip daddr 192.168.68.2 ip daddr set 192.168.64.2 ip saddr set 192.168.68.3 accept
                ip daddr 192.168.68.3 ip daddr set 192.168.64.3 ip saddr set 192.168.68.2 accept
        }
}
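
The two prerouting rules above refer to peers .2 and .3, so they appear to be the ones for node 1. As a sketch of the general pattern, assuming nodes numbered 1 to 3 and the address plan of the interfaces file (192.168.64.0/22 direct, 192.168.68.0/22 indirect), the hypothetical Python helpers below generate the equivalent rules for any relay node and apply the same rewrite to a single packet's addresses. Because the rules edit the packet headers directly (no conntrack-based NAT), the rewrite can be modelled one packet at a time. This only models the rule text and its effect on addresses; it does not touch netfilter:

```python
DIRECT = "192.168.64"    # one-hop network, 192.168.64.0/22
INDIRECT = "192.168.68"  # two-hop network, 192.168.68.0/22
NODES = (1, 2, 3)


def relay_rules(relay):
    """nft prerouting rules for `relay`, one per peer (hypothetical helper).

    Traffic for a peer's indirect address is readdressed to that peer's
    direct address, and its source becomes the other peer's indirect address.
    """
    peers = [n for n in NODES if n != relay]
    rules = []
    for dst in peers:
        src = next(p for p in peers if p != dst)
        rules.append(
            f"ip daddr {INDIRECT}.{dst} ip daddr set {DIRECT}.{dst} "
            f"ip saddr set {INDIRECT}.{src} accept"
        )
    return rules


def rewrite(relay, saddr, daddr):
    """Apply the relay's rewrite to one packet and return (saddr, daddr).

    Packets matching no rule pass unchanged, as with the chain's accept policy.
    """
    for dst in (n for n in NODES if n != relay):
        if daddr == f"{INDIRECT}.{dst}":
            src = next(n for n in NODES if n not in (relay, dst))
            return f"{INDIRECT}.{src}", f"{DIRECT}.{dst}"
    return saddr, daddr


if __name__ == "__main__":
    for relay in NODES:
        print(f"# node {relay}")
        for rule in relay_rules(relay):
            print("                " + rule)
    # Node 3 sends to node 2's indirect address through node 1.
    print(rewrite(1, "192.168.64.3", "192.168.68.2"))
```

The last line prints `('192.168.68.3', '192.168.64.2')`: node 1 forwards the packet to node 2's direct address with node 3's indirect address as source.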

On top of the iSCSI setup that I run over this topology (see this question), managed by Pacemaker and Corosync, I run a 3-node Proxmox cluster on IP network 192.168.88.0/21 over br0.

I find that if I reboot one of the nodes, the interfaces become very unstable: pings from one node to another via 192.168.64.0/22 or 192.168.68.0/22 start to fail and never recover until I restart all interfaces (eno4 and eno5), bridges (br0 and br1) and VLAN interfaces (eno4.4, eno4.6, eno5.4, eno5.6) with ifdown and ifup. Quite often, the connection drops again sooner or later anyway. While pings fail via 192.168.64.0/22 or 192.168.68.0/22, they still work as expected via 192.168.88.0/21. Things tend to be better when I stop the corosync and pve-cluster services on all nodes and then restart all their network interfaces, but I cannot reliably reproduce this "fix". I could not find anything useful in the logs to help identify the issue, but my feeling is that it has to do with Pacemaker. Does anyone have a clue where to look?
