I have a Debian Linux server with 4x 1Gb NICs, 2 onboard and 2 PCI. All four NICs are configured into a single mode 4 (802.3ad) LACP bonded interface, bond0.
I have two switches: Cisco SG300-28 and Cisco SG300-10, to be referred to as A and B respectively.
The two switches are connected to each other by an LACP LAG on 2 switch ports, both links listed as active.
All ports on both switches appear to be configured as trunk ports. This appears to be the default as I did a factory reset before testing this. Would that make any difference? There is only a single default VLAN at this stage.
The server has 2 NICs going into Switch A and 2 NICs going into Switch B (one onboard NIC and one PCI NIC per switch). The Linux bonding driver is clever in this respect: it works out an Aggregator ID for each interface and pairs them up by switch, so only the links into one switch are ever active even though all 4 might be up.
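To see the pairing the bonding driver has done, the per-slave Aggregator IDs can be read from /proc/net/bonding/bond0. The following is a minimal Python sketch that groups slaves by aggregator and marks the active one. It assumes the usual 802.3ad layout of that file (an "Active Aggregator Info:" block followed by one "Slave Interface:" block per NIC, each containing an "Aggregator ID:" line); the interface names in any sample are hypothetical.

```python
import os
from collections import defaultdict


def parse_bonding(text):
    """Return (active_aggregator_id, {slave_name: aggregator_id}) from /proc/net/bonding text."""
    active = None
    slaves = {}
    current = None          # slave whose block we are currently inside, if any
    in_active_info = False  # inside the "Active Aggregator Info:" block
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Active Aggregator Info"):
            in_active_info, current = True, None
        elif line.startswith("Slave Interface:"):
            in_active_info, current = False, line.split(":", 1)[1].strip()
        elif line.startswith("Aggregator ID:"):
            agg_id = int(line.split(":", 1)[1])
            if in_active_info and active is None:
                active = agg_id
            elif current is not None and current not in slaves:
                # First "Aggregator ID:" line in a slave block wins.
                slaves[current] = agg_id
    return active, slaves


def group_by_aggregator(slaves):
    """Invert {slave: aggregator} into {aggregator: sorted list of slaves}."""
    groups = defaultdict(list)
    for name, agg_id in slaves.items():
        groups[agg_id].append(name)
    return {agg_id: sorted(names) for agg_id, names in groups.items()}


if __name__ == "__main__":
    path = "/proc/net/bonding/bond0"
    if os.path.exists(path):
        with open(path) as f:
            active, slaves = parse_bonding(f.read())
        for agg_id, names in sorted(group_by_aggregator(slaves).items()):
            marker = "active" if agg_id == active else "standby"
            print(f"aggregator {agg_id} ({marker}): {', '.join(names)}")
```

Running it before and after pulling the Switch A cables should show the two NICs on each switch sharing an aggregator, with the "active" marker moving between them.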
I have a workstation that I am testing this from, currently connected to switch A.
                 ----------------
       ==========| Server bond0 |==========
       ||        ----------------        ||
       ||                                ||
       ||                                ||
----------------                  ----------------
|   Switch A   |=======LAG========|   Switch B   |
----------------                  ----------------
       |
       |
       |
  workstation
Initially the server reports it's using the Aggregator ID associated with Switch A. I get a solid ping from the workstation.
I disconnect the 2 cables from the server to Switch A; Linux switches the active aggregator to the NICs connected to Switch B, and the ping continues and remains stable. The active path at this point is workstation -> Switch A -> inter-switch LAG -> Switch B -> Server.
When I reconnect the cables from the server into Switch A, Linux keeps the same aggregator ID, so it is still using Switch B.
Pings from the workstation then start being dropped as follows:
As soon as I disconnect the server's cables from Switch A again, the ping returns to being solid.
So it fails when the ping originates from a port on Switch A and goes to the server via Switch B, but only when the server's links to Switch A are up but not active in the OS.
This is repeatable.
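Linux staying on Switch B's aggregator after the cables are reconnected is consistent with what the kernel bonding documentation describes for the default ad_select=stable policy, assuming bond0 has not been configured otherwise: the active aggregator is only reselected when all of its slaves are down or it has no slaves. Below is a simplified Python model of the three documented policies (stable, bandwidth, count). It is an illustration, not the kernel's code; it ignores partner checks, and its tie-breaking (first aggregator in the list wins) is a simplification of my own.

```python
from dataclasses import dataclass


@dataclass
class Aggregator:
    agg_id: int
    ports_up: int
    port_speed_mbps: int = 1000

    @property
    def bandwidth_mbps(self):
        return self.ports_up * self.port_speed_mbps


def select_active(aggregators, current_id, policy="stable"):
    """Simplified model of bonding's ad_select policies (not the kernel implementation)."""
    usable = [a for a in aggregators if a.ports_up > 0]
    if not usable:
        return None
    if policy == "stable" and current_id is not None:
        current = next((a for a in aggregators if a.agg_id == current_id), None)
        # 'stable' only reselects once the current aggregator has no working ports.
        if current is not None and current.ports_up > 0:
            return current_id
    if policy == "count":
        best = max(usable, key=lambda a: a.ports_up)
    else:  # 'bandwidth', and 'stable' when a reselection is needed
        best = max(usable, key=lambda a: a.bandwidth_mbps)
    return best.agg_id  # ties: first in list wins (simplification)


if __name__ == "__main__":
    # Aggregator 1 = links to Switch A, aggregator 2 = links to Switch B.
    agg_a, agg_b = Aggregator(1, 2), Aggregator(2, 2)
    active = select_active([agg_a, agg_b], None)     # starts on A
    agg_a.ports_up = 0                               # unplug both cables to Switch A
    active = select_active([agg_a, agg_b], active)   # fails over to B
    agg_a.ports_up = 2                               # plug them back in
    active = select_active([agg_a, agg_b], active)   # 'stable' stays on B
    print("active aggregator:", active)
```

In this model the aggregator on Switch A is up again but not selected, which is the "up but not active in the OS" state the failure depends on.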
I have run tcpdump on the server and the workstation. On the workstation I can see ALL pings being transmitted, but only some of them getting a reply, per the trace above. On the server, it looks like the missing pings aren't making it that far, so they are being dropped somewhere in the switching.
If I mirror this broken setup onto the other switch, plugging the workstation into Switch B so the traffic path is workstation -> Switch B -> inter-switch LAG -> Switch A -> Server, then it works fine.
I did think this might be some sort of STP issue with a port getting blocked, but the drop pattern is too frequent, with almost every other pair of pings being dropped. I also checked the logs on the switches, and it didn't look like any ports were being blocked on either switch.
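To put numbers on the drop pattern, the reply lines of a ping run can be turned into runs of answered and lost sequence numbers. A small Python sketch, assuming Linux iputils-style output where each reply line contains "bytes from ... icmp_seq=N" (some older iputils versions print icmp_req= instead) and lost requests produce no reply line; a Windows workstation's ping output would need a different pattern.

```python
import re
import sys

# Only match reply lines, so "no answer yet for icmp_seq=N" (ping -O) is not counted as a reply.
REPLY = re.compile(r"bytes from .*?icmp_(?:seq|req)=(\d+)")


def loss_runs(ping_output, first_seq, last_seq):
    """Run-length encode answered/lost sequence numbers, e.g. [("ok", 2), ("lost", 2), ...]."""
    replied = {int(seq) for seq in REPLY.findall(ping_output)}
    runs = []
    for seq in range(first_seq, last_seq + 1):
        state = "ok" if seq in replied else "lost"
        if runs and runs[-1][0] == state:
            runs[-1][1] += 1
        else:
            runs.append([state, 1])
    return [tuple(run) for run in runs]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (script name hypothetical): ping -c 40 <server> | python3 ping_runs.py 1 40
    text = sys.stdin.read()
    for state, length in loss_runs(text, int(sys.argv[1]), int(sys.argv[2])):
        print(f"{state:4} x{length}")
```

A strictly alternating "ok x2 / lost x2" output would confirm the short-run pattern, as opposed to the long outages an STP block would be expected to cause.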
I have also tried replacing the inter-switch LAG with a single connection, non-LAG/LACP.
I have confirmed that the LACP settings match on all sides.
As a full-time sysadmin and only a networking part-timer/amateur, to me this points to some sort of configuration difference between the switches, but I don't know which parts of the config to check for differences. They are running different firmware versions. Note that these are the small business SG300 series, so they don't run full IOS, but they do have what looks like a reasonably featured CLI.
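One low-effort way to find configuration differences is to save the output of show running-config from each switch to a text file and diff them. A Python sketch using the standard difflib module; the file names are hypothetical, and treating lines that start with "!" as comments is an assumption about the SG300 config format.

```python
import difflib
import sys
from pathlib import Path


def normalise(config_text, comment_prefixes=("!",)):
    """Drop blank lines, comment lines and trailing whitespace so only settings are compared."""
    lines = []
    for line in config_text.splitlines():
        line = line.rstrip()
        if line and not line.lstrip().startswith(comment_prefixes):
            lines.append(line)
    return lines


def config_diff(a_text, b_text, a_name="switchA", b_name="switchB"):
    """Unified diff of two normalised configs, as a list of lines (empty if identical)."""
    return list(difflib.unified_diff(normalise(a_text), normalise(b_text),
                                     fromfile=a_name, tofile=b_name, lineterm=""))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 cfgdiff.py switchA-running.txt switchB-running.txt
    a_path, b_path = sys.argv[1], sys.argv[2]
    diff = config_diff(Path(a_path).read_text(), Path(b_path).read_text(), a_path, b_path)
    print("\n".join(diff))
```

Expected noise such as the hostname will show up too; with different firmware versions, the set of lines each switch chooses to print may also differ.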
My limited networking knowledge tells me it's something like an ARP issue. The server should only be presenting its MAC address to the active pair/switch. The dropped pings are possibly being forwarded to the non-active switch/pair.
But how could I prove that with these switches?
I would have expected longer runs of successful and failed pings though.
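The hypothesis can be written down as a toy model of Switch A's MAC address table. A learning switch forwards unicast frames out of whichever port it last saw the destination MAC on. If some frame carrying the server's MAC were to arrive on Po1 (the server's links into Switch A, up but not in the active aggregator), pings would go down links the bond is not using until a frame from the server arriving via Po8 moved the entry back. This Python sketch only illustrates that idea: the MAC address is hypothetical, there is no ageing or VLAN handling, and it does not show that such frames actually exist. On the real switch, watching the dynamic MAC table entry for the server's MAC while pings are dropping would be one way to test it.

```python
SERVER_MAC = "02:00:00:00:00:01"  # hypothetical bond0 MAC address


class LearningSwitch:
    """Toy MAC address table: MAC -> port it was last seen on (no ageing, no VLANs)."""

    def __init__(self):
        self.table = {}

    def learn(self, src_mac, in_port):
        self.table[src_mac] = in_port

    def lookup(self, dst_mac):
        return self.table.get(dst_mac)  # None means unknown unicast, i.e. flooded


def ping_results(events):
    """events: ("from_server", port) frames seen by Switch A, or ("ping", None) lookups."""
    switch_a = LearningSwitch()
    delivered = []
    for kind, port in events:
        if kind == "from_server":
            switch_a.learn(SERVER_MAC, port)
        elif kind == "ping":
            out = switch_a.lookup(SERVER_MAC)
            # Po8 leads to Switch B and the active aggregator; a flood also goes out Po8.
            # Po1 reaches only the server's standby links, which drop the frame in this model.
            delivered.append(out in ("Po8", None))
    return delivered


if __name__ == "__main__":
    events = [("from_server", "Po8"), ("ping", None),
              ("from_server", "Po1"), ("ping", None),
              ("from_server", "Po8"), ("ping", None)]
    print(ping_results(events))
```

In the model, interleaved learning events give short alternating runs of success and failure rather than one long outage.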
My next step is to do some tcpdumps to look at the ARPs and LACPDUs to see if there's some sort of "storm" going on causing the traffic to switch between switches every couple of seconds. Though from the Linux perspective, there's no change in active Aggregator ID corresponding with the failed pings.
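For that capture, something like tcpdump -i eth0 -w eth0.pcap 'arp or ether proto 0x8809' on each slave NIC (the interface and file names are placeholders) would record ARP and slow-protocol (LACP) frames. The Python sketch below then counts them per second and per source MAC and flags bursts. It is stdlib-only and assumes a classic microsecond-resolution pcap with the Ethernet link type (not pcapng). The threshold of 3 per second is an arbitrary choice, bearing in mind that LACP sends a PDU every 30 s at the slow rate and every 1 s at the fast rate.

```python
import struct
import sys
from collections import Counter

ETH_VLAN = 0x8100
ETH_ARP = 0x0806
ETH_SLOW = 0x8809  # IEEE slow protocols; subtype 1 is LACP


def read_pcap(data):
    """Yield (timestamp, frame) from a classic (not pcapng) microsecond pcap held in memory."""
    if data[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif data[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:
        raise ValueError("expected Ethernet link type (1)")
    offset = 24
    while offset + 16 <= len(data):
        sec, usec, incl_len, _ = struct.unpack(endian + "IIII", data[offset:offset + 16])
        offset += 16
        yield sec + usec / 1e6, data[offset:offset + incl_len]
        offset += incl_len


def classify(frame):
    """Return ("arp" | "lacpdu", source MAC) or None for anything else."""
    if len(frame) < 14:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    payload = 14
    if ethertype == ETH_VLAN and len(frame) >= 18:  # skip one 802.1Q tag
        (ethertype,) = struct.unpack("!H", frame[16:18])
        payload = 18
    if ethertype == ETH_ARP:
        kind = "arp"
    elif ethertype == ETH_SLOW and len(frame) > payload and frame[payload] == 1:
        kind = "lacpdu"
    else:
        return None
    return kind, ":".join(f"{b:02x}" for b in frame[6:12])


def bursts(data, threshold=3):
    """Count frames per (whole second, kind, source MAC); keep buckets >= threshold."""
    counts = Counter()
    for ts, frame in read_pcap(data):
        hit = classify(frame)
        if hit:
            counts[(int(ts), *hit)] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


def make_pcap(packets):
    """Build a little-endian Ethernet pcap from (timestamp, frame) pairs, for testing."""
    parts = [struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)]
    for ts, frame in packets:
        sec = int(ts)
        usec = int(round((ts - sec) * 1e6))
        parts.append(struct.pack("<IIII", sec, usec, len(frame), len(frame)))
        parts.append(frame)
    return b"".join(parts)


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as f:
        for (second, kind, mac), n in sorted(bursts(f.read()).items()):
            print(f"{second} {kind:6} {mac} x{n}")
```

Lining the flagged seconds up against the lost icmp_seq numbers would show whether any ARP or LACP activity coincides with the drops.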
Does anyone else have any suggestions of what else to look at here?
EDIT: Adding RSTP status for the port-channels...
SwitchA#sh spanning-tree
Spanning tree enabled mode RSTP
Default port cost method: long

  Root ID    Priority    32768
             Address     0c:f5:a4:c2:0e:bf
             This switch is the root
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec

  Number of topology changes 12 last change occurred 20:18:37 ago
  Times:  hold 1, topology change 35, notification 2
          hello 2, max age 20, forward delay 15

Interfaces
Name      State    Prio.Nbr  Cost     Sts    Role PortFast Type
--------- -------- --------- -------- ------ ---- -------- -----------------
...
Po1       enabled  128.1000  20000    Frw    Desg Yes      P2P (RSTP)
Po2       enabled  128.1001  20000    Dsbl   Dsbl No       -
Po3       enabled  128.1002  20000    Dsbl   Dsbl No       -
Po4       enabled  128.1003  20000    Dsbl   Dsbl No       -
Po5       enabled  128.1004  20000    Dsbl   Dsbl No       -
Po6       enabled  128.1005  20000    Dsbl   Dsbl No       -
Po7       enabled  128.1006  20000    Dsbl   Dsbl No       -
Po8       enabled  128.1007  20000    Frw    Desg No       P2P (RSTP)

SwitchB#sh spanning-tree
Spanning tree enabled mode RSTP
Default port cost method: long
Loopback guard: Disabled

  Root ID    Priority    32768
             Address     0c:f5:a4:c2:0e:bf
             Cost        20000
             Port        Po8
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    32768
             Address     1c:de:a7:75:1a:4b
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec

  Number of topology changes 5 last change occurred 20:26:02 ago
  Times:  hold 1, topology change 35, notification 2
          hello 2, max age 20, forward delay 15

Interfaces
Name      State    Prio.Nbr  Cost     Sts    Role PortFast Type
--------- -------- --------- -------- ------ ---- -------- -----------------
...
Po1       enabled  128.1000  20000    Frw    Desg Yes      P2P (RSTP)
Po2       enabled  128.1001  20000    Dsbl   Dsbl No       -
Po3       enabled  128.1002  20000    Dsbl   Dsbl No       -
Po4       enabled  128.1003  20000    Dsbl   Dsbl No       -
Po5       enabled  128.1004  20000    Dsbl   Dsbl No       -
Po6       enabled  128.1005  20000    Dsbl   Dsbl No       -
Po7       enabled  128.1006  20000    Dsbl   Dsbl No       -
Po8       enabled  128.1007  20000    Frw    Root No       P2P (RSTP)
Po8 is the inter-switch LAG and Po1 is the server LAG, in both cases.
No topology changes recorded on either switch while I've got things in their broken state (pings dropping).
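To compare the two spanning-tree outputs mechanically rather than by eye, the Interfaces table can be parsed. A Python sketch, assuming the column order shown above (Name, State, Prio.Nbr, Cost, Sts, Role, PortFast, Type); it keeps only ports whose status is not Dsbl and reports the ports whose view differs between the switches.

```python
import re

ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<state>enabled|disabled)\s+(?P<prio_nbr>\d+\.\d+)\s+"
    r"(?P<cost>\d+)\s+(?P<sts>\S+)\s+(?P<role>\S+)\s+(?P<portfast>Yes|No)\s+(?P<type>.+)$"
)


def stp_ports(show_output):
    """Map port name -> (status, role, portfast, type) for non-disabled ports."""
    ports = {}
    for line in show_output.splitlines():
        m = ROW.match(line.strip())
        if m and m["sts"] != "Dsbl":
            ports[m["name"]] = (m["sts"], m["role"], m["portfast"], m["type"].strip())
    return ports


def compare(a_ports, b_ports):
    """Return {port: (view on A, view on B)} for ports whose STP view differs."""
    return {p: (a_ports.get(p), b_ports.get(p))
            for p in sorted(set(a_ports) | set(b_ports))
            if a_ports.get(p) != b_ports.get(p)}
```

Run against the two outputs above, the only difference it would report is Po8 (Desg on Switch A, Root on Switch B), which is what you would expect with Switch A as the root bridge; Po1 looks identical on both.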