
I'm currently facing a network issue between Cisco Nexus 93108 switches (9.3.11) with an LACP port-channel configuration.

The replica below helps us reproduce the issue (using 4x Nexus 9000v, qcow2 image version 9.3.9). As you can see, my network topology spans 2 datacenters.

  • SWC001, SWC002, SWC011 and SWC012 are Nexus 9000v.
  • Switch1, Switch2, Switch3, OP and OP1 are basic GNS3 switches.

The configuration on each side is similar: 1 vPC domain between 2 switches (pretty basic: 1 peer-link, 1 peer-keepalive, no layer 3, no peer-gateway) and a WAN link through a vPC Po100 between both datacenters that allows ALL VLANs to transit.

[Image: network topology]

The vPC configuration is consistent and works well: when SWC001 is directly linked to SWC011 and SWC002 to SWC012, everything runs smoothly with no issues. The thing is, in reality, there's an ISP between both datacenters, and that is where the pain starts. All we know from the ISP is that they use a QinQ configuration in their own network; as a datacenter client, we don't know which configuration or devices they're using.

Also, before using Nexus, we had old HP core switches and we didn't set any particular configuration regarding QinQ. After this HP => Nexus migration, everything was fine except the status of port-channel 100 (i.e. the extended LAN between both DCs).

To simulate an ISP in between, I set up a basic GNS3 switch configured with QinQ on e0 and e1 (VLAN 1 and Ethertype 0x88A8).

In my example Po100 is configured with only one physical interface (eth1/13), so the current configuration of eth1/13 on all 4 switches is:

version 9.3(9) Bios:version

interface Ethernet1/13
  lacp rate fast
  switchport mode trunk
  spanning-tree port type edge trunk
  spanning-tree bpdufilter enable
  channel-group 100 mode active

interface port-channel100
  switchport mode trunk
  spanning-tree port type edge trunk
  spanning-tree bpdufilter enable
  no lacp suspend-individual

With a channel-group active/active configuration (after enabling feature lacp), the port-channel protocol is LACP, but Po100 stays Switched and Down (SD):

SWC001# sh port-channel summary interface port-channel 100
Flags:  D - Down        P - Up in port-channel (members)
        I - Individual  H - Hot-standby (LACP only)
        s - Suspended   r - Module-removed
        b - BFD Session Wait
        S - Switched    R - Routed
        U - Up (port-channel)
        p - Up in delay-lacp mode (member)
        M - Not in use. Min-links not met
--------------------------------------------------------------------------------
Group Port-       Type     Protocol  Member Ports
      Channel
--------------------------------------------------------------------------------
100   Po100(SD)   Eth      LACP      Eth1/13(I)
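
For readers less used to this output, the flag letters in parentheses can be decoded mechanically. The following is only a small illustrative sketch in Python: the legend is copied from the output above, while the `decode` helper and its regular expression are hypothetical names introduced for this example, not part of any Cisco tooling.

```python
import re

# Flag legend copied from the "show port-channel summary" output above.
# Letters are case-sensitive (e.g. "S" = Switched, "s" = Suspended).
FLAGS = {
    "D": "Down",
    "P": "Up in port-channel (members)",
    "I": "Individual",
    "H": "Hot-standby (LACP only)",
    "s": "Suspended",
    "r": "Module-removed",
    "b": "BFD Session Wait",
    "S": "Switched",
    "R": "Routed",
    "U": "Up (port-channel)",
    "p": "Up in delay-lacp mode (member)",
    "M": "Not in use. Min-links not met",
}


def decode(token):
    """Split a token such as 'Po100(SD)' into its name and decoded flags."""
    match = re.fullmatch(r"([\w/.-]+)\((\w+)\)", token)
    if match is None:
        raise ValueError(f"unrecognised token: {token!r}")
    name, letters = match.groups()
    return name, [FLAGS[letter] for letter in letters]


print(decode("Po100(SD)"))   # ('Po100', ['Switched', 'Down'])
print(decode("Eth1/13(I)"))  # ('Eth1/13', ['Individual'])
```

Read this way, the member port is marked Individual rather than bundled, and the port-channel itself is down.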

If we set "channel-group 100 mode on" on both sides, the port-channel protocol is NONE, BUT the interface is up and Po100 is Switched/Up (SU):

FRHD01SWC001(config-if)# sh port-channel summary interface port-channel 100
Flags:  D - Down        P - Up in port-channel (members)
        I - Individual  H - Hot-standby (LACP only)
        s - Suspended   r - Module-removed
        b - BFD Session Wait
        S - Switched    R - Routed
        U - Up (port-channel)
        p - Up in delay-lacp mode (member)
        M - Not in use. Min-links not met
--------------------------------------------------------------------------------
Group Port-       Type     Protocol  Member Ports
      Channel
--------------------------------------------------------------------------------
100   Po100(SU)   Eth      NONE      Eth1/13(P)

Now, if we look at the LACP counters, LACPDUs are sent from both sides, but 0 are received in either direction.

SWC001# show lacp counters
NOTE: Clear lacp counters to get accurate statistics

------------------------------------------------------------------------------
                             LACPDUs                      Markers/Resp LACPDUs
Port              Sent                Recv                  Recv Sent  Pkts Err
------------------------------------------------------------------------------
port-channel1
Ethernet1/11       28                   24                     0      0    0


port-channel100
Ethernet1/13       12                   0                      0      0    0
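
When several port-channels are involved, the same check (LACPDUs sent, none received) can be automated. The sketch below is a minimal Python example that assumes the column layout shown above (port name, Sent, Recv, then the remaining counters); `silent_lacp_ports` and `SAMPLE` are hypothetical names, and the sample text is simply the output pasted from this post.

```python
def silent_lacp_ports(output):
    """Return {port: (sent, recv)} for members that sent LACPDUs but received none."""
    silent = {}
    for line in output.splitlines():
        fields = line.split()
        # Member rows look like: Ethernet1/13  12  0  0  0  0
        if len(fields) >= 3 and fields[0].startswith("Ethernet"):
            if fields[1].isdigit() and fields[2].isdigit():
                sent, recv = int(fields[1]), int(fields[2])
                if sent > 0 and recv == 0:
                    silent[fields[0]] = (sent, recv)
    return silent


# Counter rows copied from the "show lacp counters" output above.
SAMPLE = """\
port-channel1
Ethernet1/11       28                   24                     0      0    0

port-channel100
Ethernet1/13       12                   0                      0      0    0
"""

print(silent_lacp_ports(SAMPLE))  # {'Ethernet1/13': (12, 0)}
```

As the NOTE in the output suggests, the counters are cumulative, so it is worth clearing them and sampling again before drawing conclusions.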

Finally, when I look at sh vpc brief, the Po100 status is down whereas the consistency check is success.

vPC Peer-link status
---------------------------------------------------------------------
id    Port   Status Active vlans
--    ----   ------ -------------------------------------------------
1     Po1    up     1,5,10-11,13-14,22-23,30,41,44,50-51,55,80-81,90,
                    100,110-114,120-121,130-135,140-141,150-154,
                    160-174,176,180,201-205,230,256-257,999


vPC status
----------------------------------------------------------------------------
Id    Port          Status Consistency Reason                Active vlans
--    ------------  ------ ----------- ------                ---------------
100   Po100         down*  success     success               -

We tried different configurations in GNS3 to narrow down where the issue could be (active/active, active/passive or on/on), but we still have no idea what causes it. We also had a look at the spanning-tree part, but it seems good. It seems that as soon as anything is connected between the two Nexus switches, it fails, even though the link may be up (but no packets are received).

Has anyone already had an issue like this, with or without QinQ in between?

EDIT 28-06-2023 - HP core switch working configuration; A24 and B24 are the LACP interfaces between both DCs.

[Image: HP core switch configuration]

  • Be careful with GNS3. It's an emulator and may operate differently from real hardware.
    – Ron Trunk
    Commented Jun 27, 2023 at 15:22
  • Hi, yes I know; unfortunately this is our only way to test outside of production ;) I also created this replica to better explain my case here, which would otherwise be difficult.
    – motorbass
    Commented Jun 27, 2023 at 15:45
  • I have not tried that kind of scenario in the past and would not expect to get good results with a 'black box' of an ISP network or other unknown/unknowable network in between the locations. Basically you want leased dark fiber, not an internet service in between. That, or you want a routed connection in between rather than layer 2. You can look into an overlay network (VXLAN) that can run encapsulated layer 2 over the layer 3 internet service if the layer 2 spanning design is critical.
    Commented Jun 27, 2023 at 18:14
  • @FrameHowitzer to be a bit more precise, our core network rack goes to the datacenter network rack (where multiple clients are hosted), and then the datacenter rack goes through dedicated lines to a provider. This gives us 2x 1GBps dedicated links between our 2 DCs. With the former HP core switches we had a configuration like this that allowed us to work on Layer 2 only. Sorry if I misspoke.
    – motorbass
    Commented Jun 27, 2023 at 18:45
  • I think I understood that, though you were not using LACP before on the HP switches, right? If you were using LACP before and it worked, I would expect it to still work as long as the ISP did not change anything. But with the ISP service in the middle, they could change things at any time and you would have no recourse unless the service they sell you is specifically designed to be a direct replacement for dark fiber or a similar service. If they promised LACP should work over the 2 links and it doesn't, then I would ask them what changed. Do you get any errors on link up/down?
    Commented Jun 28, 2023 at 0:21

1 Answer

Even if the QinQ provider apparently did not change anything and the HP switches used to be running fine with LACP, a QinQ service, depending on the platform, might have to be configured to be transparent for L2 protocols like CDP and LACP.

See https://www.cisco.com/c/en/us/td/docs/routers/access/ISRG2/software/feature/guide/QinQ_L2PT.html#wp1054154

from where:

Configuring QinQ and L2PT on L2 Ports

SUMMARY STEPS

  1. enable
  2. configure terminal
  3. interface interface-id
  4. switchport mode access or switchport mode dot1q-tunnel
  5. l2protocol-tunnel [cdp | stp | vtp | lacp | pagp | udld]
  6. end
  7. show l2protocol
  8. copy running-config startup-config (Optional)

Be sure to verify that the QinQ provider switches in your lab are tunneling the LACPDUs and not dropping them.
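
One likely reason LACPDUs need explicit tunneling is their destination address: LACP frames are sent to the Slow Protocols multicast address 01-80-C2-00-00-02 (EtherType 0x8809), which sits inside the block of reserved link-local addresses 01-80-C2-00-00-00 to 01-80-C2-00-00-0F that IEEE 802.1Q bridges typically do not forward. The Python sketch below only illustrates that range check; the helper names are hypothetical, and whether the ISP's equipment actually filters this range is an assumption, since their configuration is unknown.

```python
LACP_DST_MAC = "01:80:c2:00:00:02"   # Slow Protocols multicast address used by LACP
SLOW_PROTOCOLS_ETHERTYPE = 0x8809    # EtherType carried by LACPDUs
CDP_DST_MAC = "01:00:0c:cc:cc:cc"    # Cisco multicast address used by CDP


def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' or 'aa-bb-cc-dd-ee-ff' to an integer."""
    digits = mac.replace("-", "").replace(":", "")
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return int(digits, 16)


# Reserved link-local block that 802.1Q bridges normally do not forward.
RESERVED_FIRST = mac_to_int("01:80:c2:00:00:00")
RESERVED_LAST = mac_to_int("01:80:c2:00:00:0f")


def is_reserved_link_local(mac):
    """True if the destination MAC falls inside the reserved block."""
    return RESERVED_FIRST <= mac_to_int(mac) <= RESERVED_LAST


print(is_reserved_link_local(LACP_DST_MAC))  # True
print(is_reserved_link_local(CDP_DST_MAC))   # False
```

Note that the CDP destination address is outside this block, so a CDP test passing through the provider does not by itself prove that LACPDUs would pass as well.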

  • Hmm, unfortunately I only have these values available for l2protocol-tunnel: allow-double-tag, shutdown-threshold, vtp, cdp-stp, drop-threshold, stp-bridge. I'll give cdp a try and then check the CDP neighbors.
    – motorbass
    Commented Jun 28, 2023 at 12:36
  • FWIW, the behavior described in Marc's answer is common among many vendors; Juniper also requires additional configuration to tunnel certain L2 protocols (e.g. LACP, LLDP, CDP).
    Commented Jun 28, 2023 at 12:43
