
I am experiencing a problem with lost routes in an OSPF-over-DMVPN environment. I am running Ubuntu Server 14.04 with Quagga 0.99.22.4 and OpenNHRP 0.14.1.

While the DMVPN network is technically an NBMA network, NHRP performs a simulated broadcast (it captures the multicast traffic and re-sends it as duplicated unicast to the other known/configured nodes). This allows the DMVPN subnet to be configured as a broadcast network in OSPF, which is a requirement for dynamic spoke-to-spoke tunnels, which in turn are a requirement of my project.
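
For reference, forcing the network type is a one-line override per node in Quagga's ospfd. A minimal sketch (the interface name matches my tunnel; nothing else is required):

interface gre1
 ! NHRP replicates multicast hellos as unicast for us, so OSPF can
 ! treat this NBMA tunnel as an ordinary broadcast segment.
 ip ospf network broadcast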

I'm not sure whether the problem I'm facing is an OSPF design flaw or a Quagga bug, but I haven't been able to come up with the right combination of search phrases to find someone with a similar problem.

On this DMVPN network, I have three routers: a hub node in each of two data centers, and a spoke node at a branch (there will be many, many more branches as I migrate them to the DMVPN). Each node has several routable subnets behind it, and the subnets behind each data center node are routable to each other via other paths. The design allows for either hub node to go down, and OSPF re-converges to allow full communication throughout my AS. This all works flawlessly.

The problem comes into play when there is partial connectivity over my DMVPN. If there is a communication problem from the spoke to the primary hub (which is the DR), the spoke drops that hub from its neighbor table and promotes the secondary hub to DR in its own view. In this particular case, however, the secondary hub has no issue communicating with both the spoke and the primary hub, and therefore still happily considers itself the BDR and the primary hub the DR.

Simulating this issue with a couple of iptables rules to drop traffic to/from the primary hub, I can watch the dead timer expire for the primary hub on the spoke, and at that instant Quagga removes all OSPF routes from the spoke's routing table.
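
For anyone who wants to reproduce this, the simulation amounts to something like the following on the spoke (a sketch; 10.10.10.1 is the primary hub's tunnel address, as shown in the outputs below):

# Blackhole the primary hub's tunnel address in both directions to
# simulate a spoke-to-primary-hub failure within the DMVPN.
iptables -A INPUT -s 10.10.10.1 -j DROP
iptables -A OUTPUT -d 10.10.10.1 -j DROP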

From the spoke, I can still see a neighbor relationship with Full adjacency to the secondary hub, and the spoke assumes that this hub is now the DR (please note, I've manually changed router-ids, hostnames, and addresses):

spoke-router# sh ip ospf neigh

Neighbor ID     Pri State             Dead Time Address       Interface            RXmtL RqstL DBsmL
1.1.1.2         254 Full/DR           35.184s   10.10.10.2    gre1:10.10.10.3          0     0     0

If I look at the secondary hub, which can still communicate with both other nodes, it tells a different story.

secondary-hub# sh ip ospf neigh gre1

Neighbor ID Pri State           Dead Time Address         Interface            RXmtL RqstL DBsmL
1.1.1.1     255 Full/DR         36.513s   10.10.10.1      gre1:10.10.10.2          0     0     0
1.1.1.3       0 Full/DROther    35.387s   10.10.10.3      gre1:10.10.10.2          0     0     0

You can probably anticipate the neighbor table of the primary hub:

primary-hub# sh ip ospf neigh gre1

Neighbor ID Pri State           Dead Time Address         Interface            RXmtL RqstL DBsmL
1.1.1.2     254 Full/Backup     37.067s   10.10.10.2      gre1:10.10.10.1          0     0     0

According to tcpdump, hello packets are still being sent and received between the spoke and the secondary hub. After the communication breakdown, however, the hello packets contain some interesting data.

Naturally, the DR and BDR addresses are mismatched, and I can only assume that this is why the spoke router removes all OSPF-learned routes. The spoke router's hello packets list both the DR and BDR as 10.10.10.2 (my somewhat limited knowledge of OSPF would lead me to assume that, from its perspective, the BDR should be set to 0.0.0.0). The secondary hub's hello packets list 10.10.10.1 as the DR and 0.0.0.0 as the BDR (again, from its perspective I would assume that the BDR would be listed as itself, 10.10.10.2; that was the case before the communication failure).

So you can see why this is a bit frustrating: the spoke router still has a perfectly viable path through the secondary hub to the entire AS, but OSPF completely drops all of the known routes to get there.

Can anyone tell me whether this is a shortcoming of OSPF or a Quagga bug? Is there a configuration setting that would let me ignore mismatched DR/BDR fields in the hello packets? Would that be more dangerous than it's worth?

I know that I can work around the issue by doing a dual-cloud DMVPN with only one DR at each data center, but that adds a bit more complexity, management overhead, and performance overhead. I'd really prefer to stick with a single-cloud, dual-hub deployment if at all possible.

Please let me know if you need me to provide any more information! (I tried to create DMVPN as a tag, but I don't have enough reputation here yet.)

Quick Note:

Issuing show ip ospf database on the spoke router still shows the full link-state database for the entire AS, yet Quagga does not commit the routes to the kernel routing table.

More Info:

tcpdump of OSPF hello packets from the perspective of the spoke router (IP addresses and router IDs manually changed; packets were copied/pasted individually, so ignore the timestamps):

21:13:43.643160 IP (tos 0xc0, ttl 1, id 25083, offset 0, flags [none], proto OSPF (89), length 68)
10.10.10.3 > 224.0.0.5: OSPFv2, Hello, length 48
Router-ID 1.1.1.3, Backbone Area, Authentication Type: none (0)
Options [External]
  Hello Timer 10s, Dead Timer 40s, Mask 255.255.255.0, Priority 0
  Designated Router 10.10.10.2, Backup Designated Router 10.10.10.2
  Neighbor List:
    1.1.1.2
21:13:37.736761 IP (tos 0xc0, ttl 1, id 32711, offset 0, flags [none], proto OSPF (89), length 72)
10.10.10.2 > 224.0.0.5: OSPFv2, Hello, length 52
Router-ID 1.1.1.2, Backbone Area, Authentication Type: none (0)
Options [External]
  Hello Timer 10s, Dead Timer 40s, Mask 255.255.255.0, Priority 254
  Designated Router 10.10.10.1
  Neighbor List:
    1.1.1.1
    1.1.1.3

Another Edit:

I was able to bring in a Cisco 2801 as another spoke in the DMVPN topology described above. Simulating the partial connectivity results in exactly the same behavior on the Cisco spoke as on the Quagga spoke, right down to the content of the hello packets between the Cisco and the secondary hub.

Yes, more editing:

After seeing the content of the secondary hub's hello packets change even though its connectivity perspective hadn't changed, I wanted to try a different OSPF implementation. Specifically, during the simulated communication failure the secondary hub's hello packets list the primary hub as the DR and nothing as the BDR, even though it is quite happily still acting as the BDR. My thought was that perhaps a different OSPF implementation would still send the proper information in its hello packets. I attempted to use bird, as suggested in one of the answers, but it would not allow me to override the NBMA network type to broadcast, so it could not participate in my OSPF environment. The next test was to configure a Cisco router as the secondary hub. I went through the process of setting it all up and then simulated the communication failure.

As I suspected, the Cisco router still filled in its hello packets correctly from its perspective: the primary hub as the DR and the secondary hub as the BDR. I was hopeful that having at least one of the two fields match the hello packets from the spoke (both set to the secondary hub, remember) would be enough for the spoke router to keep the routes installed in the kernel and allow traffic to flow through the secondary hub to the rest of the network. Unfortunately, this was not the case.

More details, as requested:

              xxxxxxx xxxx xx                   xxxxxxx xxxx xx
            xx              xxxx              xx              xxxx
            xx                 xxx            xx                 xxx
              x   Other Subnets  +--------------+   Other Subnets  x
               xx               xx               xx               xx
                 xxxxxxx+xxxxxxxx                 xxxxxxx+xxxxxxxx
                        |                                |
               +--------+--------+               +-------+----------+
               |                 |               |                  |
               |  Primary Hub    |               |  Secondary Hub   |
               |  id: 1.1.1.1    |               |  id: 1.1.1.2     |
               |                 |               |                  |
               +------+----------+               +-----------+------+
                      | .1                               .2  |
                      |        xxxxx xx                      |
                      |       xxxxxxx xxxx xx                |
                      |     xx              xxxx             |
                      |     xx                 xxx           |
                      +------+x    DMVPN Cloud   +-----------+
                              xxx  10.10.10.0/24 x
                                xx               xx
                                 xxxxxxx++xxxxxxxx
                                        |
                                        |
                                        | .3
                             +----------+-----------+
                             |                      |
                             |                      |
                             |     Spoke            |
                             |     id: 1.1.1.3      |
                             |                      |
                             |                      |
                             +----------------------+

All interfaces connecting to the DMVPN subnet are "gre1". Yes, the diagram really is that simple when looking at this at the GRE subnet layer. There are routable stub networks that live behind the spoke as well. I can see where the design and different implementations of OSPF potentially never accounted for a node-to-node failure over a single broadcast domain. Physically, in a local layer 2 network it would be highly unlikely to have that type of failure... possible but not likely. But because in this case the broadcast domain is virtual, and extends over the internet, the type of failure scenario described above is much more likely (I have seen it twice in the last year or so). Remember, from OSPF's perspective, the DMVPN is simply a normal layer 2 broadcast domain to use as a transit network.

There are two primary purposes of a DMVPN. One is to ease the configuration and maintenance burden on the hub sites. You can actually run a branch on a dynamic IP if required or desired, and the hubs only need one configuration... not one for each branch. Second is that the technology allows for the dynamic establishment of direct, and encrypted spoke to spoke connectivity over the internet (even with a branch on a dynamic IP). Essentially a full mesh VPN, without the nightmare of a full mesh configuration. Both of these purposes are important to this project because as I grow this network from one spoke to 80, I won't have as high an administrative burden. And, when one branch calls the other over VoIP, the media packets are taking the shortest possible path, without having to hair-pin through a hub site.

In order for OSPF to correctly establish the direct routes, the GRE subnet must act as a virtual broadcast domain. What allows this to work over the internet is the NHRP protocol, which for the sake of argument can be thought of as a Layer 3-to-Layer 3 ARP.
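
To make that concrete, the NHRP side lives in opennhrp.conf. A rough sketch of a spoke's stanza (directive names as I recall them from the opennhrp.conf man page; the hub's public address is a placeholder):

interface gre1
 # Map the hub's tunnel address to its public (NBMA) address, and
 # register this spoke with the hub as its next-hop server (NHS).
 map 10.10.10.1/24 <hub-public-ip> register
 # Replicate captured multicast (our OSPF hellos) as unicast to the
 # NHS - this is the simulated broadcast described above.
 multicast nhs
 # Permit dynamic spoke-to-spoke shortcut tunnels.
 shortcut
 redirect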

As Requested, here are the relevant OSPF configs:

Spoke:

router ospf
 ospf router-id 1.1.1.3
 auto-cost reference-bandwidth 1024
 passive-interface default
 no passive-interface gre1
 network <branch-subnet>/24 area 0.0.0.0
 network 10.10.10.0/24 area 0.0.0.0

Primary Hub:

router ospf
 ospf router-id 1.1.1.1
 auto-cost reference-bandwidth 1024
 passive-interface default
 no passive-interface eth0
 no passive-interface gre1
 network 10.10.10.0/24 area 0.0.0.0
 network <primary-transit-supernet>/26 area 0.0.0.0

Secondary Hub:

router ospf
 ospf router-id 1.1.1.2
 auto-cost reference-bandwidth 1024
 passive-interface default
 no passive-interface eth0
 no passive-interface gre1
 network 10.10.10.0/24 area 0.0.0.0
 network <secondary-transit-supernet>/26 area 0.0.0.0
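
The per-interface settings aren't shown above. Based on the priorities visible in the neighbor tables (255 on the primary hub, 254 on the secondary hub, 0 on the spoke), each box carries an ospfd stanza along these lines (a sketch; only the priority values are taken from the outputs):

interface gre1
 ip ospf network broadcast
 ! 255 on the primary hub, 254 on the secondary hub, 0 on the spoke
 ip ospf priority 255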
  • Please consider adding more details to the question. A diagram showing how the DR, BDR, and spoke are connected/addressed, including interface names, would be quite helpful. Also explain why it's so important to use an OSPF broadcast network. Commented Nov 27, 2014 at 9:47
  • I've added the requested information. Commented Dec 2, 2014 at 20:12

3 Answers


To answer your question directly, this is expected behavior by OSPF and not a bug in Quagga.

So first, let's take a look at the DR/BDR section of the RFC.

Designated Router
    The Designated Router selected for the attached network.  The
    Designated Router is selected on all broadcast and NBMA networks
    by the Hello Protocol.  Two pieces of identification are kept
    for the Designated Router: its Router ID and its IP interface
    address on the network.  The Designated Router advertises link
    state for the network; this network-LSA is labelled with the
    Designated Router's IP address.  The Designated Router is
    initialized to 0.0.0.0, which indicates the lack of a Designated
    Router.
Backup Designated Router
    The Backup Designated Router is also selected on all broadcast
    and NBMA networks by the Hello Protocol.  All routers on the
    attached network become adjacent to both the Designated Router
    and the Backup Designated Router.  The Backup Designated Router
    becomes Designated Router when the current Designated Router
    fails.  The Backup Designated Router is initialized to 0.0.0.0,
    indicating the lack of a Backup Designated Router.

If the BDR field in the Hello packet header is set to 0.0.0.0, it means you do not have a BDR elected.

In your case this is because you have your other router set to a priority of 0, which makes the router ineligible to become a BDR (this is why you see "DROther" and not "BDR"). You just need to set the priority on your other router to something that isn't 0.
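
In Quagga that's a one-line change under the interface in ospfd (any non-zero value works; 1 here is just an example):

interface gre1
 ! Any non-zero priority makes this router eligible for DR/BDR election.
 ip ospf priority 1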

Here is the other piece from the RFC for some more context.

Router Priority
    An 8-bit unsigned integer.  When two routers attached to a
    network both attempt to become Designated Router, the one with
    the highest Router Priority takes precedence.  A router whose
    Router Priority is set to 0 is ineligible to become Designated
    Router on the attached network.  Advertised in Hello packets
    sent out this interface.

https://www.ietf.org/rfc/rfc2328.txt

  • First, thank you for your answer. I appreciate your taking the time to look into this issue for me. I intentionally have the spoke router set to a priority of 0 as the "spoke" in a DMVPN is typically a small branch office with low powered hardware and a slow link to the internet. From what I understand, this is a best practice when running OSPF over a DMVPN. And it makes sense that you wouldn't want a branch router to be the DR or BDR for an 80 router transit net. Commented Nov 22, 2014 at 1:39
  • I am completely familiar with the purpose and definition of the DR, BDR, and priority. I also kind of assumed that the 0.0.0.0 addresses were a type of initialization parameter. I simply highlighted those to show the differences in each router's hello packets. The issue that I'm facing is in a very isolated and rare condition. I don't believe that upping the priority of the spoke router would do any good either as the secondary hub still believes it's the BDR, and would likely not participate in an election for a new BDR. Commented Nov 22, 2014 at 1:49
  • Okay, no worries. I re-read everything and that makes sense - can we see your OSPF and interface configurations? Even better if you have a couple of examples from the tcpdump. Commented Nov 22, 2014 at 1:52
  • Added some more info to the question. Thanks again. Commented Nov 22, 2014 at 2:51
  • The only other thing we could possibly check would be your OSPF and interface configurations if that's possible? Commented Nov 25, 2014 at 18:19

I was asked to provide an answer to this question, to close it out. I may be a little fuzzy on the details, as this was nearly three years ago now.

Essentially, we worked around this issue. We wound up adding the required code changes to bird (thanks for the suggestion, @drookie) and switching over to it. We like that it's a more *nix-style service with a config file (which makes config management easier), rather than a Cisco emulator.

I can't remember whether we ever tested this exact failure scenario with bird, but I'm guessing we did and it failed in a similar fashion, because we ended up creating a dual-layer DMVPN.

Each virtual broadcast domain (DMVPN cloud) contains two hub boxes at each data center (rather than one hub as the diagram above suggests), with the hub boxes on separate internet subnets to protect against one-off internet routing issues.

The management burden I was originally concerned with subsided because the project requirements changed and supporting Cisco-based spokes was no longer required, while at the same time we deployed an organization-wide config management infrastructure.

I wish I could provide a more direct answer to the question, but this is how we pushed through this issue. Thank you to the folks who attempted to help.


Quagga losing routes was my primary reason to switch to bird. For some time I have been running a corporate VPN with hundreds of branch offices, linked through IPsec/GRE/whatever point-to-point tunnels. At some branch offices I was using FreeBSD/Quagga setups. The issue was that in an OSPF router chain like A - B - C, a prefix originated from A was seen on B, but not on C. After months of struggling, finding no solution, and watching this situation spontaneously arise and clear itself with Quagga (and never with proprietary vendors like Cisco or Juniper), and with various prefixes, I switched to bird. Now the problem is gone.

You may call this lame. You may say I should report this and help the community fix it. But to me, Quagga and its bug tracker have long seemed like some kind of post-nuclear desert, where sporadic groups of friendly developers may be encountered, but where you are more likely to die of dehydration or some bug. Bird is developed more intensively, and its mailing list is better populated with developers and practical help.

  • Thanks for your answer. I have not yet heard of bird, and always appreciate knowing software alternatives. I've been running quagga throughout the rest of the network without issue, and would prefer to keep things consistent. Commented Nov 24, 2014 at 13:47
  • I was running quagga for 12 years. – drookie Commented Nov 24, 2014 at 13:54
  • I decided to give bird a try on my secondary hub box. Unfortunately, bird was forcing my gre interface into "network-type nbma" even though I explicitly set it for "broadcast". Without a broadcast network type, I cannot accomplish the goal of the project. Thanks again for the input though. Commented Nov 24, 2014 at 22:00
  • Well, the thing is, GRE cannot be a broadcast interface due to its nature. – drookie Commented Nov 25, 2014 at 8:18
  • Understood, however in the case of DMVPN, the OpenNHRP daemon captures the multicast traffic, copies it, and re-issues it as multiple unicast packets to known hosts. Effectively a simulated broadcast, but since bird doesn't allow me to force the network type, I can't use it. Commented Nov 25, 2014 at 14:46
