
Our VPS has IPv6 connection issues. Hopefully someone can help debug them.

Pings fail at first and succeed later:

2020-06-01 23:20:55 <user>@<host>:~# ping -6 google.com
PING google.com(ams15s30-in-x0e.1e100.net (2a00:1450:400e:807::200e)) 56 data bytes
From <host>.com (<ip>) icmp_seq=1 Destination unreachable: Address unreachable
...
From <host>.com (<ip>) icmp_seq=6 Destination unreachable: Address unreachable
64 bytes from ams15s30-in-x0e.1e100.net (2a00:1450:400e:807::200e): icmp_seq=7 ttl=54 time=14.0 ms
...
64 bytes from ams15s30-in-x0e.1e100.net (2a00:1450:400e:807::200e): icmp_seq=13 ttl=54 time=12.1 ms
--- google.com ping statistics ---
13 packets transmitted, 7 received, +6 errors, 46% packet loss, time 12174ms
rtt min/avg/max/mdev = 12.151/12.683/14.069/0.767 ms

As can be seen, DNS resolution succeeds immediately, so that is not the problem. The first outgoing pings return an error; from the 7th on they succeed. How long it takes before the first ping succeeds varies.

curl switches to IPv4 immediately:

2020-06-01 23:21:16 <user>@<host>:~# curl -vIL google.com
* Rebuilt URL to: google.com/
*   Trying 2a00:1450:400e:807::200e...
* TCP_NODELAY set
*   Trying 172.217.17.142...
* TCP_NODELAY set
* Connected to google.com (172.217.17.142) port 80 (#0)
...

wget tries a bit longer to connect. Sometimes it succeeds; sometimes it fails and falls back to IPv4 as well:

2020-06-02 00:49:11 <user>@<host>:~# wget --spider google.com
Spider mode enabled. Check if remote file exists.
--2020-06-02 00:51:01--  http://google.com/
Resolving google.com (google.com)... 2a00:1450:400e:807::200e, 172.217.17.142
Connecting to google.com (google.com)|2a00:1450:400e:807::200e|:80... failed: No route to host.
Connecting to google.com (google.com)|172.217.17.142|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.google.com/ [following]
Spider mode enabled. Check if remote file exists.
--2020-06-02 00:51:20--  http://www.google.com/
Resolving www.google.com (www.google.com)... 2a00:1450:400e:804::2004, 172.217.17.36
Connecting to www.google.com (www.google.com)|2a00:1450:400e:804::2004|:80... failed: No route to host.
Connecting to www.google.com (www.google.com)|172.217.17.36|:80... connected.
HTTP request sent, awaiting response... 200 OK

This happens regardless of the target host/IP. The default route is present, and the interface has a link-local address and a global IPv6 address, the latter assigned via DHCPv6:

2020-06-02 00:58:25 <user>@<host>:~# ip -6 r
::1 dev lo proto kernel metric 256 pref medium
::/64 dev eth0 proto kernel metric 256 expires 2590394sec pref medium
<ipv6> dev eth0 proto kernel metric 256 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
default via <gateway> dev eth0 proto ra metric 1024 expires 194sec pref medium

2020-06-02 00:58:56 <user>@<host>:~# ip -6 a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 <ipv6>/128 scope global
       valid_lft forever preferred_lft forever
    inet6 <LLA>/64 scope link
       valid_lft forever preferred_lft forever

IPv4 connections always succeed immediately.

rdisc6 output:

2020-06-02 13:10:36 <user>@<host>:~# rdisc6 eth0
Soliciting ff02::2 (ff02::2) on eth0...

Hop limit                 :    undefined (      0x00)
Stateful address conf.    :          Yes
Stateful other conf.      :           No
Mobile home agent         :           No
Router preference         :       medium
Neighbor discovery proxy  :           No
Router lifetime           :         1800 (0x00000708) seconds
Reachable time            :  unspecified (0x00000000)
Retransmit time           :  unspecified (0x00000000)
 Source link-layer address: <MAC>
 Prefix                   : ::/64
  On-link                 :          Yes
  Autonomous address conf.:           No
  Valid time              :      2592000 (0x00278d00) seconds
  Pref. time              :       604800 (0x00093a80) seconds
 from fe80::<ipv6>

traceroute6 (this fails sometimes with 30 empty lines):

2020-06-02 13:14:18 <user>@<host>:~# traceroute6 google.com
traceroute to google.com (2a00:1450:400e:807::200e) from <ipv6>::142, port 33434, from port 54573, 30 hops max, 60 bytes packets
 1  * * <ipv6>::1 (<ipv6>::1)  2055.792 ms
 2  * 2a06:7f80::1 (2a06:7f80::1)  2055.700 ms  1.262 ms
 3  ipv6.decix-dusseldorf.core1.dus1.he.net (2001:7f8:9e::1b1b:0:1)  2058.316 ms  2.655 ms  2.810 ms
 4  100ge5-2.core1.ams1.he.net (2001:470:0:371::1)  4.658 ms  3.804 ms  3.865 ms
 5  de-cix.fra.google.com (2001:7f8::3b41:0:1)  4.731 ms  12.465 ms  9.900 ms
 6  2001:4860:0:11e1::e (2001:4860:0:11e1::e)  14.691 ms  10.691 ms  10.654 ms
 7  2001:4860:0:1::1c7f (2001:4860:0:1::1c7f)  12.320 ms  11.433 ms  11.476 ms
 8  2001:4860::c:4000:d9a9 (2001:4860::c:4000:d9a9)  15.681 ms  16.138 ms  14.906 ms
 9  ams15s30-in-x0e.1e100.net (2a00:1450:400e:807::200e)  15.327 ms  12.979 ms  12.162 ms

ip monitor / ip mon route show that the default gateway is not reliably reachable, and that the default route is regularly deleted after expiring and not always recreated shortly afterwards. This is the output collected over a few hours:

fe80::<ipv6_1> dev eth0 lladdr <mac_1> PROBE
fe80::<ipv6_1> dev eth0 lladdr <mac_1> REACHABLE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> PROBE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> REACHABLE
fe80::<ipv6_1> dev eth0 lladdr <mac_1> STALE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> STALE
default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 pref medium
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router STALE
prefix ::/64dev eth0 onlink valid 2592000 preferred 604800
default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 pref medium
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router PROBE
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router REACHABLE
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router STALE
prefix ::/64dev eth0 onlink valid 2592000 preferred 604800
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router PROBE
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_4> dev eth0 lladdr <mac_4> PROBE
fe80::<ipv6_4> dev eth0 lladdr <mac_4> REACHABLE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> PROBE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> REACHABLE
fe80::<ipv6_4> dev eth0 lladdr <mac_4> STALE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> STALE
Deleted default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 expires -4sec pref medium
default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 pref medium
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router STALE
prefix ::/64dev eth0 onlink valid 2592000 preferred 604800
default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 pref medium
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router PROBE
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router STALE
prefix ::/64dev eth0 onlink valid 2592000 preferred 604800
prefix ::/64dev eth0 onlink valid 2592000 preferred 604800
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router PROBE
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_3> dev eth0 lladdr <mac_3> PROBE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> REACHABLE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> STALE
Deleted default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 expires -11sec pref medium
default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 pref medium
default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 pref medium
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router STALE
prefix ::/64dev eth0 onlink valid 2592000 preferred 604800
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router REACHABLE
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router STALE
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router PROBE
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_3> dev eth0 lladdr <mac_3> PROBE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> REACHABLE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> STALE
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router REACHABLE
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router STALE
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router REACHABLE
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router STALE
Deleted default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 expires -3sec pref medium
fe80::<ipv6_2> dev eth0 lladdr <mac_2> router PROBE
fe80::<ipv6_2> dev eth0  router FAILED
fe80::<ipv6_3> dev eth0 lladdr <mac_3> PROBE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> REACHABLE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> STALE
<ipv4_1> dev eth0 lladdr <mac_1> PROBE
<ipv4_1> dev eth0 lladdr <mac_1> REACHABLE
<ipv4_1> dev eth0 lladdr <mac_1> STALE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> PROBE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> REACHABLE
fe80::<ipv6_3> dev eth0 lladdr <mac_3> STALE

Narrowing down the issue

The following shows that the router does not always send router advertisements often enough, so the default route expires after 1800 seconds. Note the timestamp of the last PS1 prompt, printed when tcpdump was interrupted:

2020-06-03 12:26:31 <user>@<host>:/var/log# tcpdump -n -i eth0 icmp6 and ip6[40] == 134
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
13:45:41.290680 IP6 fe80::XXX > ff02::1: ICMP6, router advertisement, length 56
14:11:10.133781 IP6 fe80::XXX > ff02::1: ICMP6, router advertisement, length 56
^C
2 packets captured
5 packets received by filter
0 packets dropped by kernel
2020-06-03 14:58:07 <user>@<host>:/var/log#

While the first two RAs were close enough together to keep the default route (although the second arrived only about 4 minutes before expiry), the third RA stayed away too long. The default route was therefore lost and IPv6 connections were no longer possible.
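The timing argument can be checked with a little arithmetic. The following is only a sketch: the helper name `ra_gaps` is made up for illustration, it assumes all timestamps fall on the same day, and it drops tcpdump's sub-second part. The 1800 seconds is the "Router lifetime" from the rdisc6 output.

```python
from datetime import datetime

ROUTER_LIFETIME = 1800  # seconds, "Router lifetime" reported by rdisc6


def ra_gaps(ra_times, capture_end):
    """Return the gaps in seconds between consecutive RAs, followed by the
    gap between the last RA and the end of the capture.

    Assumes HH:MM:SS timestamps that all belong to the same day.
    """
    fmt = "%H:%M:%S"
    times = [datetime.strptime(t, fmt) for t in list(ra_times) + [capture_end]]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# The two RA arrival times from the capture and the time tcpdump was interrupted
gaps = ra_gaps(["13:45:41", "14:11:10"], "14:58:07")
for gap in gaps:
    margin = ROUTER_LIFETIME - gap
    state = "route kept" if margin > 0 else "route expired"
    print(f"gap {gap} s, margin {margin} s: {state}")
```

The first gap is 1529 s, which leaves a margin of 271 s (roughly the 4 minutes mentioned above). The second gap is already 2817 s when the capture ends, well past the 1800 s lifetime.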

Meanwhile I can see lots of neighbor solicitations from the router, so its ICMPv6 packets do arrive:

2020-06-03 14:56:03 <user>@<host>:/var/log# tcpdump icmp6
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:03:07.750318 IP6 fe80::XXX > ff02::YYY: ICMP6, neighbor solicitation, who has 2a06:ZZZ, length 32
15:03:08.356100 IP6 fe80::XXX > ff02::YYY: ICMP6, neighbor solicitation, who has 2a06:ZZZ, length 32

But currently no RAs arrive, not even when trying to force them:

2020-06-03 15:03:21 <user>@<host>:/var/log# rdisc6 eth0
Soliciting ff02::2 (ff02::2) on eth0...
Timed out.
Timed out.
Timed out.
No response.

This fits the ip monitor output above, where probing the router often simply fails. However, since I see the NDs from the router, I guess it could answer me but for some reason does not, or rather ignores my NDs?

I am able to manually restore the default route permanently via:

ip -6 r add default dev eth0 via fe80::<ipv6>

While IPv6 connections are again possible with this, they usually still have a long delay or time out completely.

  • Is the default route always there? (Use ip monitor or ip mon route to check.) Can you show the output of rdisc6 eth0? Commented Jun 2, 2020 at 5:36
  • I added rdisc6, traceroute6 and ip monitor outputs to the question. Both monitor commands you suggested are still running, the second without output so far; I'll add any further output I get. Indeed the route state does not seem to be stable. Traceroute shows a long delay on the first hop, so probably a network issue on the VPS provider's side?
    – MichaIng
    Commented Jun 2, 2020 at 12:29
  • Now the default route has indeed been deleted, evidently because it expired. No new route has been added so far, so IPv6 connections now fail immediately with connect: Network is unreachable. Could the reason be that DHCPv4 and DHCPv6 conflict with each other, as two instances of dhclient are running?
    – MichaIng
    Commented Jun 2, 2020 at 13:45

1 Answer

Note 1: You're only using DHCPv6 to obtain an address; it is not used for the default route. That's still learned from ICMPv6 "Router Advertisement" packets, the same mechanism SLAAC relies on.

Note 2: ip monitor shows several different kinds of events intermixed: addresses, routes, and neighbor cache entries. You can run ip mon route, ip mon neigh to see them separately.
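If you already have a mixed ip monitor log saved, the event kinds can also be separated after the fact. The following is a rough sketch, not an official parser: `classify` and `summarize` are hypothetical helper names, and the keyword heuristics are an assumption based only on the line shapes seen in the question's output.

```python
from collections import Counter

# Neighbour cache states as printed by iproute2
NEIGH_STATES = {"PERMANENT", "NOARP", "REACHABLE", "STALE", "NONE",
                "INCOMPLETE", "DELAY", "PROBE", "FAILED"}


def classify(line):
    """Roughly sort one line of `ip monitor` output into an event kind."""
    words = line.split()
    if words and words[0] == "Deleted":
        words = words[1:]          # a deletion is still the same kind of event
    if not words:
        return "other"
    if words[0] == "prefix":
        return "prefix"
    if "lladdr" in words or words[-1] in NEIGH_STATES:
        return "neigh"
    if "via" in words or "proto" in words:
        return "route"
    return "other"


def summarize(lines):
    """Count how many events of each kind a log contains."""
    return Counter(classify(line) for line in lines)


SAMPLE = [
    "default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 pref medium",
    "fe80::<ipv6_2> dev eth0 lladdr <mac_2> router STALE",
    "prefix ::/64dev eth0 onlink valid 2592000 preferred 604800",
    "fe80::<ipv6_2> dev eth0 lladdr <mac_2> router PROBE",
    "fe80::<ipv6_2> dev eth0  router FAILED",
    "Deleted default via fe80::<ipv6_2> dev eth0 proto ra metric 1024 expires -4sec pref medium",
]
print(summarize(SAMPLE))
```

For live debugging, running ip mon route and ip mon neigh in separate terminals is simpler; a filter like this only helps with logs that were already captured.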

I would guess that there is a problem in between your VPS and your nearest gateway, because:

  1. The neighbour entry for your default gateway (the IPv6 equivalent of ARP cache entry) does not successfully go into REACHABLE state – it keeps going into FAILED state, meaning your host sent several ND requests (the equivalent of ARP queries) to renew the cache entry but didn't receive any response.

    Neighbor discovery, just like ARP for IPv4, is the absolute bare minimum for a functioning IPv6 network.

  2. Expiry for the default route ::/0 is reset according to "Router lifetime" every time a Router Advertisement is received. In your case, the advertised lifetime is 1800 seconds, so the router should repeat the advertisement at least every 900 seconds so the default route never goes below half its lifetime.

    But as you can see from the ip -6 route output, your ::/0 route was only 194 seconds from expiry. This means either that the router's timers are misconfigured, or that its multicast RAs are just not reaching you for whatever reason. As a result, you keep losing the default route.
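As a quick worked example of point 2, the remaining lifetime tells you how long ago the last RA was accepted. This sketch assumes the kernel reset the route's timer to the full advertised lifetime on that RA; the function name is made up for illustration.

```python
ROUTER_LIFETIME = 1800  # seconds, "Router lifetime" from the rdisc6 output
REMAINING = 194         # seconds, "expires 194sec" on the default route


def seconds_since_last_ra(router_lifetime, remaining):
    """Time elapsed since the RA that last reset the route's expiry timer."""
    return router_lifetime - remaining


elapsed = seconds_since_last_ra(ROUTER_LIFETIME, REMAINING)
half_life = ROUTER_LIFETIME // 2
print(f"last RA accepted about {elapsed} s ago ({elapsed // 60} min {elapsed % 60} s)")
print(f"longer than the {half_life} s interval suggested above: {elapsed > half_life}")
```

So at the moment ip -6 r was run, no RA had been accepted for 1606 seconds (26 min 46 s), far longer than the 900 seconds suggested above.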

There's one thing common to both of the above issues: ND and SLAAC (Router Advertisements) both use ICMPv6 multicast, so check very carefully whether your firewall is imposing strict rate limits on incoming Router Advertisements or Neighbor Advertisements, or on multicast packets in general.

(You can use tcpdump to check whether you're receiving packets; e.g. if a RA shows up in tcpdump but fails to renew the default route then it may be your firewall's problem.)
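To narrow tcpdump down to RAs, the filter used in the question, icmp6 and ip6[40] == 134, works because the fixed IPv6 header is 40 bytes long, so byte 40 is the ICMPv6 type field whenever ICMPv6 follows the header directly, and type 134 is Router Advertisement. The sketch below mimics that check on a hand-built packet. `build_packet` and `icmpv6_type` are hypothetical helpers, the checksum is left at zero, and packets with extension headers are deliberately not handled, which as far as I understand is also a limitation of the filter itself.

```python
import struct

ICMPV6 = 58  # IPv6 "Next Header" value for ICMPv6
TYPE_NAMES = {
    133: "router solicitation",
    134: "router advertisement",
    135: "neighbor solicitation",
    136: "neighbor advertisement",
}


def build_packet(icmp_type, next_header=ICMPV6):
    """Build a minimal IPv6 packet with a 4-byte ICMPv6-style payload.

    Illustration only: addresses are all zero and the checksum is not computed.
    """
    payload = bytes([icmp_type, 0, 0, 0])
    # version/traffic class/flow label, payload length, next header, hop limit
    fixed = struct.pack("!IHBB", 6 << 28, len(payload), next_header, 255)
    src = dst = bytes(16)
    return fixed + src + dst + payload


def icmpv6_type(packet):
    """Return byte 40 if this is IPv6 with ICMPv6 right after the fixed header."""
    if len(packet) <= 40 or packet[0] >> 4 != 6 or packet[6] != ICMPV6:
        return None
    return packet[40]


ra = build_packet(134)
print(TYPE_NAMES.get(icmpv6_type(ra)))
```

Replacing 134 with 135 or 136 in the tcpdump filter would show only Neighbor Solicitations or Neighbor Advertisements, which helps when checking the FAILED neighbour entries from point 1.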

  • Many thanks for your answer; that narrows down the issue and explains a few details as well. tcpdump 'icmp6' shows a large number of "neighbor solicitation, who has" requests from the default router (at least the one that should be the default IPv6 route, although it is not set currently...) to various IPv6 hosts with addresses similar to that of our VPS. That is ND (neighbour discovery), right? How do I adjust the tcpdump filter to only show RAs? Furthermore, I found net.ipv6.icmp.ratelimit = 1000 and will try setting it to 0. I checked that net.ipv6.conf.*.accept_ra = 1 is set correctly; no forwarding is enabled.
    – MichaIng
    Commented Jun 2, 2020 at 20:25
  • The ICMPv6 rate limit was not the issue; the default route still expired at times. No firewall is in place, by the way. I'm currently running tcpdump -n -i eth0 icmp6 and ip6[40] == 134 to check how often RAs arrive, found here: unix.stackexchange.com/a/312613
    – MichaIng
    Commented Jun 3, 2020 at 10:26
  • rdisc6 eth0 currently fails, so I cannot even force an RA reliably. Manually adding the router as default route works and restores IPv6 connectivity, and the manually added route does not expire. However, the connection is still unreliable, has long delays etc. What is strange is that I meanwhile see many ICMPv6 NDs from the router, so perhaps it is simply overloaded or its rate limit is reached?
    – MichaIng
    Commented Jun 3, 2020 at 17:33
  • You might be seeing Neighbor Solicitations meant for other hosts in the same subnet. They're essentially broadcast like ARP. Commented Jun 3, 2020 at 17:40
  • Yes indeed, those are all meant for other hosts, but it shows that my host is able to receive requests/NDs/ICMPv6 from the router. Hence the issue is the router being overloaded, ignoring my RA requests, or refusing to send RAs for another reason, right? With the default route added manually, I see ICMPv6: RA: ndisc_router_discovery failed to add default route with timestamps showing RAs sometimes after 20 mins, sometimes after >1.5 hours, hence very irregular.
    – MichaIng
    Commented Jun 3, 2020 at 18:19
