I have this network setup:

[Image: network diagram]

When I execute arp -d on the laptop to clear the ARP cache and then ping 10.0.0.30 to reach embedded device 1, the laptop should conclude "I don't know how to get from here to 10.0.0.30. Let me ask out into the network." More specifically, it should broadcast an ARP packet "Who has 10.0.0.30? Tell 10.0.0.55" from its MAC 08:BE:AC:21:86:36, put the response into the ARP cache (which can be viewed with arp -a), and then perform the actual ping.

Instead, the laptop asks "Who has 10.0.0.35? Tell 10.0.0.55". 10.0.0.35 is the IP address of embedded device 2, which responds correctly. Windows puts "10.0.0.35 00-24-bd-05-06-23" into the ARP cache and then proceeds to ping 10.0.0.30, but using the MAC of 10.0.0.35:

[Image: Wireshark screenshot]

The ping request never reaches embedded device 1 and, needless to say, the ping fails.
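To make the difference concrete, here is a minimal Python sketch that builds the ARP request I expected and the one I actually saw. The helper name build_arp_request is my own, and the sketch only constructs the 28-byte ARP payload (no Ethernet header, nothing is sent on the wire).

```python
import socket
import struct


def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Return the 28-byte ARP request payload (hypothetical helper, no Ethernet header)."""
    mac_bytes = bytes.fromhex(sender_mac.replace(":", "").replace("-", ""))
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length
        4,                            # protocol address length
        1,                            # operation: 1 = request ("who has")
        mac_bytes,                    # sender MAC
        socket.inet_aton(sender_ip),  # sender IP ("tell ...")
        b"\x00" * 6,                  # target MAC, unknown in a request
        socket.inet_aton(target_ip),  # target IP ("who has ...")
    )


expected = build_arp_request("08:BE:AC:21:86:36", "10.0.0.55", "10.0.0.30")
observed = build_arp_request("08:BE:AC:21:86:36", "10.0.0.55", "10.0.0.35")

# Positions at which the two payloads differ
differing = [i for i in range(len(expected)) if expected[i] != observed[i]]
print(len(expected), differing)
```

The two payloads differ only in the target IP field, so whatever is going wrong happens before the ARP request is built, when the laptop decides which address to resolve.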

How is this possible? What could cause Windows to use the wrong IP address?

Additional info:

  • All of these addresses are assigned via DHCP.
  • Giving the laptop and/or embedded device 1 static addresses does not change the behavior.
  • My hosts file is empty (except for the default comments)
  • Pinging works fine from other Windows computers in the network to both embedded devices.
  • Pinging from embedded device 1 to the laptop also does not work. The laptop sees the request from the IP and MAC of embedded device 1, but responds to the MAC of embedded device 2.
  • This also happens when using a different network adapter on the laptop (it has a different IP (on the same subnet) and MAC then, but other than that the behavior is the same).
  • When I plugged in a different switch between the laptop, embedded device 1 and the rest of the network, pings succeeded. I quickly swapped back to the first switch, where pings failed again. I then tried a third switch, which also did not work at all. Then I switched back to the second switch where pings succeeded for about a minute, but then it broke without any apparent reason and now it's not working again.
  • When connecting embedded device 1 directly to the laptop and assigning a static 10.0.0.55 IP address, then doing arp -d and ping 10.0.0.30, I still see "Who is 10.0.0.35?" going out multiple times (which is now never answered, which is expected), but eventually also "Who is 10.0.0.30?", which is answered by embedded device 1, "10.0.0.30 00-24-bd-03-a5-25" is added to the ARP cache and the pings succeed.
  • When giving embedded device 1 a static IP 10.0.0.140, pinging 10.0.0.140 instantly works. Pinging 10.0.0.30 still has the same broken behavior of resolving 10.0.0.35.

It seems to be a problem with the laptop. It's entirely possible that rebooting fixes this, but it's also entirely possible that it will randomly break again in a few weeks. I want to diagnose the actual cause now so I have less uncertainty to worry about in the future.
This probably has not happened before. If it has, doing something seemingly unrelated has resolved the issue.

Edit:

Yesterday I sent the laptop to hibernation, woke it back up, tried again and the problem was still the same.
I then sent it to hibernation again, went home and woke it up while connected to my home network, sent it to hibernation again, went to sleep, woke it up again today while connected to the company network, tried again, and now the problem magically disappeared. So unfortunately I can't diagnose any further now, but suggestions for the future when it will inevitably show up again are still appreciated.

Edit 2:

A friend suggested to look at the output of route print:

[...]
IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
         10.0.0.0    255.255.255.0          On-link         10.0.0.55    281
        10.0.0.30  255.255.255.255        10.0.0.35        10.0.0.55     26
        10.0.0.55  255.255.255.255          On-link         10.0.0.55    281
       10.0.0.255  255.255.255.255          On-link         10.0.0.55    281
        127.0.0.0        255.0.0.0          On-link         127.0.0.1    331
        127.0.0.1  255.255.255.255          On-link         127.0.0.1    331
  127.255.255.255  255.255.255.255          On-link         127.0.0.1    331
     192.168.56.0    255.255.255.0          On-link      192.168.56.1    281
     192.168.56.1  255.255.255.255          On-link      192.168.56.1    281
   192.168.56.255  255.255.255.255          On-link      192.168.56.1    281
        224.0.0.0        240.0.0.0          On-link         127.0.0.1    331
        224.0.0.0        240.0.0.0          On-link      192.168.56.1    281
        224.0.0.0        240.0.0.0          On-link         10.0.0.55    281
  255.255.255.255  255.255.255.255          On-link         127.0.0.1    331
  255.255.255.255  255.255.255.255          On-link      192.168.56.1    281
  255.255.255.255  255.255.255.255          On-link         10.0.0.55    281
===========================================================================
Persistent Routes:
  Network Address          Netmask  Gateway Address  Metric
        10.0.0.30  255.255.255.255        10.0.0.35       1
===========================================================================
[...]

10.0.0.35 (embedded device 2) somehow got specified as a gateway to reach 10.0.0.30 (embedded device 1).

It happened to work as explained in Edit 1 because embedded device 2 was turned off and/or got assigned a different IP address by DHCP. The first ping to 10.0.0.30 timed out (because it couldn't reach 10.0.0.35), after which Windows does something differently that does reach 10.0.0.30 directly. I did not notice that before. Now that I gave embedded device 2 10.0.0.35 again, I can't ping 10.0.0.30 again, as expected.

Note that the spurious route is not specific to the interface. It shows up both on the laptop's integrated ethernet port, and on a USB-to-Ethernet-adapter.

So the question now is: How did 10.0.0.35 manage to become a gateway?

  • Looks like a problem in the network driver, and indeed a reboot would fix this. Honestly, I'd just reboot. You can try turning off the network card and then turning it on again to reinitialize the driver.
    – LPChip
    Commented Mar 18 at 11:15
  • Is it possible that both devices advertise the same MAC address?
    – harrymc
    Commented Mar 18 at 14:53
  • Little detail: you're not looking at the routing table, ("route print") but at the ARP cache. That's "Level 2" and does not involve routing.
    Commented Mar 19 at 10:31
  • Suggestion to maybe get more info... in the future, clear your ARP cache, and try pinging that .35 . See if it does an ARP request for another device. I think nmap runs under Windows, you could also try a network ARP scan and see what happens.
    Commented Mar 19 at 10:35
  • It very much does involve routing – the routing table is what decides whether the OS should ARP for the target address directly, or whether it should ARP for a gateway address. If you have a route like "10.0.0.30/32 via 10.0.0.35", what's it going to ARP for and what MAC address is it going to use? It's going to use the gateway's MAC because that's how routing works: it routes L3 destinations via L2 addresses.
    Commented Mar 19 at 11:59

1 Answer

As mentioned in the comment, it's the routing table that ultimately decides whether the OS will directly ARP for the destination MAC address or whether it will ARP for a gateway MAC address. (In other words, the routing table maps L3 to L2, and the gateway IP addresses are merely stand-ins for their MAC addresses – they don't actually go in the packet's L3 header, after all.)

So if you see an ARP request being made for the "wrong" IP address, or if you see regular IP packets being sent to a "wrong" MAC address, then the simplest and most likely cause is that you have a route defined via that IP address.
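As a rough illustration, the lookup can be sketched in Python as a longest-prefix match over the 10.0.0.x routes from the posted route print output. The function name arp_target and the tie-breaking rule (longest prefix first, then lowest metric) are simplifying assumptions for this sketch, not a description of the actual Windows implementation.

```python
import ipaddress

# (destination network, gateway or None for on-link, metric), taken from the
# 10.0.0.55 interface rows of the posted routing table
ROUTES = [
    ("10.0.0.0/24", None, 281),
    ("10.0.0.30/32", "10.0.0.35", 26),
    ("10.0.0.55/32", None, 281),
    ("10.0.0.255/32", None, 281),
]


def arp_target(dest: str, routes=ROUTES) -> str:
    """Return the IP address the host would ARP for when sending to dest."""
    dest_ip = ipaddress.ip_address(dest)
    candidates = []
    for net, gateway, metric in routes:
        network = ipaddress.ip_network(net)
        if dest_ip in network:
            candidates.append((network.prefixlen, -metric, gateway))
    if not candidates:
        raise LookupError(f"no route to {dest}")
    # Most specific route wins; a lower metric breaks ties (assumption)
    _, _, gateway = max(candidates, key=lambda c: (c[0], c[1]))
    # On-link route: ARP for the destination itself; otherwise ARP for the gateway
    return dest if gateway is None else gateway


print(arp_target("10.0.0.30"))   # matched by the /32 host route via 10.0.0.35
print(arp_target("10.0.0.140"))  # only matched by the on-link /24 route
```

With these routes, 10.0.0.30 resolves to the gateway 10.0.0.35, while 10.0.0.140 is resolved directly, which matches what you saw when you moved the device to 10.0.0.140.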

"So the question now is: How did 10.0.0.35 manage to become a gateway?"

From the output you provided, it seems someone ran route add to define a static route, since it also appears in the second section, "Static Routes" (labelled "Persistent Routes" on English-language Windows), below the main list. (If the entry remains across reboots, then someone ran route -p add.) It could have been some kind of software doing it, but it could just as likely have been someone using the laptop – maybe the persistent route was added two years ago and forgotten.
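If you want to check for such leftovers in the future, here is a minimal Python sketch that picks the persistent entries out of saved route print text. It relies on an assumption about the layout: persistent rows have exactly three dotted-quad columns followed by a numeric metric, whereas active rows have an extra interface column. The function name and the shortened sample text are made up for illustration, and rows with a non-numeric metric would be missed.

```python
import re

QUAD = r"(\d{1,3}(?:\.\d{1,3}){3})"
# address, netmask, gateway, numeric metric, and nothing else on the line
PERSISTENT_LINE = re.compile(rf"^\s*{QUAD}\s+{QUAD}\s+{QUAD}\s+(\d+)\s*$")

# Shortened sample; the "On-link" label is localized (e.g. "Auf Verbindung")
SAMPLE = """\
         10.0.0.0    255.255.255.0          On-link         10.0.0.55    281
        10.0.0.30  255.255.255.255        10.0.0.35        10.0.0.55     26
===========================================================================
        10.0.0.30  255.255.255.255        10.0.0.35       1
===========================================================================
"""


def persistent_routes(text: str):
    """Return (destination, netmask, gateway, metric) for persistent-style rows."""
    routes = []
    for line in text.splitlines():
        match = PERSISTENT_LINE.match(line)
        if match:
            dest, mask, gateway, metric = match.groups()
            routes.append((dest, mask, gateway, int(metric)))
    return routes


for route in persistent_routes(SAMPLE):
    print(route)
```

Any entry this reports that you don't recognize is worth a second look; removing one should be possible with route delete followed by the destination address, run from an elevated prompt.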

In this case 10.0.0.35 didn't become a gateway; rather, the client host was told to use it as a gateway. The device has no way to unilaterally become a gateway for specific destinations (see footnotes 1 and 2), and in reality it might not be functioning as a gateway/router at all.


1 (In a regular Windows setup, only the DHCP server that you're getting your own address from can do that – it can send a list of 'static' routes as part of the DHCP lease, but they would be treated as dynamic routes and wouldn't show up under "Static Routes" in Windows.)

2 (Windows additionally supports RIPv2 as a dynamic routing protocol, which would technically allow any nearby host to declare itself as a gateway, but 1) it would still require RIPv2 to be enabled on the laptop in the first place, and 2) RIP-received routes would not show up under "Static Routes". They would also disappear after a few minutes if the RIP-speaking gateway is gone.)

  • Wow, that's exactly it! Thank you! I had completely forgotten that I added that route myself, like actually years ago. My home network uses the same 10.0.0.* address range as the company network, so during home-office times, I had to add that manual route to connect to 10.0.0.30 via the VPN network adapter (that had 10.0.0.35 then, I think). And I guess only just now 10.0.0.35 was handed out to an actual device in the company network.
    – Niko O
    Commented Mar 20 at 10:11
