I have this network setup:
When I execute arp -d on the laptop to clear the ARP cache and then ping 10.0.0.30 to ping embedded device 1, it should go "I don't know how to get from here to 10.0.0.30. Let me ask out into the network." To be more specific, the laptop should broadcast an ARP request "Who has 10.0.0.30? Tell 10.0.0.55" from its MAC 08:BE:AC:21:86:36. It should put the response into the ARP cache (which can be viewed with arp -a) and then perform the actual ping.
Instead, the laptop asks "Who has 10.0.0.35? Tell 10.0.0.55", which is the IP address of embedded device 2, which responds correctly. Windows puts "10.0.0.35 00-24-bd-05-06-23" into the ARP cache and then proceeds to send the ping addressed to IP 10.0.0.30, but with the destination MAC of 10.0.0.35.
The ping request never gets there and, needless to say, the ping fails.
How is this possible? What could cause Windows to use the wrong IP address?
Additional info:
- All of these addresses are assigned via DHCP.
- Giving the laptop and/or embedded device 1 static addresses does not change the behavior.
- My hosts file is empty (except for the default comments)
- Pinging works fine from other Windows computers in the network to both embedded devices.
- Pinging from embedded device 1 to the laptop also does not work. The laptop sees the request from the IP and MAC of embedded device 1, but responds to the MAC of embedded device 2.
- This also happens when using a different network adapter on the laptop (it has a different IP (on the same subnet) and MAC then, but other than that the behavior is the same).
- When I plugged in a different switch between the laptop, embedded device 1 and the rest of the network, pings succeeded. I quickly swapped back to the first switch, where pings failed again. I then tried a third switch, which also did not work at all. Then I switched back to the second switch where pings succeeded for about a minute, but then it broke without any apparent reason and now it's not working again.
- When connecting embedded device 1 directly to the laptop and assigning the laptop a static IP address of 10.0.0.55, then doing arp -d and ping 10.0.0.30, I still see "Who is 10.0.0.35?" going out multiple times (which is now never answered, as expected), but eventually also "Who is 10.0.0.30?", which is answered by embedded device 1. "10.0.0.30 00-24-bd-03-a5-25" is then added to the ARP cache and the pings succeed.
- When giving embedded device 1 a static IP 10.0.0.140, pinging 10.0.0.140 instantly works. Pinging 10.0.0.30 still has the same broken behavior of resolving 10.0.0.35.
It seems to be a problem with the laptop. It's entirely possible that rebooting fixes this, but it's also entirely possible that it will randomly break again in a few weeks. I want to diagnose the actual cause now so I have less uncertainty to worry about in the future.
This probably has not happened before. If it has, doing something seemingly unrelated has resolved the issue.
Edit:
Yesterday I sent the laptop to hibernation, woke it back up, tried again and the problem was still the same.
I then sent it to hibernation again, went home and woke it up while connected to my home network, sent it to hibernation again, went to sleep, woke it up again today while connected to the company network, tried again, and now the problem magically disappeared. So unfortunately I can't diagnose any further now, but suggestions for the future when it will inevitably show up again are still appreciated.
Edit 2:
A friend suggested looking at the output of route print:
[...]
IPv4 Route Table
===========================================================================
Active Routes:
Network Destination          Netmask          Gateway        Interface  Metric
           10.0.0.0    255.255.255.0          On-link        10.0.0.55     281
          10.0.0.30  255.255.255.255        10.0.0.35        10.0.0.55      26
          10.0.0.55  255.255.255.255          On-link        10.0.0.55     281
         10.0.0.255  255.255.255.255          On-link        10.0.0.55     281
          127.0.0.0        255.0.0.0          On-link        127.0.0.1     331
          127.0.0.1  255.255.255.255          On-link        127.0.0.1     331
    127.255.255.255  255.255.255.255          On-link        127.0.0.1     331
       192.168.56.0    255.255.255.0          On-link     192.168.56.1     281
       192.168.56.1  255.255.255.255          On-link     192.168.56.1     281
     192.168.56.255  255.255.255.255          On-link     192.168.56.1     281
          224.0.0.0        240.0.0.0          On-link        127.0.0.1     331
          224.0.0.0        240.0.0.0          On-link     192.168.56.1     281
          224.0.0.0        240.0.0.0          On-link        10.0.0.55     281
    255.255.255.255  255.255.255.255          On-link        127.0.0.1     331
    255.255.255.255  255.255.255.255          On-link     192.168.56.1     281
    255.255.255.255  255.255.255.255          On-link        10.0.0.55     281
===========================================================================
Persistent Routes:
    Network Address          Netmask  Gateway Address  Metric
          10.0.0.30  255.255.255.255        10.0.0.35       1
===========================================================================
[...]
10.0.0.35 (embedded device 2) somehow got specified as a gateway to reach 10.0.0.30 (embedded device 1).
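To convince myself that this one route explains the ARP behavior, here is a minimal Python sketch of a longest-prefix-match lookup over the relevant entries of the table above. It is only my mental model of the route selection, not how Windows actually implements it; the function name arp_target and the use of None for "on-link" are my own conventions.

```python
import ipaddress

# Relevant rows copied from the route print output:
# (destination, netmask, gateway, metric).
# A gateway of None stands for "on-link": deliver directly,
# so the host ARPs for the destination itself.
ROUTES = [
    ("10.0.0.0", "255.255.255.0", None, 281),
    ("10.0.0.30", "255.255.255.255", "10.0.0.35", 26),
    ("10.0.0.55", "255.255.255.255", None, 281),
]


def arp_target(dest_ip, routes=ROUTES):
    """Return the IP the host would ARP for when sending to dest_ip.

    Returns None if no route matches. Assumption: longest prefix wins,
    and the lower metric breaks ties between equally long prefixes.
    """
    dest = ipaddress.ip_address(dest_ip)
    candidates = []
    for network, mask, gateway, metric in routes:
        net = ipaddress.ip_network(f"{network}/{mask}")
        if dest in net:
            candidates.append((net.prefixlen, -metric, gateway))
    if not candidates:
        return None
    # max() picks the longest prefix, then the lowest metric.
    _, _, gateway = max(candidates, key=lambda c: (c[0], c[1]))
    # On-link routes resolve the destination itself; otherwise the gateway.
    return gateway if gateway is not None else dest_ip


print(arp_target("10.0.0.30"))   # host route via gateway 10.0.0.35
print(arp_target("10.0.0.140"))  # only the on-link /24 matches
```

Under this model the /32 host route for 10.0.0.30 beats the on-link /24, so the ARP request goes out for 10.0.0.35, while 10.0.0.140 only matches the /24 and is resolved directly. That is consistent with what I observed.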
It happened to work as explained in Edit 1 because embedded device 2 was turned off and/or got assigned a different IP address by DHCP. The first ping to 10.0.0.30 timed out (because it couldn't reach 10.0.0.35), after which Windows does something differently that does reach 10.0.0.30 directly. I did not notice that before. Now that I gave embedded device 2 10.0.0.35 again, I can't ping 10.0.0.30 again, as expected.
Note that the spurious route is not specific to the interface. It shows up both on the laptop's integrated ethernet port, and on a USB-to-Ethernet-adapter.
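For the next time something like this shows up, a rough check like the following could flag it faster. It is a sketch under my own assumptions: the routes are typed in by hand as (destination, netmask, gateway) tuples, with None meaning "on-link", and the helper name shadowing_host_routes is made up. It reports host routes that go through a gateway even though an on-link subnet route already covers the destination.

```python
import ipaddress


def shadowing_host_routes(routes):
    """Return (destination, gateway) pairs for /32 routes via a gateway
    whose destination is already covered by an on-link subnet route."""
    nets = [(ipaddress.ip_network(f"{dest}/{mask}"), gw)
            for dest, mask, gw in routes]
    # On-link subnet routes (anything shorter than a host route).
    on_link = [net for net, gw in nets if gw is None and net.prefixlen < 32]
    flagged = []
    for net, gw in nets:
        if net.prefixlen == 32 and gw is not None:
            if any(net.subnet_of(subnet) for subnet in on_link):
                flagged.append((str(net.network_address), gw))
    return flagged


# Entries taken from the route print output in this question.
routes = [
    ("10.0.0.0", "255.255.255.0", None),
    ("10.0.0.30", "255.255.255.255", "10.0.0.35"),
    ("10.0.0.55", "255.255.255.255", None),
    ("192.168.56.0", "255.255.255.0", None),
]
print(shadowing_host_routes(routes))
```

With the table from this question, the only entry flagged is the 10.0.0.30 route via 10.0.0.35.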
So the question now is: How did 10.0.0.35 manage to become a gateway?