
Sorry if my issue is a bit hard to summarize in the title. That is the best I could come up with.

TL;DR version: How do I debug a situation where packets reach the OS but not the destination process?

Explanation: I have two processes running on two devices and communicating over TCP/IP. The first device is connected only to the second device, directly via an Ethernet cable. The second device is connected to the network. The two devices connect and start communicating with each other without any problem. I then physically disconnect the first device and reconnect the cable after a few moments. Using Wireshark, I see that the device receives the packets and that they have the right destination port number. I also see that my process is listening on [0.0.0.0:port] with the correct port number. But for some reason the process is not receiving the packets.

Here is the weird thing though. This only happens when the first device is directly connected to the second device. If I connect both devices to a switch and repeat this test, the packets reach the process after reconnecting the cable without an issue.

In both scenarios I am setting the IPs statically, and the process in question is using the ZMQ stack to receive packets. What confuses me most is why my network topology would affect the routing that happens within the Linux operating system (if that is the case).

How do I debug this scenario? Where should I start looking? Is there a test I could run to narrow things down to where the issue is? Please let me know if you would want me to clarify anything further.

P.S. I have firewalls disabled on both systems.

  • Are the packets using UDP, TCP, or something else? Is the IP configuration (the interface, the IP address, the subnet prefix) completely identical in both cases?
    Commented May 22, 2020 at 6:55
  • As I said, they are over TCP/IP. No, they are not identical. Thanks for the suggestion; I will try with an identical IP setup.
    – pooya13
    Commented May 22, 2020 at 7:06
  • That would be the first step. Don't try to troubleshoot a system that's different in 10 ways if you're able to minimize the differences to just one thing (such as the physical connection).
    Commented May 22, 2020 at 7:08

2 Answers

2

This is a lot of guessing, but here are a few causes that I've seen on actual systems so far (I would almost bet it's the checksum):

  • The firewall (netfilter). Check nft list ruleset as well as iptables-save -c.
  • Bad checksum, either of the IP header or of the TCP/UDP packet. (Hardware checksum offload has several times been a cause of mysteriously ignored UDP packets on VMs.) Check netstat -s for statistics on the receiving side, look for "InCsumErrors:".
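  • Strict rp_filter discarding the packet because the reverse route would point through a different interface. Run sysctl net.ipv4.conf.all.rp_filter=0 to disable this. Note that the kernel uses the higher of the "all" value and the per-interface value, so net.ipv4.conf.<interface>.rp_filter has to be 0 as well.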
  • The specific {dstaddr:dstport} matches another listening socket bound to the specific IP address, which takes priority over a wildcard {0.0.0.0:dstport} socket.
  • The specific combination of {srcaddr:srcport → dstaddr:dstport} matches a connected socket and is therefore not delivered to any listening socket. This can apply even to UDP – although there are no connections on the wire, you can still have a 'connected' UDP socket that's associated with a specific remote endpoint.
  • (IPv6-specific:) On the destination system, the assigned IP address is inactive because of DAD (duplicate address detection) failure. It would show up as 'dadfailed' in ip -6 addr.
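
To make the two socket-matching points above more concrete, here is a deliberately simplified Python model of the lookup order they describe: a connected socket with the exact 4-tuple wins, then a listener bound to the specific address, then the wildcard listener. The names (Sock, pick_socket), the 192.168.1.x addresses and port 5555 are made up for illustration, and the real kernel lookup is more involved (hash tables, SO_REUSEPORT, and so on), so treat this as a sketch of the idea only.

```python
from typing import NamedTuple, Optional

WILDCARD = "0.0.0.0"


class Sock(NamedTuple):
    """Hypothetical socket record; remote_* stay None for a listening socket."""
    name: str
    local_addr: str
    local_port: int
    remote_addr: Optional[str] = None
    remote_port: Optional[int] = None


def pick_socket(socks, src_addr, src_port, dst_addr, dst_port):
    """Return the socket that gets the packet in this simplified model, or None."""
    # 1. A connected socket whose 4-tuple matches exactly takes priority.
    for s in socks:
        if (s.remote_addr, s.remote_port, s.local_addr, s.local_port) == (
            src_addr, src_port, dst_addr, dst_port
        ):
            return s
    # 2. Otherwise, a listener bound to the specific destination address.
    for s in socks:
        if s.remote_addr is None and (s.local_addr, s.local_port) == (dst_addr, dst_port):
            return s
    # 3. Otherwise, a listener bound to the wildcard address.
    for s in socks:
        if s.remote_addr is None and (s.local_addr, s.local_port) == (WILDCARD, dst_port):
            return s
    return None


# Illustrative socket table: one wildcard listener plus a connection
# left over from before the cable was unplugged.
socks = [
    Sock("wildcard-listener", WILDCARD, 5555),
    Sock("old-connection", "192.168.1.2", 5555, "192.168.1.1", 40000),
]
```

In this model, a packet from 192.168.1.1:40000 lands on "old-connection" and never reaches the listener, while a packet from a new source port such as 40002 falls through to "wildcard-listener".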
  • Thank you. Specifically point 5 seems to be the issue. After reconnecting I see that there are two established TCP connections, one from before and one new one (the client having received a different port number). I am still not sure why this is happening or why it happens with A <----> B <-----> Router topology (where I disconnect A) but not with A <----> B topology.
    – pooya13
    Commented May 22, 2020 at 20:13
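
One way to spot the leftover connection described in this comment is to list every TCP socket on the port together with its state. The sketch below decodes text in the /proc/net/tcp format (IPv4 only); it assumes a little-endian host such as x86, and the SAMPLE table, addresses and port 5555 are invented for illustration. The function names are hypothetical too.

```python
import socket
import struct

# Subset of the kernel's TCP state codes as printed in /proc/net/tcp.
TCP_STATES = {"01": "ESTABLISHED", "0A": "LISTEN"}

# Invented sample in /proc/net/tcp layout: one wildcard listener and two
# established connections from the same peer with different source ports.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:15B3 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 20001
   1: 0201A8C0:15B3 0101A8C0:9C40 01 00000000:00000000 00:00000000 00000000  1000        0 20002
   2: 0201A8C0:15B3 0101A8C0:9C42 01 00000000:00000000 00:00000000 00000000  1000        0 20003
"""


def decode_endpoint(field):
    """Decode 'HEXADDR:HEXPORT' into (dotted address, port); little-endian host assumed."""
    hex_addr, hex_port = field.split(":")
    addr = socket.inet_ntoa(struct.pack("<I", int(hex_addr, 16)))
    return addr, int(hex_port, 16)


def sockets_on_port(proc_net_tcp_text, port):
    """Return (state, local, remote) for every socket whose local port matches."""
    rows = []
    for line in proc_net_tcp_text.splitlines():
        fields = line.split()
        # Data rows start with an index like '0:'; this skips the header and blanks.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        local = decode_endpoint(fields[1])
        remote = decode_endpoint(fields[2])
        if local[1] == port:
            rows.append((TCP_STATES.get(fields[3], fields[3]), local, remote))
    return rows


for row in sockets_on_port(SAMPLE, 5555):
    print(row)
```

On a live system you would pass the contents of /proc/net/tcp instead of SAMPLE; ss -tan shows the same information. Two ESTABLISHED rows from the same peer with different remote ports would match the situation described here.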
2

You can also try dropwatch to see if the kernel drops the packet for some reason.

  • Thanks. If a packet is dropped by the kernel, can it still be picked up by Wireshark?
    – pooya13
    Commented May 22, 2020 at 18:30
  • Wireshark captures incoming packets before the kernel handles them (and possibly drops them). So yes, Wireshark will see the packets, the kernel may drop them, and then the application won't see them, which is the situation you describe unless I misunderstood your description.
    – dirkt
    Commented May 22, 2020 at 18:42
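
If dropwatch is not available, a rough alternative is to snapshot the kernel's protocol counters before and after reproducing the problem and see which ones moved. The sketch below parses text in the /proc/net/snmp format (pairs of name and value lines per protocol) and reports the counters that increased. The function names are hypothetical, and the BEFORE and AFTER snapshots are shortened, invented examples rather than real output.

```python
def parse_snmp(text):
    """Parse /proc/net/snmp-style text into {protocol: {counter: value}}."""
    counters = {}
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    # Lines come in pairs: counter names first, then values, both prefixed 'Proto:'.
    for names, values in zip(lines[0::2], lines[1::2]):
        proto = names[0].rstrip(":")
        counters[proto] = {n: int(v) for n, v in zip(names[1:], values[1:])}
    return counters


def changed(before, after):
    """Return {(protocol, counter): delta} for every counter that increased."""
    deltas = {}
    for proto, values in after.items():
        for name, value in values.items():
            delta = value - before.get(proto, {}).get(name, 0)
            if delta > 0:
                deltas[(proto, name)] = delta
    return deltas


# Invented, shortened snapshots; a real file has many more counters.
BEFORE = """\
Tcp: InSegs OutSegs InErrs InCsumErrors
Tcp: 100 90 0 0
Udp: InDatagrams InErrors InCsumErrors
Udp: 10 0 0
"""
AFTER = """\
Tcp: InSegs OutSegs InErrs InCsumErrors
Tcp: 120 90 5 5
Udp: InDatagrams InErrors InCsumErrors
Udp: 10 0 0
"""

print(changed(parse_snmp(BEFORE), parse_snmp(AFTER)))
```

On a real system the two snapshots would come from reading /proc/net/snmp before and after the test. A growing InCsumErrors would point toward the checksum theory, while segments arriving with no error counters moving would point elsewhere, for example at socket matching.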
