
There is an embedded Linux system with an externally connected (Ethernet) device which sends lots of UDP data once it's started by a command. At startup, I have one thread which is to continuously receive UDP data for the remainder of the program runtime. As I see it, for reliability in principle, not just by accident, it must be ensured that the UDP reception loop makes its first call to recv() before the external data source is started, or the first packet or so might be lost, depending on scheduler whims. (This is all in a very local, purposefully simple network setup - packet loss is normally not an issue and not handled - speed is king.)

Currently, right before calling that UDP reception code, I start a temporary thread which delays for some time, then enables the data source to send UDP data. This is currently the way to "ensure" that the first UDP packet arrives when the reception thread is "armed", i.e. within a recv() call, with the OS waiting for data.

Even if I, say, set a condition variable right before the first call to recv() to tell the rest of the program "ok, you can enable the data source now, I'm ready" - it could, in theory, happen that there is some scheduling-induced delay between that signal flagging and the actual call to recv() (and/or the internals of recv() actually becoming ready).

Is there a more elegant / proper way to solve this, than using some "empirical delay time"?

Pseudo code for illustration:

// ******** main thread ********
thread delayed( [&]{ sleepMs(500); enableUdpDataSource(); } );
thread udpRecv( [&]{ udpRecvUntilTimeout(); } );
delayed.join();
udpRecv.join();
return 0;

// ******** UDP thread ********
void udpRecvUntilTimeout()
{
  udpInit(); // set up socket, buffer sizes etc

  while (shouldRun)
  {
    // recv() needs to be "armed" *before* the data source is enabled.
    // If I set a condition variable for another thread right here,
    // there may be a scheduling delay between it and the actual
    // entry into recv() - during which the other thread happily enables the data source.
    int received = recv( sockFd, buf, maxlen, 0 );
    timeoutWatchdogReset();
    processReceivedData();
  }
}
  • There is no reliability in UDP. You can't make a silk purse out of a sow's ear.
    – user207421
    Commented Jul 30, 2019 at 12:25
  • I'm aware that that's the case in general. But in a system with one integrated switch and 2..3 participants? Other than the kernel being overly busy for some reason - why/where could packets be dropped?
    – sktpin
    Commented Jul 30, 2019 at 13:25
  • There is nothing you can do to guarantee that the temp thread won't enable the data source before the receiver thread is actually awaiting the first packet. But if the sleep() call works for you, then you might be able to reduce the duration of the sleep by having the receiver thread increment a semaphore just before it calls recv(), and by having the temp thread await the semaphore, then sleep, then enable the transmitter (a rough sketch of this idea appears after these comments). Commented Jul 30, 2019 at 14:34
  • You want reliability in principle, yet cannot handle packet loss, and believe speed is king. There are contradictions here. If you want your program to be reliable, then handle packet loss not as a failure, but as an expected input to your program. Then you can handle any startup ordering. Almost anything is more elegant than “some empirical delay time”, which is also known as a “race condition”.
    – mevets
    Commented Jul 30, 2019 at 14:51
  • "believe speed is king" maybe bad wording: it's a requirement. Also, timing jitter seems pretty low with UDP and certain scheduling settings for the thread. As for contradiction - this is not all one big blob, there are several issues and I'd like to make as many of them as small as possible. But I hear ya. For now it looks like some compromises can't be gotten around. I was told, previously developed systems use UDP in similar vein, still made & sold, & that kind of data connection is expected where it's used. That much I have to take as it is.
    – sktpin
    Commented Jul 30, 2019 at 15:26
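
A minimal sketch of the semaphore idea from the comment above, reusing the placeholder names from the question's pseudo code (udpInit, sockFd, buf, maxlen, shouldRun, enableUdpDataSource, ...) and assuming C++20's std::binary_semaphore is available; the short grace sleep is still a heuristic - it only shrinks the race window, it does not remove it:

#include <chrono>
#include <semaphore>
#include <thread>
#include <sys/socket.h>

std::binary_semaphore recvReady{0};  // released just before the receiver blocks in recv()

void udpRecvUntilTimeout()
{
  udpInit();               // set up socket, bind, buffer sizes etc (question's helper)
  recvReady.release();     // "I'm about to call recv()"
  while (shouldRun)
  {
    int received = recv( sockFd, buf, maxlen, 0 );
    timeoutWatchdogReset();
    processReceivedData();
  }
}

void enableSourceWhenReceiverArmed()   // would replace the fixed 500 ms delay thread
{
  recvReady.acquire();                                          // wait for the receiver to be armed
  std::this_thread::sleep_for( std::chrono::milliseconds(50) ); // small grace period - still heuristic
  enableUdpDataSource();
}

Note that, as the answer below points out, once the socket is bound the kernel already buffers incoming datagrams, so the semaphore mainly shortens how long the source has to wait rather than being strictly required for correctness.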

2 Answers


In an earlier version, I suggested that the call to bind is optional, but of course it is not. You have to call it in order to tell the kernel which UDP port to open. After bind, the kernel will buffer incoming UDP packets and you can call recv if you're not interested in the client network details (otherwise, call recvfrom).

Something along these lines:

#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

char buf[1500];
struct sockaddr_in addr;

int sd = socket(AF_INET, SOCK_DGRAM, 0);
if (sd < 0) { /* handle error */ }

memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = htons((unsigned short) 1234); // UDP port

if (bind(sd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { /* handle error */ }

// start data sending thread

sleep(1); // for testing

recv(sd, buf, sizeof(buf), 0); // reads one datagram, up to the buffer size

But there are no guarantees with UDP; you might still lose packets (e.g. if the sender is overloading the receiver).

  • You have to bind() AND connect() in order to use recv() for UDP. Otherwise, if you just bind() only, you have to use recvfrom() instead. But either way, you have to bind() first no matter what before you can receive anything, since bind() establishes the local port to receive on. You can use setsockopt(SO_RCVBUF) to increase the buffer size, if needed, to avoid packets being discarded before you can read them (a small sketch of that follows these comments). Commented Jul 30, 2019 at 20:27
  • @RemyLebeau The bind is indeed necessary to open the UDP server port (that was not clear and I edited that), but you can still call recv (or read) if you are not interested in client network details. The connect is optional and only for the client; it simply registers the socket and network address with the kernel so that you can call send (or write) instead of sendto.
    – LWimsey
    Commented Jul 31, 2019 at 2:50
  • The advantage for a UDP client calling connect is that if the port is not available, the 'ICMP unreachable' is delivered to the process upon a follow-up system call on the socket (by returning an error). Without connect, the kernel cannot deliver the error because it has no way of knowing that the socket and network address are related.
    – LWimsey
    Commented Jul 31, 2019 at 2:50
  • @LWimsey: UDP doesn't have clients and servers. At the datagram level, there's just a sender and receiver, and data in both directions is treated perfectly symmetrically. You are correct though that recv() can be used on a datagram socket without connection emulation.
    – Ben Voigt
    Commented Jul 31, 2019 at 4:02
  • "After bind, the kernel will buffer incoming UDP packets" -> Ah! That makes sense. It has to do that at some point, it just wasn't clear to me when and how the buffering happens in the background, so in my hazy view of this I wasn't thinking about that. Excellent!
    – sktpin
    Commented Jul 31, 2019 at 9:10
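
A minimal sketch of the SO_RCVBUF suggestion from the first comment above; growReceiveBuffer and the 4 MB figure are made-up names/values for illustration, and the kernel may clamp the requested size (on Linux, see net.core.rmem_max):

#include <cstdio>
#include <sys/socket.h>

// Call after socket()/bind(); 'sd' is the datagram socket descriptor.
void growReceiveBuffer( int sd, int bytes )
{
  // Ask the kernel for a larger socket receive buffer, so bursts are less
  // likely to overflow it before the reader catches up.
  if ( setsockopt( sd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes) ) < 0 )
    std::perror( "setsockopt(SO_RCVBUF)" );
}

// e.g. growReceiveBuffer( sd, 4 * 1024 * 1024 );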

As you're using Linux, it might be possible to use FTRACE to determine whereabouts your receiving thread has got to. The function tracing in this allows one (normally in post-mortem debugging / analysis) to see the function calls made by a process. I'm pretty sure that this is exposed through some /sys or /proc file system, so it ought to be possible to monitor it live instead.

So if you had your temporary thread looking at the system calls of the receiving thread, it would be able to spot it entering the call to recv().

If FTRACE is not already built into your kernel, you'll need to recompile your kernel to include it. Could be handy - FTRACE + kernelshark is a nice way of debugging your application anyway.
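
A rough sketch of that idea, with the caveat that the paths and event names below are assumptions: it presumes an ftrace-enabled kernel with tracefs mounted at /sys/kernel/tracing, syscall tracepoints compiled in, and that recv() shows up as the recvfrom syscall (as it typically does on x86-64); in practice you would also want to filter for the receiver's PID rather than matching any process:

#include <fstream>
#include <iostream>
#include <string>

int main()
{
  const std::string tracefs = "/sys/kernel/tracing/";   // or /sys/kernel/debug/tracing on older setups

  // Enable tracing of recvfrom syscall entries.
  std::ofstream( tracefs + "events/syscalls/sys_enter_recvfrom/enable" ) << "1\n";

  // trace_pipe blocks until trace data arrives, so this loop effectively
  // waits for some process to enter the syscall.
  std::ifstream pipe( tracefs + "trace_pipe" );
  for ( std::string line; std::getline( pipe, line ); )
  {
    if ( line.find( "sys_recvfrom" ) != std::string::npos )
    {
      std::cout << "saw recv() entry: " << line << '\n';
      break;   // at this point it should be safe to enable the data source
    }
  }
}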
