
I am developing a tunnel application that will provide a low-latency, variable-bandwidth link. It will operate in a system that requires traffic prioritization. However, while traffic towards the tun device is clearly being queued by the kernel, whatever qdisc I apply to the device appears to have no additional effect, including the default pfifo_fast; i.e. traffic that should be high priority is not being handled separately from normal traffic.

I have made a small test application to demonstrate the problem. It creates two tun devices and has two threads, each running a loop that passes packets from one interface to the other (one thread per direction). Between receiving and sending, the loop delays 1 µs for every byte, roughly emulating an 8 Mbps bidirectional link:

void forward_traffic(int src_fd, int dest_fd) {
    char buf[BUFSIZE];
    ssize_t nbytes = 0;

    while (nbytes >= 0) {
        /* Read one packet from the source tun device */
        nbytes = read(src_fd, buf, sizeof(buf));

        if (nbytes >= 0) {
            /* Delay 1 us per byte, roughly emulating an 8 Mbps link */
            usleep(nbytes);
            nbytes = write(dest_fd, buf, nbytes);
        }
    }
    perror("Read/write TUN device");
    exit(EXIT_FAILURE);
}
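
The complete source is linked below; in outline, the rest of the program is wired up roughly as in the sketch that follows. This is a simplified illustration rather than the linked source: tun_alloc, the device names and the thread plumbing are assumptions, and the interfaces still have to be addressed, brought up and moved into namespaces separately.

/* A minimal sketch (not the linked source) of how the two tun devices and
 * forwarding threads might be created.  BUFSIZE and forward_traffic() are
 * assumed to be defined as shown above; all other names are illustrative. */
#include <fcntl.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

static int tun_alloc(const char *name)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);

    if (fd < 0) {
        perror("open /dev/net/tun");
        exit(EXIT_FAILURE);
    }

    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TUN | IFF_NO_PI;    /* raw IP packets, no packet-info header */
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        perror("TUNSETIFF");
        exit(EXIT_FAILURE);
    }
    return fd;
}

struct fd_pair { int src_fd; int dest_fd; };

static void *forward_thread(void *arg)
{
    struct fd_pair *p = arg;
    forward_traffic(p->src_fd, p->dest_fd);    /* the loop shown above */
    return NULL;
}

int main(void)
{
    int fd0 = tun_alloc("tun0");
    int fd1 = tun_alloc("tun1");
    struct fd_pair a = { fd0, fd1 };
    struct fd_pair b = { fd1, fd0 };
    pthread_t t1, t2;

    /* One thread per direction: tun0 -> tun1 and tun1 -> tun0 */
    pthread_create(&t1, NULL, forward_thread, &a);
    pthread_create(&t2, NULL, forward_thread, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}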

With each tun interface placed in its own namespace, I can run iperf3 and get about 8 Mbps of throughput. The default txqlen reported by ip link is 500 packets, and when I run iperf3 (-P 20) and a ping at the same time I see RTTs of about 670-770 ms, roughly corresponding to 500 x 1500 bytes of queue (500 packets x 1500 bytes ≈ 6 Mbit, which drains in about 750 ms at 8 Mbps). Indeed, changing txqlen changes the latency proportionally. So far so good.

With the default pfifo_fast qdisc, I would expect a ping with the right ToS mark to skip that normal queue and give me low latency; e.g. ping -Q 0x10 should, I think, have a much lower RTT, but it doesn't (I have tried other ToS/DSCP values as well; they all show the same ~700 ms RTT). I have also tried various other qdiscs with the same results, e.g. fq_codel has no significant effect on latency. Whatever the qdisc, tc -s qdisc always shows a backlog of 0, whether or not the link is congested (but ip -s link does show dropped packets under congestion).

Am I fundamentally misunderstanding something here, or is there something else I need to do to make the qdisc effective?

Complete source here

  • "traffic towards the tun device is clearly being queued by the kernel," - is this inbound or outbound traffic? Commented Dec 2, 2023 at 21:17
  • @ChrisDavies this is outbound traffic; egressing via the tun interface Commented Dec 2, 2023 at 21:44

1 Answer


So after some reading and rummaging in the kernel source, it seems the qdisc is ineffective because the tun driver never tells the network stack that it is busy. It holds packets in its own local queues (whose size is set by txqlen) and, when they are full, simply drops the excess packets.

Here's the relevant bit of the transmit function in drivers/net/tun.c that is called by the stack when it wants to send a packet:

/* Net device start xmit */
static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
{
    struct tun_struct *tun = netdev_priv(dev);
    int txq = skb->queue_mapping;
    struct tun_file *tfile;
    int len = skb->len;

    rcu_read_lock();
    tfile = rcu_dereference(tun->tfiles[txq]);

    /* ...... various unrelated things omitted ...... */

    if (ptr_ring_produce(&tfile->tx_ring, skb))
        goto drop;

    /* Notify and wake up reader process */
    if (tfile->flags & TUN_FASYNC)
        kill_fasync(&tfile->fasync, SIGIO, POLL_IN);
    tfile->socket.sk->sk_data_ready(tfile->socket.sk);

    rcu_read_unlock();
    return NETDEV_TX_OK;

drop:
    this_cpu_inc(tun->pcpu_stats->tx_dropped);
    skb_tx_error(skb);
    kfree_skb(skb);
    rcu_read_unlock();
    return NET_XMIT_DROP;
}

A typical network interface driver calls the netif_stop_queue() and netif_wake_queue() functions to stop and restart the flow of packets from the network stack. While the flow is stopped, packets are queued in the attached queue discipline, giving the user more flexibility in how that traffic is managed and prioritised.
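
In outline, the usual pattern in an ordinary driver looks something like the sketch below. This is illustrative only: the example_* names are hypothetical placeholders, not real kernel or tun code; only netif_stop_queue(), netif_wake_queue(), netif_queue_stopped() and netdev_priv() are the actual APIs.

/* Hypothetical sketch of the usual flow-control pattern in a driver's
 * ndo_start_xmit and TX-completion paths; everything named example_* is a
 * placeholder. */
static netdev_tx_t example_net_xmit(struct sk_buff *skb, struct net_device *dev)
{
    struct example_priv *priv = netdev_priv(dev);

    example_enqueue_for_tx(priv, skb);      /* hand the packet to the hardware/backend */

    if (example_tx_ring_full(priv))
        netif_stop_queue(dev);              /* stack now leaves further packets in the qdisc */

    return NETDEV_TX_OK;
}

/* Called from the TX-completion path once descriptors have been reclaimed */
static void example_tx_done(struct net_device *dev)
{
    struct example_priv *priv = netdev_priv(dev);

    example_reclaim_tx(priv);

    if (netif_queue_stopped(dev) && example_tx_ring_has_room(priv))
        netif_wake_queue(dev);              /* resume pulling packets out of the qdisc */
}

While the queue is stopped, packets back up in the qdisc rather than in the driver, which is exactly where prioritisation such as pfifo_fast's bands gets applied.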

For whatever reason, the tap/tun driver does not do this - presumably because most tunnels simply encapsulate packets and send them to real network interfaces without any additional flow control.

To verify my finding, I tried a simple test: stopping the queue in the function above when the ring fills up:

    if (ptr_ring_produce(&tfile->tx_ring, skb)) {
        netif_stop_queue(dev);
        goto drop;
    } else if (ptr_ring_full(&tfile->tx_ring)) {
        netif_stop_queue(dev);
        tun_debug(KERN_NOTICE, tun, "tun_net_xmit stop %lx\n", (size_t)skb);
    }

and similar additions to tun_ring_recv to stop/wake the queue depending on whether it was empty after dequeuing a packet:

    empty = __ptr_ring_empty(&tfile->tx_ring);
    if (empty)
        netif_wake_queue(tun->dev);
    else
        netif_stop_queue(tun->dev);

This is not a great scheme and wouldn't work with a multiqueue tunnel, but it works well enough that I could see the qdisc reporting a backlog, and a clear difference in ping times and loss rates between ToS levels under pfifo_fast when the link was at capacity.

