
I replaced my consumer wireless router with a Linux box that has a quad-port gigabit PCIe NIC plus a single gigabit NIC on the motherboard (for the WAN). After turning on IP forwarding and masquerading (via iptables) and setting up a subnet on each of the four LAN interfaces, I ran some speed tests.

$ ip route
default dev ppp0 scope link 
10.0.0.0/16 dev enp3s0f0 proto kernel scope link src 10.0.0.1 
10.64.0.0/16 dev enp3s0f1 proto kernel scope link src 10.64.0.1 
10.192.0.0/16 dev enp4s0f1 proto kernel scope link src 10.192.0.1 
aaa.bbb.ccc.ddd dev ppp0 proto kernel scope link src www.xxx.yyy.zzz 
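
The iperf3 tests below run between 10.192.128.3 and 10.0.0.2. Those addresses fall in different /16 prefixes, so this box forwards every test packet from one port of the quad NIC to another. The following minimal Python sketch shows the longest-prefix-match lookup behind that. It uses only the LAN routes listed above and simplifies the real lookup: it ignores route metrics, policy routing and the ppp0 host route, and treats anything unmatched as the default route.

```python
import ipaddress

# LAN routes copied from the `ip route` output above (ppp0 routes omitted).
LAN_ROUTES = {
    "10.0.0.0/16": "enp3s0f0",
    "10.64.0.0/16": "enp3s0f1",
    "10.192.0.0/16": "enp4s0f1",
}


def egress_interface(addr: str, routes: dict = LAN_ROUTES, default: str = "ppp0") -> str:
    """Return the device of the longest matching prefix, else the default route's device."""
    ip = ipaddress.ip_address(addr)
    best_len, best_dev = -1, default
    for prefix, dev in routes.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and net.prefixlen > best_len:
            best_len, best_dev = net.prefixlen, dev
    return best_dev


if __name__ == "__main__":
    # The iperf3 client and server used in the tests below.
    print(egress_interface("10.192.128.3"))  # enp4s0f1
    print(egress_interface("10.0.0.2"))      # enp3s0f0
```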

  • From a wireless device on one of the LAN subnets to a speedtest server on the WAN, I get the full 40 Mbps / 5 Mbps I pay my ISP for.

  • From the router host to a wired LAN host using iperf3, I can consistently maintain 930+ Mbps for several minutes.

  • From a wired device on one of the LAN subnets to a wired device on a different LAN subnet using iperf3, I initially get 80-95 Mbps for the first few seconds, but it rapidly drops to zero.

  • From a wired device on one of the LAN subnets to a wired device on a different LAN subnet using iperf3 with a target bitrate of 20 Mbps, I see similar results (see update at end), but I can sustain about 10 Mbps.


Connecting to host 10.0.0.2, port 5201
[  5] local 10.192.128.3 port 35620 connected to 10.0.0.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  10.2 MBytes  85.9 Mbits/sec    0   73.5 KBytes       
[  5]   1.00-2.00   sec  9.01 MBytes  75.6 Mbits/sec    0   82.0 KBytes       
[  5]   2.00-3.00   sec  8.26 MBytes  69.3 Mbits/sec    0   79.2 KBytes       
[  5]   3.00-4.00   sec  9.01 MBytes  75.6 Mbits/sec    0   73.5 KBytes       
[  5]   4.00-5.00   sec  5.28 MBytes  44.3 Mbits/sec    1   1.41 KBytes       
[  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes       
[  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes       
[  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes       
[  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes       
^C[  5]  10.00-13.63  sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-13.63  sec  41.8 MBytes  25.7 Mbits/sec    5             sender
[  5]   0.00-13.63  sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated
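
The interval lines can be parsed mechanically to show exactly where a run collapses. The following minimal Python sketch assumes the iperf3 client output format shown above (ID, start-end, transfer, bitrate, Retr, Cwnd). The function names are illustrative and are not part of iperf3. The regex requires the Retr and Cwnd columns, so the sender and receiver summary lines are ignored.

```python
import re

# One iperf3 client interval line, e.g.
# "[  5]   4.00-5.00   sec  5.28 MBytes  44.3 Mbits/sec    1   1.41 KBytes"
INTERVAL = re.compile(
    r"\[\s*\d+\]\s+([\d.]+)-([\d.]+)\s+sec\s+\S+\s+\S+\s+"   # id, start-end, transfer
    r"([\d.]+)\s+([KMG]?)bits/sec\s+(\d+)\s+[\d.]+\s+[KMG]?Bytes"  # bitrate, retr, cwnd
)
TO_MBPS = {"": 1e-6, "K": 1e-3, "M": 1.0, "G": 1e3}


def intervals(report: str):
    """Return (start_s, end_s, mbps, retransmits) for each interval line."""
    rows = []
    for line in report.splitlines():
        m = INTERVAL.search(line)
        if m:
            start, end, rate, unit, retr = m.groups()
            rows.append((float(start), float(end), float(rate) * TO_MBPS[unit], int(retr)))
    return rows


def first_stall(report: str):
    """Start time of the first zero-throughput interval, or None if there is none."""
    for start, _end, mbps, _retr in intervals(report):
        if mbps == 0.0:
            return start
    return None
```

On the run above this points at the 5.00-6.00 interval, immediately after the first retransmit. iperf3's `-J` flag emits JSON, which would be a sturdier input than scraping text.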

This suggests to me that there's a problem forwarding packets between the subnets. First, I made sure my iptables rules are as minimal as possible:

-t nat -A POSTROUTING -o ppp0 -j MASQUERADE
# WAN connection is PPPoE and VLAN tagged
-t filter -A FORWARD -o ppp0 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
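
Neither rule should touch LAN-to-LAN traffic. MASQUERADE applies only to packets leaving via ppp0. The TCPMSS rule applies only to forwarded TCP packets leaving via ppp0 that have SYN set and RST clear: `--tcp-flags SYN,RST SYN` examines SYN and RST and requires that, of those two, exactly SYN is set. The following small Python model of that match logic assumes the standard TCP header bit values. It illustrates the rule's semantics and is not how netfilter is implemented.

```python
# TCP header flag bits.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def tcp_flags_match(flags: int, mask: int, comp: int) -> bool:
    """Model of `--tcp-flags MASK COMP`: among the flags in MASK, exactly those in COMP are set."""
    return (flags & mask) == comp


def mss_clamp_applies(out_iface: str, flags: int) -> bool:
    """Model of: -A FORWARD -o ppp0 -p tcp --tcp-flags SYN,RST SYN (TCP assumed)."""
    return out_iface == "ppp0" and tcp_flags_match(flags, SYN | RST, SYN)


if __name__ == "__main__":
    print(mss_clamp_applies("ppp0", SYN))        # True: outgoing SYN
    print(mss_clamp_applies("ppp0", SYN | ACK))  # True: SYN-ACK, ACK is not examined
    print(mss_clamp_applies("enp3s0f0", SYN))    # False: LAN-to-LAN traffic
    print(mss_clamp_applies("ppp0", ACK))        # False: mid-stream segment
```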

Dumping the iptables counters, I see low packet counts for both rules.

Next, I checked for packet loss. There does seem to be a small but consistent amount of packet loss and retransmission.

$ sudo netstat -s | egrep -i 'retransmit|drop'
    498 outgoing packets dropped
    25848 fast retransmits
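
These counters are cumulative since boot, so on their own they do not show whether the drops coincide with the stall. One way to check is to capture the same filtered `netstat -s` output before and after an iperf3 run and diff the two. The following minimal Python sketch assumes the "<count> <description>" line format shown above. It keys on the description text, so it suits short filtered output like this, and it skips lines in any other format.

```python
import re

# Matches lines such as "    498 outgoing packets dropped".
COUNTER = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def parse_counters(netstat_output: str) -> dict:
    """Map description -> count for '<count> <description>' lines; other lines are skipped."""
    counters = {}
    for line in netstat_output.splitlines():
        m = COUNTER.match(line)
        if m:
            counters[m.group(2)] = int(m.group(1))
    return counters


def counter_delta(before: str, after: str) -> dict:
    """Counters that changed between two snapshots, with the amount of change."""
    old, new = parse_counters(before), parse_counters(after)
    return {name: new[name] - old.get(name, 0)
            for name in new if new[name] != old.get(name, 0)}
```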

I then thought that maybe there was a buffer or queue that was filling and packets were getting dropped. I calculated the average bandwidth-delay product and compared that against the reserved memory.

$ sudo ping -f 10.0.0.2 -s $((1500-28))               
PING 10.0.0.2 (10.0.0.2) 1472(1500) bytes of data.
.^C
--- 10.0.0.2 ping statistics ---
9036 packets transmitted, 9035 received, 0% packet loss, time 26512ms
rtt min/avg/max/mdev = 1.742/2.817/12.057/0.758 ms, pipe 2, ipg/ewma 2.934/3.091 ms

$ echo "1*(1024^3) * 0.003" | bc 
3221225.472

$ cat /proc/sys/net/ipv4/tcp_mem
18396   24529   36792

$ getconf PAGESIZE
4096
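
The bc result is in bits, taking 1 Gbit as 1024^3 bits, so it corresponds to roughly 400 KB. tcp_mem, by contrast, is counted in pages, so its first value corresponds to 18396 * 4096 = 75,350,016 bytes. The following small Python sketch makes the same comparison explicit. It assumes a 10^9 bit/s gigabit link and the ~3 ms RTT measured with the flood ping.

```python
def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bits in flight on the path, converted to bytes."""
    return link_bps * rtt_s / 8


def tcp_mem_bytes(pages: int, page_size: int = 4096) -> int:
    """net.ipv4.tcp_mem thresholds are counted in pages, not bytes."""
    return pages * page_size


if __name__ == "__main__":
    bdp = bdp_bytes(1e9, 0.003)   # gigabit link, ~3 ms RTT from the flood ping
    low = tcp_mem_bytes(18396)    # first (lowest) tcp_mem threshold; getconf PAGESIZE = 4096
    print(f"BDP ~ {bdp / 1e3:.0f} kB; tcp_mem low threshold ~ {low / 1e6:.1f} MB")
```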

That appears to be sufficient. So now I'm a bit stuck. I ran tcpdump on the iperf3 client and can see things moving along well for a while. Then there is a long (almost 250 ms) period of silence, followed by lots of retransmits and duplicate acknowledgements.
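
Silences like that are easy to miss when scrolling through a capture. The following minimal Python sketch flags gaps between consecutive packets. It assumes tcpdump's default `HH:MM:SS.ffffff` timestamp at the start of each line, as produced when reading a saved capture back with `tcpdump -r`. The 0.2 s threshold is an arbitrary choice meant to catch the ~250 ms pause, and the sketch does not handle captures that cross midnight.

```python
from datetime import datetime


def find_gaps(lines, threshold_s: float = 0.2):
    """Yield (previous, next, gap_seconds) wherever consecutive packets are > threshold_s apart."""
    prev = None
    for line in lines:
        stamp = line.split(" ", 1)[0]
        try:
            ts = datetime.strptime(stamp, "%H:%M:%S.%f")
        except ValueError:
            continue  # not a packet line (blank line, header, etc.)
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if gap > threshold_s:
                yield prev.time(), ts.time(), gap
        prev = ts
```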

Since I can pull full download speeds from the WAN, I don't suspect the onboard NIC. I'm looking for help diagnosing the quad NIC (details below), possibly the dumb layer-2 gigabit switch (Netgear GS-108), and any other kernel configuration that could be getting in the way. I doubt it's the switch: it has never been a problem before, and I can maintain full speed from the router itself to that subnet. Only inter-subnet performance appears to be affected.

  *-network:0               
       description: Ethernet interface
       product: 82571EB Gigabit Ethernet Controller (Copper)
       vendor: Intel Corporation
       physical id: 0
       bus info: pci@0000:03:00.0
       logical name: enp3s0f0
       version: 06
       serial: 00:26:55:xx:xx:xx
       size: 1Gbit/s
       capacity: 1Gbit/s
       width: 32 bits
       clock: 33MHz
       capabilities: pm msi pciexpress bus_master cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=e1000e driverversion=3.2.6-k duplex=full firmware=5.12-2 ip=10.0.0.1 latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s
       resources: irq:24 memory:fe920000-fe93ffff memory:fe880000-fe8fffff ioport:d020(size=32)

UPDATE:

$ iperf3 -b 20m -c 10.0.0.2
Connecting to host 10.0.0.2, port 5201
[  5] local 10.192.128.3 port 36554 connected to 10.0.0.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.49 MBytes  20.9 Mbits/sec    0    158 KBytes       
[  5]   1.00-2.00   sec  2.38 MBytes  19.9 Mbits/sec    0    150 KBytes       
[  5]   2.00-3.00   sec  2.38 MBytes  19.9 Mbits/sec    1    133 KBytes       
[  5]   3.00-4.00   sec  2.38 MBytes  19.9 Mbits/sec    0   73.5 KBytes       
[  5]   4.00-5.00   sec  2.38 MBytes  19.9 Mbits/sec    0   70.7 KBytes       
[  5]   5.00-6.00   sec  1.12 MBytes  9.44 Mbits/sec    2   1.41 KBytes       
[  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    2   1.41 KBytes       
[  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes       
[  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes       
iperf3: error - control socket has closed unexpectedly

$ iperf3 -b 10m -c 10.0.0.2 
Connecting to host 10.0.0.2, port 5201
[  5] local 10.192.128.3 port 36564 connected to 10.0.0.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.24 MBytes  10.4 Mbits/sec    0    201 KBytes       
[  5]   1.00-2.00   sec  1.25 MBytes  10.5 Mbits/sec    0    118 KBytes       
[  5]   2.00-3.00   sec  1.12 MBytes  9.44 Mbits/sec    0    127 KBytes       
[  5]   3.00-4.00   sec  1.25 MBytes  10.5 Mbits/sec    0    107 KBytes       
[  5]   4.00-5.00   sec  1.12 MBytes  9.44 Mbits/sec    0    110 KBytes       
[  5]   5.00-6.00   sec  1.25 MBytes  10.5 Mbits/sec    0   90.0 KBytes       
[  5]   6.00-7.00   sec  1.12 MBytes  9.44 Mbits/sec    0   87.2 KBytes       
[  5]   7.00-8.00   sec  1.25 MBytes  10.5 Mbits/sec    0   81.6 KBytes       
[  5]   8.00-9.00   sec  1.12 MBytes  9.44 Mbits/sec    0   78.8 KBytes       
[  5]   9.00-10.00  sec  1.25 MBytes  10.5 Mbits/sec    0    112 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  12.0 MBytes  10.1 Mbits/sec    0             sender
[  5]   0.00-10.04  sec  12.0 MBytes  10.0 Mbits/sec                  receiver

iperf Done.
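
Because 10 Mbps held and 20 Mbps stalled, the breaking point can be narrowed down by bisection instead of by hand. The following Python sketch relies on `stalls(rate_mbps)`, a hypothetical callback you would implement yourself, for example by running `iperf3 -b <rate>m -c 10.0.0.2` and checking the output for zero-throughput intervals.

```python
from typing import Callable


def max_sustainable_rate(stalls: Callable[[float], bool], low: float, high: float,
                         tol: float = 0.5) -> float:
    """Highest rate (Mbit/s) found not to stall, bisecting between a working `low` and a failing `high`."""
    if stalls(low) or not stalls(high):
        raise ValueError("need a rate that works (low) and one that stalls (high)")
    while high - low > tol:
        mid = (low + high) / 2
        if stalls(mid):
            high = mid
        else:
            low = mid
    return low
```

Each probe is a full test run, so `tol` bounds the cost. With a 10 to 20 Mbit/s bracket and a 0.5 Mbit/s tolerance, five bisection probes suffice, plus the two endpoint checks.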
  • What do you mean by "setting up subnets on each of the four LAN interfaces"? Did you create four LANs or one? And if one, then with four interfaces in the same LAN, you should only be setting up one IP address and one subnet. Commented Feb 25, 2018 at 6:45
  • @DavidSchwartz Four subnets, with the interfaces configured as 10.0.0.1, 10.64.0.1, 10.128.0.1, and 10.192.0.1.
    – Huckle
    Commented Feb 25, 2018 at 6:46
  • Please post the output of ifconfig -a or ip addr. I'm looking for the MAC addresses on the quad NIC.
    – Pedro
    Commented Feb 25, 2018 at 6:46
  • @Pedro Retested with two wired devices, each directly attached to a different subnet (this time without the layer-2 switch on the one subnet that had it). Essentially identical results.
    – Huckle
    Commented Feb 25, 2018 at 6:51
  • @Pedro 20 Mbps and 10 Mbps tests posted. 20 is a no-go. Seems to sustain 10. Breaks down around 12.
    – Huckle
    Commented Feb 25, 2018 at 7:01

1 Answer


Thanks to @Pedro for helping me dig in. Originally I thought this was a bad piece of hardware, but after replacing it with another card I'm certain it's a driver problem. I'm still investigating whether this bug has already been reported (and whether a fix exists). In the meantime, I found a Server Fault question that linked to a bug report suggesting turning off several offloading features. That at least got me from 0 bps to a stable ~270 Mbps. That is far short of the ~940 Mbps the hardware is capable of, but better than nothing while I continue researching.

ethtool -K eth0 gso off gro off tso off
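
You can confirm the offload settings afterwards with `ethtool -k` (lowercase k). It prints one `feature: on|off` line per feature, sometimes followed by `[fixed]`. The following minimal Python sketch parses that output and reports which of the three features the answer disables are still on. The names gso, gro and tso correspond to ethtool's long names generic-segmentation-offload, generic-receive-offload and tcp-segmentation-offload. On this router the command would presumably target the quad-NIC ports (e.g. enp3s0f0) rather than eth0.

```python
# ethtool's long names for the features the answer disables (gso, gro, tso).
DISABLED_BY_ANSWER = (
    "generic-segmentation-offload",
    "generic-receive-offload",
    "tcp-segmentation-offload",
)


def offload_states(ethtool_k_output: str) -> dict:
    """Map feature name -> True if 'on', False otherwise, from `ethtool -k <iface>` output."""
    states = {}
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.strip().partition(": ")
        if sep and value:
            states[name] = value.split()[0] == "on"  # ignore suffixes like "[fixed]"
    return states


def still_enabled(ethtool_k_output: str) -> list:
    """Which of the answer's three features are still reported as on."""
    states = offload_states(ethtool_k_output)
    return [f for f in DISABLED_BY_ANSWER if states.get(f, False)]
```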
