
I have an Intel 10Gbps card based on the 82599 10GbE controller. The card has two ports. The controller's datasheet says it supports PCIe 2.0 (2.5 GT/s or 5.0 GT/s).

Now, the PCI-SIG FAQ page (link: https://www.pcisig.com/news_room/faqs/pcie3.0_faq/#EQ3) says that at a 5.0 GT/s signaling rate, PCIe gives an interconnect bandwidth of 4 Gbps per lane, i.e. 500 MB/s per lane per direction.

I ran a netperf test on the card (I connected two of these cards back-to-back over optical fibre, with no switches in between) and measured a bandwidth of around 3.3 Gbps (roughly 400 MB/s).
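
For reference, the test was a plain TCP stream test, roughly along these lines (the peer address is illustrative; it is whatever the interface on the other board is configured as):

# 60-second TCP_STREAM test against the card on the other board
netperf -H 192.168.1.2 -t TCP_STREAM -l 60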

Is my card under-utilized, or do those numbers add up? Why wouldn't I get the full 10 Gbps out of the card instead of only 3.3 Gbps?

(The card is x4, in an x8 slot.)

Update: The network card goes into a slot that is configured as PCIe 3.0 and is x8 (it supports up to 8.0 GT/s). As for the board itself, it's a Freescale board (processor: T4240). So I figured the board should be fine, with the card being the slower of the two.

Thanks in advance.

  • What motherboard are you using, and in which slots exactly are the cards (including the network card) located?
    – Daniel B
    Commented Sep 8, 2014 at 15:17
  • You have this 10Gbps card connected to something. What are the specifications of that hardware?
    – Ramhound
    Commented Sep 8, 2014 at 15:29

2 Answers


There are many reasons why you may not be seeing 10Gbps across the link. I can offer the following:

  • PCIe 2.0 offers an effective bandwidth of about 4Gbps per lane (5.0 GT/s minus the 8b/10b encoding overhead). A PCIe 2.0 x4 card in a PCIe 2.0-or-better x8 slot will negotiate a x4 link, providing roughly 16Gbps of effective bandwidth. That is enough to saturate a single 10GbE port (though not both ports at once), assuming the rest of your hardware can handle it. You can confirm what the slot actually negotiated with lspci, as shown right after this list.
  • Many general-purpose desktop and server operating systems are not configured by default to handle high-bandwidth networking.
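
To confirm the negotiated link, lspci will show it (a quick check on Linux, assuming the pciutils package is installed; the 03:00.0 bus address is a placeholder for your NIC):

# LnkCap is what the device is capable of, LnkSta is what was actually negotiated
lspci -vv -s 03:00.0 | grep -E 'LnkCap:|LnkSta:'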

To get full performance out of that card, you'll want to:

  • Disable anything that will restrict performance of networking or CPU speed/interrupt processing:

Linux Example:

service irqbalance stop
service cpuspeed stop
chkconfig irqbalance off
chkconfig cpuspeed off
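
On newer, systemd-based distributions the equivalents would be along these lines (the cpuspeed service doesn't exist there; CPU frequency scaling is instead controlled through the governor, e.g. with the cpupower tool if it is installed):

# keep interrupts pinned by stopping the IRQ balancing daemon
systemctl stop irqbalance
systemctl disable irqbalance
# force the performance governor so the CPU doesn't clock down mid-transfer
cpupower frequency-set -g performance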
  • Enable 9K jumbo frames with a high transmit queue length:

Linux Example:

ifconfig eth2 mtu 9000 txqueuelen 1000 up
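
ifconfig is deprecated on many current distributions; the iproute2 equivalent would be the following (eth2 being the interface name from the example above). Note that both ends of the link must use the same MTU for jumbo frames to actually be used.

# 9000-byte MTU, 1000-packet transmit queue, then bring the link up
ip link set dev eth2 mtu 9000 txqueuelen 1000 up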
  • Increase the network buffers so that they can keep the card saturated with data:

Linux Example:

# -- 10gbe tuning from Intel ixgb driver README -- #

# turn off selective ACK and timestamps
net.ipv4.tcp_sack = 0
net.ipv4.tcp_timestamps = 0

# memory allocation min/pressure/max.
# read buffer, write buffer, and buffer space
net.ipv4.tcp_rmem = 10000000 10000000 10000000
net.ipv4.tcp_wmem = 10000000 10000000 10000000
net.ipv4.tcp_mem = 10000000 10000000 10000000

net.core.rmem_max = 524287
net.core.wmem_max = 524287
net.core.rmem_default = 524287
net.core.wmem_default = 524287
net.core.optmem_max = 524287
net.core.netdev_max_backlog = 300000
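
To apply these sysctl settings, either set them at runtime with sysctl -w or put them in a file under /etc/sysctl.d/ and load it (the file name here is just an example):

# apply a single setting immediately
sysctl -w net.core.netdev_max_backlog=300000
# or load a whole file of settings at once
sysctl -p /etc/sysctl.d/10gbe-tuning.conf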

There is further tuning you can do to the PCI link, such as bumping the maximum block size to 4K. Properly tuned, you should be able to push about 9.90Gbps across each link.
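
To see what the link is currently using for payload and read-request sizes (which is what that block-size tuning affects), lspci shows it under the PCIe capability; again, 03:00.0 is a placeholder for your NIC's bus address. Actually changing these values is usually done in the platform firmware or with setpci, and is limited by what the root complex supports.

# DevCap shows the supported maximum, DevCtl the value currently in effect
lspci -vv -s 03:00.0 | grep -E 'MaxPayload|MaxReadReq'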

Keep in mind that server and client, and every hop along the way (switch/router) must be similarly tuned in order to not bottleneck the data flow.

  • Well, I previously ran across this document kernel.org/doc/ols/2009/ols2009-pages-169-184.pdf, which lists all these optimizations, and I had already performed them on my system. Initially I maxed out at 2.5Gbps, and I pushed it to 3.3Gbps by playing around with exactly those values. I guess I will have to look at those numbers again. Commented Sep 8, 2014 at 16:05
  • @Vigneshwaren I assume you mean that you performed these optimizations on both of your systems? Or did you have two cards in the same system? Commented Sep 8, 2014 at 16:09
  • Nothing you mention above applies specifically to 10Gb ethernet. You didn't mention the most important reason for not getting full throughput, and that's the 8b/10b encoding (en.wikipedia.org/wiki/8b/10b_encoding) that knocks 20% of your BW off the top.
    – Astara
    Commented Sep 8, 2014 at 16:11
  • BTW, how do you do the 4K PCI block size tune? Never had luck with that one (I use 9K on the wire, but never had luck changing the PCI block size)...
    – Astara
    Commented Sep 8, 2014 at 16:14
  • Oh no, to be very clear: both systems (I had two identical boards with the same CPU) had one of the 10Gig cards connected to each board (I had two of the Intel 10Gig cards as well). The boards were then connected to each other via an optical fibre cable with absolutely no other device in between (switches, routers, modems, the Internet, etc.). And I would like to know how you tuned the PCI block size as well. Kernel config maybe? Commented Sep 8, 2014 at 16:15

Same thing here... it turns out this is because the 10Gbps standard revived the old modem-style encoding: roughly a start/stop bit wrapped around 8 bits of data (8b/10b).

Today's rate:

R (read):
512+0 records in
512+0 records out
4294967296 bytes (4.0GB) copied, 6.37415s, 642.6MB/s

W (write):
512+0 records in
512+0 records out
4294967296 bytes (4.0GB) copied, 6.78951s, 603.3MB/s

(This was run from a Win7 client talking to null files on the Linux end: /dev/zero for reads and /dev/null for writes.)
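
(For reference, numbers in that shape come from dd runs roughly like the following, executed on the client against the mounted share; the Z: mount path is illustrative, and 512 blocks of 8 MiB gives the 4 GiB totals shown above.)

# read test: pull 4 GiB from a share-backed file into the bit bucket
dd if=/cygdrive/z/zerofile of=/dev/null bs=8M count=512
# write test: push 4 GiB of zeroes into a share-backed null file
dd if=/dev/zero of=/cygdrive/z/nullfile bs=8M count=512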

For 'smb/cifs' and a single client, bonding 2 cards together doesn't help throughput (since smb/cifs is a 1 connection/client protocol). :-(

P.S. This was not, BTW, true on 1Gb, and I don't think it is true on 40Gb... Lame! It feels like the "disk space MB != 1024**2 bytes" issue when it first came out: a way of making it sound better than it actually is...

  • If you're seeing <5Gbps on a 10Gbps link, that's because either the server or the client is not properly tuned for 10Gbps, or because of Smb/Cifs protocol overhead. Raw network performance of a 10Gbps link should be very close to 10Gbps if you've set it up right. Commented Sep 8, 2014 at 15:53
  • Re: protocol overhead... never said otherwise. On 1gbit, smb/cifs allows up to 125MdB writes (that's 125 million, not 1024^3) and 119MdB reads. Note that the above are in MB/GB speeds using 1024 as a base.
    – Astara
    Commented Sep 8, 2014 at 15:59
  • I'm confused by your "P.S." statement, as regardless of how the data is encoded on the wire, 10Gbps is 10Gbps. The link layer (1Gbps, 10Gbps, 40Gbps) operates at the given speed, and any overhead due to encoding on the Ethernet or IP level (for headers and such) will apply to all of them. Commented Sep 8, 2014 at 16:12
  • From the link above on 8b/10b, it lists the technologies this applies to. One that is not 8b/10b encoded is twisted-pair 1000Base-T Gigabit Ethernet, which seems to be the most common. They also mention the PCI-E bus for speeds below 8 GT/s -- so maybe when you get to 40 & 100 you are on a faster bus? That last part is a guess, but I remember the same article that told me about 10G using 8b/10b also said 1000BT didn't, and I thought it said 40/100 didn't either. But I can't find that article. The link shows the most common Gb not using that encoding.
    – Astara
    Commented Sep 8, 2014 at 16:22
  • Oh, I see the bad assumption I made. Twisted copper pair (1Gbps, 10Gbps, 40Gbps, +) does not use 8b/10b, but the fiber variants do. However, 10Gbps is still the effective speed ("These standards use 8b/10b encoding, which inflates the line rate by 25%"), so the physical fiber lines are carrying 12.5Gbps raw or 10Gbps effective. The math about PCIe 2.0 speeds at the top of my post already includes the overhead for 8b/10b encoding (hence the use of "effective bandwidth of 4Gbps per lane"). Commented Sep 8, 2014 at 18:25
