Beginner
3,957 Views

Questions about Intel 82599: Flow director + NUMA + performance

Hi there, how are you?

We're trying to get the maximum possible throughput out of our servers. Here is some context, followed by the main changes we made and then a number of questions about the Intel 82599 and Linux parameters. Please let us know if we missed anything, or ask if you need further clarification (e.g. buffer/queue sizes, ring buffer, qdisc, or receive/send buffers).

Context:

  • Goal: maximum throughput through packet locality (latency under 0.5 s is not a concern)
  • Load: mostly video (streaming) chunks, ranging from 170 KB to 2.3 MB
  • App (user land): nginx (multi-process, one worker pinned per core)
  • OS (kernel): RHEL 7.4 (3.10)
  • NIC (driver): Intel(R) 82599 10 Gigabit Dual Port Network Connection (rev 01) (ixgbe 5.3.7)
  • Bonding: IEEE 802.3ad dynamic link aggregation (a single card with two 10 Gbps ports, giving us 20 Gbps)
  • HW: CPU = Intel(R) Xeon(R) E5-2630L 0 @ 2.00GHz, Hyper-Threading off, 2 sockets, 12 cores, 64 GB RAM
  • NUMA layout:

available: 2 nodes (0-1)

node 0 cpus: 0 1 2 3 4 5

node 0 size: 32605 MB

node 0 free: 30680 MB

node 1 cpus: 6 7 8 9 10 11

node 1 size: 32767 MB

node 1 free: 30952 MB

node distances:

node    0    1
  0:   10   20
  1:   20   10

What we did:

  • Installed the latest driver (ixgbe 5.3.7)
  • Ran set_irq_affinity -x local ethX with the `-x` option (enabling RSS and XPS) for the local NUMA node
  • Enabled Flow Director: ntuple-filters on
  • Set affinity for our user-land application (nginx's worker_cpu_affinity auto)
  • XPS seems to be enabled (cat /sys/class/net/eth0/queues/tx-0/xps_cpus)
  • RSS seems to be working (cat /proc/interrupts | grep eth)
  • RFS seems to be disabled (cat /proc/sys/net/core/rps_sock_flow_entries shows 0)
  • RPS seems to be disabled (cat /sys/class/net/eth3/queues/rx-10/rps_cpus shows 00000,00000 for all queues)
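As a sanity check, the CPU bitmask that set_irq_affinity writes for a NUMA-local setup can be derived from the layout above. A minimal sketch (the sysfs checks are shown commented out, since they need the real hardware; eth0 is the interface name from our checks):

```shell
# Build the affinity bitmask covering NUMA node 0's CPUs (0-5 per the
# layout above); this is the kind of value written to smp_affinity/xps_cpus.
mask=0
for cpu in 0 1 2 3 4 5; do
  mask=$(( mask | (1 << cpu) ))
done
printf 'node 0 cpu mask: %x\n' "$mask"    # prints: node 0 cpu mask: 3f

# On the live host, the steering state we checked:
# cat /sys/class/net/eth0/queues/rx-0/rps_cpus    # all zeros => RPS off
# cat /proc/sys/net/core/rps_sock_flow_entries    # 0 => RFS off
# cat /sys/class/net/eth0/queues/tx-0/xps_cpus    # nonzero => XPS on
```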

Questions:

  1. Do we need to enable RPS for the (hardware-accelerated) RSS to work? (when we check /sys/class/net/eth0/queues/rx-0/rps_cpus it shows 00000000,00000000 for all the queues)
  2. Do we need to enable RFS for Flow Director to work? (cat /proc/sys/net/core/rps_sock_flow_entries shows 0)
  3. Do we need to add any rule for Flow Director to work? (in the TCP4 case, since we can't see any explicit rule, we assume it uses the perfect hash (src/dst IP and port))
  4. How can we be sure that RSS and Flow Director are working properly?
  5. Why can't we use a more modern qdisc with this multi-queue driver/NIC? (we tried to set fq and fq_codel via sysctl net.core.default_qdisc; is it because of the multiple queues?)
  6. Does a single NIC connect directly to only a single NUMA node? (when we run set_irq_affinity -x local ethX, it sets all the queues to the first NUMA node)
  7. If 6) is true, what's better for throughput: pinning the NIC to a single NUMA node or spreading the queues across all the nodes?
  8. Still assuming 6) is true, if we buy a second NIC, can we attach it to the second NUMA node?
  9. We tried to set interrupt coalescing for the TX ring (ethtool -C eth3 tx-usecs 84), but it just ignored our value. Isn't it possible to set coalescing for the TX ring?
  10. Should we enable HT but use only as many queues as physical cores?

If you read this far, thank you very much.

Community Manager

Hi Leandromoreira,

Thank you for posting in Wired Communities. Can you confirm the exact 82599 dual-port network adapter you have? Is this an onboard NIC on a certain board, or a standalone X520 (82599) server adapter? Please provide the exact model if it is an X520-series network card, e.g. X520-SR2 or X520-DA2.

We will check further based on the information provided, then update you.

Regards,

Sharon T
Beginner

Hi Sharon,

Thanks for the reply, I think it is the Intel(R) 82599 10 Gigabit Dual Port Network Connection (rev 01).

When I run lspci -v | grep -i ether, it gives me the following list:

02:00.0 Ethernet controller: Intel(R) I350 Gigabit Network Connection (rev 01)

Subsystem: Hewlett-Packard Company Ethernet 1Gb 2-port 361i Adapter

02:00.1 Ethernet controller: Intel(R) I350 Gigabit Network Connection (rev 01)

Subsystem: Hewlett-Packard Company Ethernet 1Gb 2-port 361i Adapter

07:00.0 Ethernet controller: Intel(R) 82599 10 Gigabit Dual Port Network Connection (rev 01)

Subsystem: Hewlett-Packard Company HPE Ethernet 10Gb 2-port 560FLR-SFP+ Adapter

07:00.1 Ethernet controller: Intel(R) 82599 10 Gigabit Dual Port Network Connection (rev 01)

Subsystem: Hewlett-Packard Company HPE Ethernet 10Gb 2-port 560FLR-SFP+ Adapter

When I run lshw -class network, it outputs the following (I cut off the 1Gbps onboard):

*-network:0

description: Ethernet interface

product: 82599 10 Gigabit Dual Port Network Connection

vendor: Intel(R)

physical id: 0

bus info: pci@0000:07:00.0

logical name: eth2

version: 01

serial: 9c:b6:54:71:49:67

size: 10Gbit/s

capacity: 10Gbit/s

width: 32 bits

clock: 33MHz

capabilities: pm msi msix pciexpress vpd bus_master cap_list rom ethernet physical fibre 10000bt-fd

configuration: autonegotiation=off broadcast=yes driver=ixgbe driverversion=5.3.7 duplex=full firmware=0x80000838, 1.1618.0 latency=0 link=yes multicast=yes slave=yes speed=10Gbit/s

resources: irq:42 memory:f7f00000-f7ffffff ioport:7000(size=32) memory:f7ef0000-f7ef3fff memory:f7c00000-f7c7ffff memory:bc000000-bc0fffff memory:bc100000-bc1fffff

*-network:1

description: Ethernet interface

product: 82599 10 Gigabit Dual Port Network Connection

vendor: Intel(R)

physical id: 0.1

bus info: pci@0000:07:00.1

logical name: eth3

version: 01

serial: 9c:b6:54:71:49:67

size: 10Gbit/s

capacity: 10Gbit/s

width: 32 bits

clock: 33MHz

capabilities: pm msi msix pciexpress vpd bus_master cap_list rom ethernet physical fibre 10000bt-fd

configuration: autonegotiation=off broadcast=yes driver=ixgbe driverversion=5.3.7 duplex=full firmware=0x80000838, 1.1618.0 latency=0 link=yes multicast=yes slave=yes speed=10Gbit/s

resources: irq:77 memory:f7d00000-f7dfffff ioport:7020(size=32) memory:f7cf0000-f7cf3fff memory:f7e00000-f7e7ffff memory:bc200000-bc2fffff memory:bc300000-bc3fffff

I suppose the first two are onboard (though I'm sure we don't use them) and the others (a single NIC with dual ports, I think) are the standalone adapter.
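Related to the NUMA questions above: the kernel exposes which NUMA node a PCIe slot reports through sysfs. A small sketch, assuming the eth2/eth3 names from the lshw output (prints "unknown" where the path is absent):

```shell
# numa_node_of: print the NUMA node reported for a netdev's PCIe slot,
# or "unknown" if the sysfs path does not exist on this host.
numa_node_of() {
    f="/sys/class/net/$1/device/numa_node"
    if [ -r "$f" ]; then cat "$f"; else echo unknown; fi
}

for dev in eth2 eth3; do
    echo "$dev -> NUMA node $(numa_node_of "$dev")"
done
```

A value of -1 (or an absent file) means the platform did not report a locality for that slot.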

Community Manager

Hi Leandromoreira,

Thank you for the information. We will check on this.

Regards,

Sharon T
Beginner

> Why can't we use the most modern QDisc for this multiple queue driver/NIC? (like fq or fq_codel we tried to set up with sysctl net.core.default_qdisc, is it because of multiple queues?)

I think this question is already answered: it seems to be due to the kernel version we use. We need to upgrade in order to have access to these qdiscs.
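For reference, on a kernel that ships these qdiscs (the net.core.default_qdisc sysctl and fq appeared around mainline 3.12, if I recall correctly, which matches the kernel-version explanation), the setup we attempted would look like this (a sketch, not verified on RHEL 7.4):

```shell
sysctl -w net.core.default_qdisc=fq_codel   # becomes the default leaf qdisc

tc qdisc show dev eth3   # a multiqueue NIC shows mq at the root,
                         # with one leaf qdisc per TX queue
```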

Community Manager

Hi Leandromoreira,

Thank you for sharing the information. Receive Packet Steering (RPS) and Receive Flow Steering (RFS) are not compatible with Flow Director. If Flow Director is enabled, they will be disabled. You may refer to the Flow Director section in the README file below:

 

https://downloadmirror.intel.com/14687/eng/readme.txt

 

 

Regards,

 

Sharon T

 

Beginner

Nice, thanks for sharing; I suspected as much! Now only 4 questions remain.

Questions:

  1. Does a single NIC connect directly to only a single NUMA node? (when we run set_irq_affinity -x local ethX, it sets all the queues to the first NUMA node)
  2. If 1) is true, what's better for throughput: pinning the NIC to a single NUMA node or spreading the queues across all the nodes?
  3. Still assuming 1) is true, if we buy a second NIC, can we attach it to the second NUMA node?
  4. We tried to set interrupt coalescing for the TX ring (ethtool -C eth3 tx-usecs 84), but it just ignored our value. Isn't it possible to set coalescing for the TX ring?
Community Manager

Hi Leandromoreira,

You are welcome. I will verify these questions further. Thank you.

Regards,

Sharon T
Beginner

Just to add context on Flow Director: while reading the ixgbe kernel doc at https://www.kernel.org/doc/Documentation/networking/ixgbe.txt, I noticed how to:

  • enable Intel(R) Ethernet Flow Director: ethtool -K ethX ntuple on (done)
  • add a filter: ethtool -U ethX flow-type tcp4 src-ip 10.0.128.23 (didn't do)
  • check whether it's working: ethtool -S ethX (done; grepping for fdir I just see fdir_miss increase while fdir_match stays static)

Should we add a filter? How can I add a filter that uses src/dst IP and port?

Community Manager

Hi Leandromoreira,

 

 

Thank you for sharing the information. With regard to your questions, please find the information below:

Question #1: Does a single NIC connect directly to only a single NUMA node? (when we run set_irq_affinity -x local ethX, it sets all the queues to the first NUMA node)

 

- There is a similar question posted on the Red Hat forum at https://access.redhat.com/discussions/2777691.

 

You can also try this command to use all cores: set_irq_affinity -x all ethX. Please refer to page 5 ( https://www.intel.com/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance...)

 

 

With regard to your last question (how can I add a filter that uses src/dst IP and port?):

 

 

You can try the command below; for reference, see https://software.intel.com/en-us/articles/setting-up-intel-ethernet-flow-director

 

ethtool --config-ntuple ethX flow-type tcp4 src-ip 10.23.4.6 dst-ip 10.23.4.18 src-port 2000 dst-port 2001 action 4

 

(Note: 4 is the queue number)
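Putting that command together with verification and cleanup steps from the ethtool man page (a sketch; the addresses, ports, and the eth3 name are illustrative):

```shell
# Install a perfect-match filter steering one TCP/IPv4 4-tuple to RX queue 4.
# (-U is the short form of --config-ntuple.)
ethtool -U eth3 flow-type tcp4 src-ip 10.23.4.6 dst-ip 10.23.4.18 \
        src-port 2000 dst-port 2001 action 4

ethtool -u eth3               # list installed ntuple filters, with their rule IDs
ethtool -S eth3 | grep fdir   # fdir_match should increment for matching traffic
# ethtool -U eth3 delete <rule-id>   # remove a filter by the ID shown above
```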

 

 

Just to double-check: have you posted your inquiry to Red Hat support?

 

 

Thanks,

 

Sharon T

 

Community Manager

Hi Leandromoreira,

Please feel free to update me if you have other inquiries or need further clarification.

Thank you.

Regards,

Sharon T

Intel Customer Support
Agent under contract to Intel
Community Manager

Hi Leandromoreira,

 

 

Please refer to the additional information for your inquiries:

 

 

If 1) is true, what's better for throughput: pinning the NIC to a single NUMA node or spreading the queues across all the nodes?

 

- This depends on the load and type of traffic; it is always best to test under your own specific circumstances.

 

 

Still assuming 1) is true, if we buy a second NIC, can we attach it to the second NUMA node?

 

- The answer is Yes.

 

 

We tried to set interrupt coalescing for the TX ring (ethtool -C eth3 tx-usecs 84), but it just ignored our value. Isn't it possible to set coalescing for the TX ring?

 

- This is not supported.
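A quick way to see which coalescing knobs the driver actually honors is to set a value and read it back (a sketch, assuming the eth3 name from the thread and ixgbe's rx-usecs/ITR behavior):

```shell
ethtool -c eth3                  # dump current interrupt-coalescing settings
ethtool -C eth3 rx-usecs 84      # rx-usecs is the knob ixgbe exposes for the ITR
ethtool -c eth3 | grep rx-usecs  # re-read to confirm the value actually stuck
```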

 

 

Hope the above information helps.

 

 

Regards,

 

Sharon T

 

Intel Customer Support

 

Agent under contract to Intel

 

 
