Hi there, how are you?
We're trying to get the maximum possible performance (throughput) out of our servers. Below is some context, followed by the main changes we made and then a number of questions about the Intel 82599 and Linux parameters (buffer/queue sizes, ring buffers, qdisc, receive/send buffers). Please let us know if we missed anything, or ask if you need any further clarification.
Context:
- Goal: maximize throughput through packet locality (latency is not a concern as long as it stays under 0.5 s)
- Load: mostly video (streaming) chunks, ranging from 170KB to 2.3MB
- App (user land): nginx (multi-process, one worker pinned per core)
- OS (kernel): RHEL 7.4 (3.10)
- NIC (driver): Intel(R) 82599 10 Gigabit Dual Port Network Connection (rev 01) (ixgbe - 5.3.7)
- Bonding: IEEE 802.3ad dynamic link aggregation (a single card with two 10 Gbps ports, giving us 20 Gbps in total); a bond status check is sketched right after this Context block
- HW: CPU = Intel(R) Xeon(R) E5-2630L 0 @ 2.00GHz, Hyper-Threading = off, 2 sockets, 12 cores total, 64 GB RAM
- NUMA layout:
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5
node 0 size: 32605 MB
node 0 free: 30680 MB
node 1 cpus: 6 7 8 9 10 11
node 1 size: 32767 MB
node 1 free: 30952 MB
node distances:
node 0 1
0: 10 20
1: 20 10
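For completeness, a quick way to confirm the bond mode and hash policy (bond0 is an assumed bond name; adjust to your setup):
# Bond mode, per-slave link state and the transmit hash policy in use
cat /proc/net/bonding/bond0 | grep -E 'Bonding Mode|Transmit Hash Policy|MII Status'
# For per-flow balancing across both 10 Gbps ports, layer3+4 hashing is a common choice;
# check your own bonding options, this grep is just an illustration
grep -ri xmit_hash_policy /etc/sysconfig/network-scripts/ /etc/modprobe.d/ 2>/dev/null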
What we did:
- Installed the latest driver (ixgbe 5.3.7)
- Ran set_irq_affinity -x local ethX with the -x option (enabling RSS and XPS), pinned to the local NUMA node
- Enabled Flow Director: ntuple-filters on
- Set affinity for our user-land application (nginx's worker_cpu_affinity auto)
- XPS seems to be enabled (cat /sys/class/net/eth0/queues/tx-0/xps_cpus)
- RSS seems to be working (cat /proc/interrupts | grep eth)
- RFS seems to be disabled (cat /proc/sys/net/core/rps_sock_flow_entries shows 0)
- RPS seems to be disabled (cat /sys/class/net/eth3/queues/rx-10/rps_cpus shows 00000,00000 for all queues); the checks we ran are consolidated in the sketch after this list
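For reference, the checks above can be consolidated into a small sketch like this (eth0/eth3 are the interface names on our boxes; adjust as needed):
# Per-queue XPS CPU masks (a non-zero mask means XPS is active for that TX queue)
for f in /sys/class/net/eth0/queues/tx-*/xps_cpus; do echo "$f: $(cat $f)"; done
# RSS: each queue should have its own interrupt line
grep eth /proc/interrupts
# RFS: 0 means disabled
cat /proc/sys/net/core/rps_sock_flow_entries
# RPS: all-zero masks mean disabled
for f in /sys/class/net/eth3/queues/rx-*/rps_cpus; do echo "$f: $(cat $f)"; done
# Flow Director (ntuple filters) feature state
ethtool -k eth3 | grep ntuple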
Questions:
- Do we need to enable RPS for the (HW-accelerated) RSS to work? (When we check /sys/class/net/eth0/queues/rx-0/rps_cpus it shows 00000000,00000000 for all the queues.)
- Do we need to enable RFS for Flow Director to work? (cat /proc/sys/net/core/rps_sock_flow_entries shows 0)
- Do we need to add any rule for Flow Director to work? (For the TCP4 case, since we can't see any explicit rule, we assumed it uses the perfect hash of src/dst IP and port.)
- How can we be sure that RSS and Flow Director are working properly?
- Why can't we use a more modern qdisc (like fq or fq_codel) with this multi-queue driver/NIC? (We tried setting it with sysctl net.core.default_qdisc; is it because of the multiple queues?)
- Does a single NIC connect directly to only a single NUMA node? (When we run set_irq_affinity -x local ethX it sets all the queues to the first NUMA node.)
- If 6) is true, what's better for throughput: to pin the NIC's queues to a single NUMA node or to spread them across all the nodes?
- Still, if 6) is true, if we buy a second NIC can we attach it to the second NUMA node?
- We tried to set interrupt coalescing for the TX ring (ethtool -C eth3 tx-usecs 84) but our value was ignored; isn't it possible to set coalescing on the TX side? (The exact commands are sketched after this list.)
- Should we enable Hyper-Threading but keep the number of queues equal to the number of physical cores?
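Here is a sketch of the coalescing attempt and of the NUMA-locality checks referred to above (eth3 is one of the 82599 ports; 84 usecs is just the value we tried):
# Current coalescing settings
ethtool -c eth3
# Attempted change that appears to be ignored for TX
ethtool -C eth3 tx-usecs 84
# NUMA node the PCI device hangs off (-1 means no specific node reported)
cat /sys/class/net/eth3/device/numa_node
# Effective IRQ affinity per queue
for irq in $(grep eth3 /proc/interrupts | awk -F: '{print $1}'); do
  echo "irq $irq -> CPUs $(cat /proc/irq/$irq/smp_affinity_list)"
done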
If you read this far, thank you very much!
References:
- How to Set Up Intel® Ethernet Flow Director: https://software.intel.com/en-us/articles/setting-up-intel-ethernet-flow-director
- Scaling in the Linux Networking Stack: https://www.kernel.org/doc/Documentation/networking/scaling.txt
- Intel XL710/X710 Performance Tuning Linux Guide: https://www.intel.com/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf
- Red Hat network performance tuning guide: https://access.redhat.com/sites/default/files/attachments/20150325_network_performance_tuning.pdf
- Cloudflare blog, How to achieve low latency: https://blog.cloudflare.com/how-to-achieve-low-latency/
Hi Leandromoreira,
Thank you for posting in Wired Communities. Can you confirm the exact 82599 dual-port network adapter you have? Is it an onboard NIC on a certain board or a standalone X520 (82599) server adapter? Please provide the exact model if it is an X520 series network card, e.g. X520-SR2 or X520-DA2.
We will check further based on the information provided and then update you.
Regards,
Sharon T
Hi Sharon,
Thanks for the reply, I think it is the Intel(R) 82599 10 Gigabit Dual Port Network Connection (rev 01).
When I run lspci -v | grep -i ether, it gives me the following list:
02:00.0 Ethernet controller: Intel(R) I350 Gigabit Network Connection (rev 01)
Subsystem: Hewlett-Packard Company Ethernet 1Gb 2-port 361i Adapter
02:00.1 Ethernet controller: Intel(R) I350 Gigabit Network Connection (rev 01)
Subsystem: Hewlett-Packard Company Ethernet 1Gb 2-port 361i Adapter
07:00.0 Ethernet controller: Intel(R) 82599 10 Gigabit Dual Port Network Connection (rev 01)
Subsystem: Hewlett-Packard Company HPE Ethernet 10Gb 2-port 560FLR-SFP+ Adapter
07:00.1 Ethernet controller: Intel(R) 82599 10 Gigabit Dual Port Network Connection (rev 01)
Subsystem: Hewlett-Packard Company HPE Ethernet 10Gb 2-port 560FLR-SFP+ Adapter
When I run lshw -class network, it outputs the following (I cut out the 1 Gbps onboard NICs):
*-network:0
description: Ethernet interface
product: 82599 10 Gigabit Dual Port Network Connection
vendor: Intel(R)
physical id: 0
bus info: pci@0000:07:00.0
logical name: eth2
version: 01
serial: 9c:b6:54:71:49:67
size: 10Gbit/s
capacity: 10Gbit/s
width: 32 bits
clock: 33MHz
capabilities: pm msi msix pciexpress vpd bus_master cap_list rom ethernet physical fibre 10000bt-fd
configuration: autonegotiation=off broadcast=yes driver=ixgbe driverversion=5.3.7 duplex=full firmware=0x80000838, 1.1618.0 latency=0 link=yes multicast=yes slave=yes speed=10Gbit/s
resources: irq:42 memory:f7f00000-f7ffffff ioport:7000(size=32) memory:f7ef0000-f7ef3fff memory:f7c00000-f7c7ffff memory:bc000000-bc0fffff memory:bc100000-bc1fffff
*-network:1
description: Ethernet interface
product: 82599 10 Gigabit Dual Port Network Connection
vendor: Intel(R)
physical id: 0.1
bus info: pci@0000:07:00.1
logical name: eth3
version: 01
serial: 9c:b6:54:71:49:67
size: 10Gbit/s
capacity: 10Gbit/s
width: 32 bits
clock: 33MHz
capabilities: pm msi msix pciexpress vpd bus_master cap_list rom ethernet physical fibre 10000bt-fd
configuration: autonegotiation=off broadcast=yes driver=ixgbe driverversion=5.3.7 duplex=full firmware=0x80000838, 1.1618.0 latency=0 link=yes multicast=yes slave=yes speed=10Gbit/s
resources: irq:77 memory:f7d00000-f7dfffff ioport:7020(size=32) memory:f7cf0000-f7cf3fff memory:f7e00000-f7e7ffff memory:bc200000-bc2fffff memory:bc300000-bc3fffff
I suppose the first two are onboard (but I'm sure we don't use them) and the other two (I think a single NIC, but dual port) are the standalone adapter.
Hi Leandromoreira,
Thank you for the information. We will check on this.
Regards,
Sharon T
> Why can't we use a more modern qdisc (like fq or fq_codel) with this multi-queue driver/NIC? (We tried setting it with sysctl net.core.default_qdisc; is it because of the multiple queues?)
I think this question is already answered elsewhere: it seems to be due to the kernel version we use, and we need to upgrade in order to have access to these qdiscs.
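For anyone else hitting this: a minimal sketch of what to check and set on a kernel that does ship these qdiscs (fq_codel used as the example; eth3 is our interface name):
# Is the qdisc available on this kernel?
modinfo sch_fq_codel
modinfo sch_fq
# Make it the default; multi-queue NICs then get one child qdisc per TX queue under mq
sysctl -w net.core.default_qdisc=fq_codel
# Or attach it explicitly and inspect what ends up installed
tc qdisc replace dev eth3 root fq_codel
tc qdisc show dev eth3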
Hi Leandromoreira,
Thank you for sharing the information. Receive Packet Steering (RPS) and Receive Flow Steering (RFS) are not compatible with Flow Director; if Flow Director is enabled, they will be disabled. You may refer to the Flow Director section in the README file below:
https://downloadmirror.intel.com/14687/eng/readme.txt
Regards,
Sharon T
Nice, thanks for sharing; I had imagined that! Now we're missing just 4 questions.
Questions:
- Does a single NIC connect directly to only a single NUMA node? (When we run set_irq_affinity -x local ethX it sets all the queues to the first NUMA node.)
- If 1) is true, what's better for throughput: to pin the NIC's queues to a single NUMA node or to spread them across all the nodes?
- Still, if 1) is true, if we buy a second NIC can we attach it to the second NUMA node?
- We tried to set interrupt coalescing for the TX ring (ethtool -C eth3 tx-usecs 84) but our value was ignored; isn't it possible to set coalescing on the TX side?
Hi Leandromoreira,
You are welcome. I will further verify on these questions. Thank you.
Regard,
Sharon T
Just to add context on Flow Director: while I was reading the ixgbe kernel doc (https://www.kernel.org/doc/Documentation/networking/ixgbe.txt) I noticed how to:
- enable Intel(R) Ethernet Flow Director: ethtool -K ethX ntuple on (done)
- add a filter: ethtool -U ethX flow-type tcp4 src-ip 10.0.128.23 (didn't do)
- check if it's working: ethtool -S ethX (done, grepping for fdir, but I only see fdir_miss increasing while fdir_match stays static)
Should we add a filter? How can I add a filter which uses src/dst IP and port?
Hi Leandromoreira,
Thank you for sharing the information. With regards to your questions, please find the information below:
Question #1: Does a single NIC connect directly to only a single NUMA node? (When you run set_irq_affinity -x local ethX it sets all the queues to the first NUMA node.)
- There is a similar question posted in the Red Hat forum at https://access.redhat.com/discussions/2777691.
- You can also try this command to use all cores: set_irq_affinity -x all ethX. Please refer to page 5 of https://www.intel.com/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf
With regards to your last question (how can I add a filter which uses src/dst IP and port?):
You can try the command below; reference: https://software.intel.com/en-us/articles/setting-up-intel-ethernet-flow-director
ethtool --config-ntuple ethX flow-type tcp4 src-ip 10.23.4.6 dst-ip 10.23.4.18 src-port 2000 dst-port 2001 action 4
(Note: 4 is the queue number the flow is directed to.)
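To confirm the rule is installed and actually matching traffic, something like this can be used (same ethX placeholder as above):
# List installed ntuple/Flow Director rules
ethtool --show-ntuple ethX
# Watch the Flow Director counters while traffic flows; fdir_match should increase
ethtool -S ethX | grep fdir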
Just to double check: have you posted your inquiry with Red Hat support?
Thanks,
Sharon T
Hi Leandromoreira,
Please feel free to update me if you still have other inquiries or need further clarification.
Thank you.
Regards,
Sharon T
Intel Customer Support
Agent under contract to Intel
Hi Leandromoreira,
Please refer to the additional information below for your inquiries:
If 1) is true, what's better for throughput: to pin the NIC's queues to a single NUMA node or to spread them across all the nodes?
- This depends on the load and type of traffic; it is always best to test under your own specific circumstances.
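A rough way to compare the two layouts under production-like load (just a sketch; set_irq_affinity ships with the ixgbe driver sources, and sar comes from the sysstat package):
# Layout A: keep all queues on the NIC-local NUMA node
set_irq_affinity -x local eth2 eth3
sar -n DEV 10 30    # sample throughput for ~5 minutes
# Layout B: spread queues across all cores/nodes
set_irq_affinity -x all eth2 eth3
sar -n DEV 10 30    # sample again and compare rxkB/s and txkB/s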
Still, if 1) is true, if we buy a second NIC can we attach it to the second NUMA node?
- The answer is yes.
We tried to set interrupt coalescing for the TX ring (ethtool -C eth3 tx-usecs 84) but our value was ignored; isn't it possible to set coalescing on the TX side?
- This is not supported.
Hope the above information helps.
Regards,
Sharon T
Intel Customer Support
Agent under contract to Intel