I was reading the test report documents (450257_450257_DPDK_Test_Report_Rev0.7.pdf and 450257_450257_DPDK_Test_Report_Rev0.8.pdf) and could not find any benchmarks for the maximum RX throughput for 1 RX queue / 1 port with 0% loss. The closest is section "11 Benchmark Results for the Intel® DPDK Layer 3 Forwarding Tests", but this shows performance for 1 queue on 4 NICs when forwarding packets using LPM or hashing. I want to see performance for 1 queue / 1 port on 1 NIC.
I have attached some quick example/prototype code that I put together using test-pmd and the sample applications as reference. All the code does is receive packets and then free each mbuf with rte_pktmbuf_free(), so it basically just counts packets. I have played around with the max_burst param of rte_eth_rx_burst() and with various values for rx_conf.rx_free_thresh, but no matter what combination I tried, it could not handle 1 Gbit/200,000 pps of traffic with 0% packet loss. Should it be possible to achieve this?
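For reference, here is a rough sketch of the packet-rate arithmetic behind these figures. It assumes standard Ethernet framing, where each frame costs an extra 20 bytes on the wire (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap). The helper names line_rate_pps() and cycles_per_packet() are my own for illustration and are not part of DPDK.

```c
#include <assert.h>
#include <stdint.h>

/* Per-frame wire overhead that is not part of the frame itself:
 * 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20u

/* Maximum frames per second on a link of link_bps bits/s when every frame
 * is frame_bytes long (frame length including the 4-byte CRC).
 * Hypothetical helper, not a DPDK function. */
static uint64_t line_rate_pps(uint64_t link_bps, uint32_t frame_bytes)
{
    assert(frame_bytes > 0);
    uint64_t bits_per_frame =
        ((uint64_t)frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8u;
    return link_bps / bits_per_frame;   /* integer division, rounds down */
}

/* CPU cycles available per packet on a single core at a given packet rate.
 * Hypothetical helper, not a DPDK function. */
static uint64_t cycles_per_packet(uint64_t core_hz, uint64_t pps)
{
    assert(pps > 0);
    return core_hz / pps;
}
```

Under these assumptions, line_rate_pps(1000000000, 64) gives 1,488,095 pps for minimum-size frames, and 200,000 pps at 1 Gbit/s corresponds to frames of about 605 bytes. On a 2.93 GHz core, cycles_per_packet(2930000000, 200000) leaves a budget of 14,650 cycles per packet.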
I'm running on an ATCA blade with dual X5670 @ 2.93GHz, 48GB RAM DDR3 1066 (6x8GB, one 8GB quad-rank module per channel), 5520 Tylersburg chipset. The 82599 is positioned on a mezzanine board connected via PCI Express:5.0Gb/s:Width x8.
I set the BIOS options that are mentioned in the test report. I made the following change to the config to increase the size of the mempool cache so that all buffers would be in the cache:
-CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE=512
+CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE=65536
Some other relevant params set in the code are as follows:
# define NUM_RX_DESC (4096) - passed to rte_eth_rx_queue_setup()
# define MAX_PKT_BURST (64) - passed to rte_eth_rx_burst()
# define MBUFS_PER_POOL (8192 * 16) - passed to rte_mempool_create()
# define MBUF_CACHE_SIZE (8192 * 8) - passed to rte_mempool_create()
I ran the app with the following parameters: ./drv -c 0x5, so that it runs the processing thread on core 2 as it is set up to do. (I also used the setup.sh script to allocate the huge pages used by the app.)
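To spell out the coremask: the EAL -c option takes a hexadecimal bitmask of logical cores, so 0x5 (binary 101) enables cores 0 and 2. A minimal sketch of that decoding is below. The helper names are hypothetical and this is not how the EAL itself parses the option.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if logical core `core` is enabled in the coremask, else 0.
 * Hypothetical helper for illustration, not a DPDK function. */
static int coremask_has_core(uint64_t mask, unsigned core)
{
    assert(core < 64);
    return (int)((mask >> core) & 1u);
}

/* Count how many logical cores a coremask enables.
 * Hypothetical helper for illustration, not a DPDK function. */
static unsigned coremask_count(uint64_t mask)
{
    unsigned n = 0;
    while (mask) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

With mask 0x5, coremask_has_core() is true for cores 0 and 2 only, and coremask_count() returns 2.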
I would very much appreciate it if one of the developers could look over the code/run it very quickly and point out any params/config options I could use to improve performance. I would also like to know what packet receive rate I should expect on 1 RX queue / 1 port / 1 NIC when all I'm doing is counting packets.