Embedded Intel® Core™ Processors

Questions about performance of DPDK with QAT on DH8920


Hi everyone,

My board has a DH8920 and an E5-2658 installed to implement IPsec encryption and decryption. I want to reach the 20Gbps IPsec throughput stated in the DH8920 product document.

DPDK and the QAT driver are installed, and my application is very similar to the "dpdk_qat" example provided with QAT1.5, with some changes to handle the IPsec encapsulation.

QAT is used in user space through the DP API.

In my host application I enable 8 logical cores; each core uses 1 QAT instance and has 1 DPDK rx queue. Since there are only four crypto engines, every 2 instances share 1 engine, and each instance has its ring size set to 4096 (the maximum value I can configure).

But when I test the performance of the application I run into a problem. Hopefully someone can help me.

If I send more than 10Gbps (11Gbps) of IPsec encrypted packets to my board with the iperf3 tool, I find that sometimes, for most of the instances, CPA_STATUS_RETRY is returned when cpaCySymDpEnqueueOp is called, which leads to a lot of lost packets; after a while it recovers to normal. I know this means the QAT request queue is full and nothing more can be put into it. Since the document says the DH8920 supports 20Gbps, I am really puzzled about the following:

1) Why would the QAT request queue be full for some time when the traffic is only about 11Gbps? If the device had reached its maximum capability, why would this happen only some of the time?

2) When that happens I increase the frequency of polling for responses with icp_sal_CyPollDpInstance, but it does not help, and I find that nearly 98% of the calls to icp_sal_CyPollDpInstance return CPA_STATUS_RETRY.

3) I have already adjusted all the parameters in dpdk_qat, such as the polling frequency and how often PerformNow is set to CPA_TRUE, as the Intel QAT performance document recommends.

It seems that sometimes requests only accumulate on a QAT instance without any responses coming back, which fills the tx queue; after a while a lot of responses come out in a short time and the instance recovers.

How can I find the cause and increase the performance to 20Gbps without any QAT request queue becoming full? By the way, if the total is 20Gbps, how much is that per instance or per crypto engine? This is not mentioned in the document.
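
For a rough sense of the per-engine and per-instance numbers asked about here, the back-of-the-envelope split below assumes the 20Gbps figure is divided evenly across the four crypto engines and eight instances described in this post. The even split is an assumption made for illustration, not a figure taken from Intel documentation.

```python
# Back-of-the-envelope split of the target throughput.
# Assumption: load is spread perfectly evenly; real per-engine limits may differ.
TARGET_GBPS = 20.0   # figure quoted from the DH8920 product document
CRYPTO_ENGINES = 4   # crypto engines mentioned in the post
INSTANCES = 8        # one QAT instance per logical core


def even_share(total_gbps: float, parts: int) -> float:
    """Throughput each part must sustain if the load is perfectly balanced."""
    return total_gbps / parts


per_engine = even_share(TARGET_GBPS, CRYPTO_ENGINES)
per_instance = even_share(TARGET_GBPS, INSTANCES)
print(f"per engine: {per_engine} Gbps, per instance: {per_instance} Gbps")
```

Under that assumption each engine would need to sustain 5Gbps and each instance 2.5Gbps, so any imbalance between instances raises the load on some engines above that share.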

Thanks a lot !

Best regards,


Hello lb,

Thank you for contacting the Intel Embedded Community.

To better understand your questions, we would like to ask the following:

Could you please let us know whether the affected design is your own or a third-party one? If it is a third-party project, please give us all the information related to it.

Could you please verify that the affected implementation complies with the guidelines stated in sections 26.2, 26.3.2, and 17 of the DPDK Sample Applications User Guide? These sections can be found at the following websites:

http://dpdk.org/doc/guides/sample_app_ug/intel_quickassist.html

http://dpdk.org/doc/guides/sample_app_ug/l3_forward.html

Please let us know this information to have a better idea of the cited situations.

Thanks in advance for your help.

Best Regards,

AdolfoS on behalf of Carlos_A.


Thanks a lot for your reply.

The affected design is mostly based on the dpdk_qat example, which follows the guidelines in sections 26.2 and 26.3.2. I added some code for IPsec processing, referring to the IPsec RFC and the DPDK IPsec example. The main modifications to dpdk_qat add code for:

1) Querying the SA and SP for each ESP packet.

2) ESP packet encapsulation and decapsulation after encryption and decryption by QAT.

3) Sending control plane data (IKE, ARP and so on) to the kernel through the KNI interface, so that DPDK and QAT only deal with user plane data (ESP).

The process flow and the code for receiving packets and sending them to the network or to QAT in main_loop are almost the same as in dpdk_qat. That is:

1) Flush the queues periodically.

2) Get responses from the QAT response FIFO, and poll the QAT rx rings at a fixed frequency if the FIFO length is smaller than some value.

3) If pkt == NULL, receive a packet from the DPDK queues and set the flag.

4) If a packet was received from a DPDK queue, check whether it should be decrypted or encrypted.

5) ESP packet encapsulation and decapsulation / SA and SP matching (my code).

6) Enqueue the packet to QAT with CPA_FALSE set; after a fixed kick_freq, set it to CPA_TRUE.

7) continue;

8) If a QAT response was obtained in step 2), send it to the network.
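
The loop above can be modelled with a small toy simulation. This sketch does not call DPDK or QAT: `ToyRing`, `device_step` and `main_loop` are hypothetical stand-ins, and the "poll, then retry once before dropping" handling of a full ring is only one possible policy, not something confirmed by the dpdk_qat example.

```python
from collections import deque

RETRY, SUCCESS = "RETRY", "SUCCESS"


class ToyRing:
    """Toy stand-in for a QAT request ring; not the real QAT API."""

    def __init__(self, size, kick_freq):
        self.size = size            # maximum outstanding requests
        self.kick_freq = kick_freq  # submit after this many held requests
        self.pending = deque()      # enqueued but not yet kicked
        self.in_flight = deque()    # submitted to the simulated device
        self.done = deque()         # responses waiting to be polled

    def enqueue(self, pkt):
        used = len(self.pending) + len(self.in_flight) + len(self.done)
        if used >= self.size:
            return RETRY            # ring full, caller must back off
        self.pending.append(pkt)
        if len(self.pending) >= self.kick_freq:
            self.kick()
        return SUCCESS

    def kick(self):
        """Submit everything held back so far (models PerformNow = CPA_TRUE)."""
        self.in_flight.extend(self.pending)
        self.pending.clear()

    def device_step(self, n):
        """Let the simulated device complete up to n requests."""
        for _ in range(min(n, len(self.in_flight))):
            self.done.append(self.in_flight.popleft())

    def poll(self):
        """Collect all completed responses, freeing their ring slots."""
        out = list(self.done)
        self.done.clear()
        return out


def main_loop(packets, ring, device_rate):
    """device_rate must be >= 1, otherwise the final drain never finishes."""
    sent, dropped = [], 0
    for pkt in packets:
        sent.extend(ring.poll())            # step 2: collect responses
        if ring.enqueue(pkt) == RETRY:      # step 6: ring reported full
            ring.kick()
            ring.device_step(device_rate)
            sent.extend(ring.poll())        # free slots, then retry once
            if ring.enqueue(pkt) == RETRY:
                dropped += 1
        ring.device_step(device_rate)
    ring.kick()                             # drain what is left
    while ring.in_flight or ring.done:
        ring.device_step(device_rate)
        sent.extend(ring.poll())
    return sent, dropped


sent, dropped = main_loop(list(range(100)), ToyRing(size=8, kick_freq=4), 2)
print(len(sent), dropped)
```

In this model a slot is only freed when its response is polled, so a burst of RETRY results can come either from the device falling behind or from responses not being collected quickly enough.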

The command is very similar to the dpdk_qat example:

./dpdk_qat -c 0x1ff -n 2 -- -p 0xf --config="(0,0,1),(0,1,2),(0,2,3),(0,3,4),(0,4,5),(0,5,6),(0,6,7),(0,7,8),(1,0,1),(1,1,2),(1,2,3),(1,3,4),(1,4,5),(1,5,6),(1,6,7),(1,7,8),(2,0,1),(2,1,2),(2,2,3),(2,3,4),(2,4,5),(2,5,6),(2,6,7),(2,7,8),(3,0,1),(3,1,2),(3,2,3),(3,3,4),(3,4,5),(3,5,6),(3,6,7),(3,7,8)"
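
The long `--config` list follows a regular pattern: for each of the 4 ports, queue q is handled by lcore q + 1. A short script along these lines could generate it, together with the core mask, instead of typing it by hand. `build_config` and `coremask` are hypothetical helper names, not part of DPDK.

```python
def build_config(ports, queues_per_port, first_lcore=1):
    """Return the (port,queue,lcore) list; queue q maps to lcore first_lcore + q."""
    return ",".join(
        f"({port},{queue},{first_lcore + queue})"
        for port in range(ports)
        for queue in range(queues_per_port)
    )


def coremask(lcores):
    """Build the hexadecimal core mask with one bit set per lcore."""
    mask = 0
    for lcore in lcores:
        mask |= 1 << lcore
    return hex(mask)


config = build_config(ports=4, queues_per_port=8)
print(f'--config="{config}"')
# lcore 0 (main core) plus worker lcores 1-8
print("-c", coremask(range(0, 9)))
```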

Ports 0 and 1 are bonded into one bonded port used as the output port, and ports 2 and 3 are bonded into another used as the input port. I did this to get 20G of bandwidth, since each port is 10G.

The max concurrent sym requests value is set to 4096 in the config file. lcore 0 is the main core; it does not receive any packets and is used as the control core for receiving and configuring SAs and SPs.

Since 8 lcores are used and the DH8920 has only 4 crypto engines, two cores are bound to each engine.

I am really puzzled about why the QAT rx queue is full for a while when the traffic is above 10G, as I mentioned.

I watched the file /proc/icp_.../et_ring_ctl/bank_N/conf and saw that when the problem happened, Head was sometimes suddenly equal to Tail and Space was 0. Most of the time, though, (Tail - Head) / (msg size) is below 10 and Space is very large. It seems that the QAT suddenly slows down for a while and then speeds up again. I have no idea why.
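
The (Tail - Head) / (msg size) check can be written so that it also handles ring wraparound and the ambiguous Head == Tail case. This is a minimal sketch that assumes Head, Tail and Space are byte values read from that file; the 128-byte message size and the ring size used in the example call are placeholders, not values taken from QAT documentation.

```python
def ring_occupancy(head, tail, space, ring_bytes, msg_bytes):
    """Estimate how many messages sit between head and tail.

    Assumes head/tail are byte offsets into a ring of ring_bytes bytes
    and space is the number of free bytes (an assumed interpretation).
    """
    used = (tail - head) % ring_bytes      # modulo handles wraparound
    if used == 0 and space == 0:
        used = ring_bytes                  # head == tail and no space: full
    return used // msg_bytes


# Placeholder geometry: 4096 slots of 128 bytes each (assumed values).
RING_BYTES = 4096 * 128
MSG_BYTES = 128
print(ring_occupancy(1024, 2048, RING_BYTES - 1024, RING_BYTES, MSG_BYTES))
```

Sampling this value for each ring over time would show whether the occupancy climbs gradually or jumps to full in one step.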

Please help me.

Thanks a lot !

Best regards,


Hello lb,

Thanks for your reply.

Please let me rephrase some of my previous questions:

Could you please tell me the processors and chipsets associated with this problem? Also, please let us know whether the affected project where these devices are used was designed by you or by another company. If it was designed by another company, could you please give us all the information related to it?

Could you please confirm whether this situation has been raised with any of the DPDK Mailing List (http://dpdk.org/ml) contacts?

Thanks again for your cooperation in resolving this issue.

Best Regards,

AdolfoS on behalf of Carlos_A.
