Intel® QuickAssist Technology (Intel® QAT)

need to replicate QAT dpdk perf test on x86

JCK1
Beginner
6,935 Views

Hi,

Based on https://fast.dpdk.org/doc/perf/DPDK_21_11_Intel_crypto_performance_report.pdf

I am having trouble replicating Intel's test on x86; the best I can get is 7.9 Gbps on a Xeon(R) Gold 6338N CPU @ 2.20GHz from SuperMicro. I tried to follow the instructions in your report for configuring the BIOS, kernel settings, etc.

The VFs were bound to the vfio-pci driver, not QAT's VF driver.
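
For anyone wanting to double-check the binding, here is a minimal sketch, assuming a Linux host where each PCI device exposes a `driver` symlink under `/sys/bus/pci/devices/<address>/`. The helper name `bound_driver` and the example addresses are illustrative only and not taken from the report.

```python
import os

def bound_driver(bdf, sysfs_root="/sys"):
    """Return the name of the kernel driver a PCI device is bound to, or None."""
    link = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "driver")
    if not os.path.islink(link):
        # Device missing, or present but not bound to any driver.
        return None
    # The symlink points at /sys/bus/pci/drivers/<driver-name>.
    return os.path.basename(os.readlink(link))

if __name__ == "__main__":
    # Hypothetical VF addresses; substitute the ones on your own system.
    for bdf in ["0000:b5:01.1", "0000:b5:01.2"]:
        print(bdf, bound_driver(bdf))
```

A VF intended for the DPDK QAT PMD would be expected to report `vfio-pci` here; `None` means the device is absent or unbound.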

Anything I could be missing?

Thanks in advance.

 

JC

61 Replies
JCK1
Beginner
1,384 Views

Hi Ronny

 

For the crypto_qat device type, here is Intel's published result:

   

AES-CBC-128/SHA1-HMAC (Gbps)   AES-CBC-128/SHA2-256-HMAC (Gbps)   AES-GCM-128 (Gbps)
crypto_qat                     crypto_qat                         crypto_qat
3.90                           3.89                               3.35
7.72                           7.68                               6.66
15.06                          14.92                              13.10
28.31                          27.99                              24.69
45.60                          46.67                              39.58
52.70                          52.45                              49.85

 

on our lab machine Gold 6338N CPU @ 2.20GHz, the results are:

AES-CBC-128/SHA1-HMAC (Gbps)   AES-CBC-128/SHA2-256-HMAC (Gbps)   AES-GCM-128 (Gbps)
crypto_qat                     crypto_qat                         crypto_qat
0.8138                         0.8104                             0.7035
1.6104                         1.6004                             1.4020
3.1418                         3.1145                             2.7618
5.9390                         5.8802                             5.2268
9.2248                         9.0837                             8.2348
10.9657                        10.8057                            10.3737

 

We are trying to bring up a system based on Sapphire Rapids to run the tests again.

 

Are we going to run tests on crypto_scheduler? 

 

Thanks

JCK

Ronny_G_Intel
Moderator
1,337 Views

Hi JCK1,

 

We need some additional clarification. Can you please provide the command you are running for the Cryptodev QAT PMD performance test (the test we are concentrating on)?

Are these the results?

 

AES-CBC-128/SHA1-HMAC (Gbps)   AES-CBC-128/SHA2-256-HMAC (Gbps)   AES-GCM-128 (Gbps)
crypto_qat                     crypto_qat                         crypto_qat
3.90                           3.89                               3.35
7.72                           7.68                               6.66
15.06                          14.92                              13.10
28.31                          27.99                              24.69
45.60                          46.67                              39.58
52.70                          52.45                              49.85

 

on our lab machine Gold 6338N CPU @ 2.20GHz, the results are:

AES-CBC-128/SHA1-HMAC (Gbps)   AES-CBC-128/SHA2-256-HMAC (Gbps)   AES-GCM-128 (Gbps)
crypto_qat                     crypto_qat                         crypto_qat
0.8138                         0.8104                             0.7035
1.6104                         1.6004                             1.4020
3.1418                         3.1145                             2.7618
5.9390                         5.8802                             5.2268
9.2248                         9.0837                             8.2348
10.9657                        10.8057                            10.3737

 

Thanks,

Ronny G

JCK1
Beginner
1,323 Views

Hi Ronny

 

Yes, the first table is Intel's published result. The second table is from my test on our x86 machine.

 

Thanks

JCK

Ronny_G_Intel
Moderator
1,315 Views

Thanks JCK1, can you please provide the exact command that you are running to obtain these results?

 

Regards,

Ronny G 

JCK1
Beginner
1,292 Views

Hi Ronny

 

Yes, run this:

./intel-cryptodev-qat-tests.sh [0|1|2], where:

0 - AES-CBC-128/SHA1-HMAC 

1 - AES-CBC-128/SHA2-256-HMAC

2 - AES-GCM-128 

 

thanks

JCK

Ronny_G_Intel
Moderator
1,247 Views

Hi JCK1,

 

I really need your help with the full command that you are running.

We want to confirm that you are using scheduler PMD with QAT workers in round-robin.

Can you please provide the full command? 

 

Thanks,

Ronny G

JCK1
Beginner
1,232 Views

Hi Ronny,

I tried to post a reply here, but your system complained:
Your post has been changed because invalid HTML was found in the message body. The invalid HTML has been removed. Please review the message and submit the message when you are satisfied.

So I put everything into a .txt file and attached it here.

JCK

 

 

 

 

JCK1
Beginner
1,360 Views

Hi Ronny

 

Also, Intel's Sapphire Rapids has QAT integrated into the SoC. How do we test QAT performance on Sapphire Rapids? Do you have any information you can share with us for our evaluation? How is that supported in DPDK?

 

Thanks

JCK

Ronny_G_Intel
Moderator
1,416 Views

By the way, JCK1, I have passed your update on to the DPDK team. Thank you.

Ronny_G_Intel
Moderator
1,131 Views

Hi JCK1,


Thanks for the information. The .txt file you provided has been shared with the DPDK team.


Thanks,

Ronny G


Ronny_G_Intel
Moderator
772 Views

Hi JCK1,

The simulation and report below focus only on Test Case 3: the multi-core QAT test with 4 VFs and no scheduler PMD, using AES-CBC-128/SHA1-HMAC.

This corresponds to the first test you ran:


sudo $DPDK_TEST_CRYPTO_PERF/dpdk-test-crypto-perf \
--socket-mem 2048,0 --legacy-mem \
-a ${QAT_PF0}.0 -a ${QAT_PF0}.1 -a ${QAT_PF0}.2 -a ${QAT_PF0}.3 \
-l 4,5,13,6,14 -n 4 \
-- --buffer-sz 64,128,256,512,1024,2048 \
--optype cipher-then-auth --ptest throughput --auth-key-sz 64 --cipher-key-sz 16 \
--devtype crypto_qat --cipher-iv-sz 16 --auth-op generate --burst-sz 32 \
--total-ops 30000000 --digest-sz 20 --auth-algo sha1-hmac --cipher-algo aes-cbc --cipher-op encrypt

 

We ran this test with a similar command.
We changed --socket-mem because our QAT device is on socket 1 of our system.
We also changed the lcores to ones on socket 1 (these are isolated CPUs, isolcpus, in our configuration, so this should match what you are using).
Please check that your QAT device's socket matches the socket of the lcores used.
The QAT socket is shown in the DPDK application output: EAL: Probe PCI driver: qat (8086:37c9) device: 0000:b5:01.1 (socket 1)
The lcore sockets can be checked with the DPDK script: ./usertools/cpu_layout.py
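
The socket check described above can also be scripted. This is a rough sketch, assuming a Linux host where sysfs exposes `numa_node` for each PCI device and a `cpulist` file for each NUMA node, and assuming one NUMA node per socket (as on the platform listed later in this post). All function names are illustrative, not part of DPDK.

```python
import os

def parse_cpulist(text):
    """Expand a kernel cpulist string such as '28-55,84-111' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def read_sysfs(path):
    with open(path) as f:
        return f.read().strip()

def device_numa_node(bdf, sysfs_root="/sys"):
    """NUMA node of a PCI device; the kernel reports -1 when it is unknown."""
    path = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "numa_node")
    return int(read_sysfs(path))

def node_cpus(node, sysfs_root="/sys"):
    """Set of CPU ids that belong to the given NUMA node."""
    path = os.path.join(sysfs_root, "devices", "system", "node",
                        f"node{node}", "cpulist")
    return parse_cpulist(read_sysfs(path))

def lcores_off_node(lcores, cpus):
    """Return the lcores that are NOT in the given CPU set."""
    return [c for c in lcores if c not in cpus]

# Values quoted in this thread: node1 CPU list and the lcores in the Intel command.
node1 = parse_cpulist("28-55,84-111")
print(lcores_off_node([37, 38, 39, 40, 41], node1))  # prints []
```

On a live system, `lcores_off_node(lcores, node_cpus(device_numa_node(bdf)))` should return an empty list when every lcore sits on the same node as the QAT device.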

 

Intel Command:
./build/app/dpdk-test-crypto-perf \
--socket-mem 2048,2048 --legacy-mem \
-a 0000:b5:01.1 -a 0000:b5:01.2 -a 0000:b5:01.3 -a 0000:b5:01.4 \
-l 37,38,39,40,41 -n 4 \
-- --buffer-sz 64,128,256,512,1024,2048 \
--optype cipher-then-auth --ptest throughput --auth-key-sz 64 --cipher-key-sz 16 \
--devtype crypto_qat --cipher-iv-sz 16 --auth-op generate --burst-sz 32 \
--total-ops 30000000 --digest-sz 20 --auth-algo sha1-hmac --cipher-algo aes-cbc --cipher-op encrypt

 

Intel Results:

    lcore id    Buf Size  Burst Size    Enqueued    Dequeued  Failed Enq  Failed Deq        MOps        Gbps  Cycles/Buf

          41          64          32    30000000    30000000   391959011   374040103      1.5700      0.8039     1719.71
          40          64          32    30000000    30000000   393476553   375387075      1.5700      0.8038     1719.78
          39          64          32    30000000    30000000   396301663   378362533      1.5700      0.8038     1719.79
          38          64          32    30000000    30000000   395045310   377272628      1.5699      0.8038     1719.83
          39         128          32    30000000    30000000   398887225   381068282      1.5573      1.5947     1733.72
          40         128          32    30000000    30000000   396297972   378294301      1.5573      1.5946     1733.81
          38         128          32    30000000    30000000   397938626   380308357      1.5569      1.5943     1734.23
          41         128          32    30000000    30000000   394682012   376872827      1.5572      1.5945     1733.92
          41         256          32    30000000    30000000   405075976   387162514      1.5187      3.1103     1777.86
          40         256          32    30000000    30000000   406794016   388691199      1.5188      3.1105     1777.74
          38         256          32    30000000    30000000   408170888   390414980      1.5185      3.1099     1778.06
          39         256          32    30000000    30000000   409437937   391469812      1.5187      3.1103     1777.84
          38         512          32    30000000    30000000   435607109   417513101      1.4310      5.8615     1886.74
          39         512          32    30000000    30000000   437163787   418859493      1.4307      5.8600     1887.23
          41         512          32    30000000    30000000   432472880   414204093      1.4308      5.8605     18876
          40         512          32    30000000    30000000   434418130   415985247      1.4305      5.8594     1887.42
          40        1024          32    30000000    30000000   583552510   564226360      1.1165      9.1464     2418.26
          41        1024          32    30000000    30000000   582614586   563287134      1.1157      9.1396     2420.06
          39        1024          32    30000000    30000000   586490123   567304136      1.1154      9.1377     2420.56
          38        1024          32    30000000    30000000   585874264   566681859      1.1154      9.1370     2420.74
          38        2048          32    30000000    30000000  1094437097  1073467007      0.6575     10.7730     4106.27
          40        2048          32    30000000    30000000  1091084015  1070100515      0.6575     10.7724     4106.49
          39        2048          32    30000000    30000000  1096021229  1075066115      0.6575     10.7729     4106.32
          41        2048          32    30000000    30000000  1053719015  1033070689      0.6575     10.7733     4106.17

 

Now, as stated in the performance report, the multi-core results shown in the report are the sum of each core's result for that buffer size.
So for buffer size 64, I have: 0.8039 + 0.8038 + 0.8038 + 0.8038 = 3.2153

That value isn't far off the reported value of 3.90 below.
The other values are also just short of the report's results.
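
The per-core summation described above can be sketched as follows. This is only an illustration: it assumes the ten-column throughput table layout shown in this post, and the function name `sum_gbps_by_buffer` is made up for the example.

```python
from collections import defaultdict

def sum_gbps_by_buffer(output):
    """Sum the Gbps column per buffer size across all lcore rows."""
    totals = defaultdict(float)
    for line in output.splitlines():
        fields = line.split()
        # Data rows have exactly 10 columns and start with the lcore id;
        # this skips the header line and any blank lines.
        if len(fields) != 10 or not fields[0].isdigit():
            continue
        buf_size = int(fields[1])   # "Buf Size" column
        gbps = float(fields[8])     # "Gbps" column
        totals[buf_size] += gbps
    return dict(totals)

# The four 64-byte rows from the results posted above.
sample = """\
    lcore id    Buf Size  Burst Size    Enqueued    Dequeued  Failed Enq  Failed Deq        MOps        Gbps  Cycles/Buf
          41          64          32    30000000    30000000   391959011   374040103      1.5700      0.8039     1719.71
          40          64          32    30000000    30000000   393476553   375387075      1.5700      0.8038     1719.78
          39          64          32    30000000    30000000   396301663   378362533      1.5700      0.8038     1719.79
          38          64          32    30000000    30000000   395045310   377272628      1.5699      0.8038     1719.83
"""

for size, total in sorted(sum_gbps_by_buffer(sample).items()):
    print(f"{size} {total:.4f}")  # prints: 64 3.2153
```

Feeding it the full output of a multi-core run gives one aggregate figure per buffer size, which is the number to compare against the report.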

[Attached image: 06085499_1.png]

In your previous community messages, you mentioned results of around 0.8 Gbps for this. Were you looking at a single core's result, or was that the sum across all cores?

[Attached image: 06085499_2.png]

Platform used for testing:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 112
On-line CPU(s) list: 0-111
Thread(s) per core: 2
Core(s) per socket: 28
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz
Stepping: 7
Frequency boost: enabled
CPU MHz: 3536.342
CPU max MHz: 2701.0000
CPU min MHz: 1000.0000
BogoMIPS: 5400.00
Virtualization: VT-x
L1d cache: 1.8 MiB
L1i cache: 1.8 MiB
L2 cache: 56 MiB
L3 cache: 77 MiB
NUMA node0 CPU(s): 0-27,56-83
NUMA node1 CPU(s): 28-55,84-111

 

To sum up, there doesn't seem to be a significant discrepancy between our performance results and those that you reported; they are quite similar. Your reported results closely match ours for a single core. 

 

I hope this helps.

 

Regards,

Ronny G

 

 

 

JCK1
Beginner
706 Views

Hi Ronny,

 

Thanks for your reply.

 

I entered your results into a spreadsheet and compared them with Intel's official numbers. The difference is about 17%; I would not call that "not far off". Sorry.
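
The comparison can be reproduced with a short script. This is a sketch only: the per-core values are copied from the results posted earlier in this thread, and pairing the six published AES-CBC-128/SHA1-HMAC rows with buffer sizes 64 to 2048 in ascending order is an assumption, since the quoted table does not label its rows.

```python
# Per-core Gbps from the run posted in this thread, grouped by buffer size.
per_core = {
    64:   [0.8039, 0.8038, 0.8038, 0.8038],
    128:  [1.5947, 1.5946, 1.5943, 1.5945],
    256:  [3.1103, 3.1105, 3.1099, 3.1103],
    512:  [5.8615, 5.8600, 5.8605, 5.8594],
    1024: [9.1464, 9.1396, 9.1377, 9.1370],
    2048: [10.7730, 10.7724, 10.7729, 10.7733],
}

# Published values as quoted in this thread (row-to-buffer-size mapping assumed).
published = {64: 3.90, 128: 7.72, 256: 15.06, 512: 28.31, 1024: 45.60, 2048: 52.70}

def shortfall_percent(measured, reference):
    """How far 'measured' falls below 'reference', as a percentage of 'reference'."""
    return (reference - measured) / reference * 100.0

for size in sorted(per_core):
    total = sum(per_core[size])
    gap = shortfall_percent(total, published[size])
    print(f"{size:>5} B  sum={total:.4f} Gbps  published={published[size]:.2f}  shortfall={gap:.1f}%")
```

Under that row mapping, the shortfall comes out between roughly 17% and 20% depending on buffer size.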

Thanks

JCK

Ronny_G_Intel
Moderator
684 Views

Hi JCK1,


I recognize that the approximately 17% variance between the official DPDK test report and our test outcomes might seem substantial. However, it's important to remember that official tests are conducted on optimized platforms, which can include specialized hardware, tailored BIOS settings, and even operating system optimization for the sole purpose of maximizing the result of the test being performed. Additionally, our hardware setup is not an exact match for the one used in the official tests. Official test reports should be viewed as a benchmark indicating the potential performance of a system under extremely controlled conditions and with a high degree of customization.


Please let me know if there is anything else I can help you with.


Regards,

Ronny G


JCK1
Beginner
627 Views

Hi Ronny

I think it is important to have third parties verify Intel's official results independently. I understand that many settings and configurations at many levels need to be correct in order to reproduce such results, but Intel should do its best to help people do so easily and quickly (look at how long this has taken us).

 

Thank you for your support and best regards

JCK

JCK1
Beginner
627 Views

And we haven't touched Intel test case #1, the QAT scheduler PMD case. The discrepancy in results there is even bigger.

Ronny_G_Intel
Moderator
592 Views

Hi JCK1,


I acknowledge your remarks and will convey your feedback to the DPDK team. I also concur that this matter has been ongoing for quite some time, and unfortunately, the outcomes have not met your expectations. I apologize for any inconvenience this may have caused.

I would agree that the outcomes for test #1 involving the QAT scheduler PMD might differ from the official DPDK performance test results for the same reasons I previously outlined.


Let me know how you want to proceed with this issue. I don't really have additional recommendations at this point.


Regards,

Ronny G



JCK1
Beginner
590 Views

 

Hi Ronny

 

I am not sure how much more Intel can do at this point, given what we have been able to achieve. I will accept the current situation as is and convey it to our management and to the partner who wants to use QAT in their product. It is up to them to make the final decision.

 

thank you

JCK

JCK1
Beginner
410 Views

Hi Ronny

 

Thanks for your support on this matter. One last favor I would ask is to have your DPDK expert run all remaining tests (such as the QAT scheduler PMD test) on their existing setup to see how much we can get out of it. I would like to include these results in my final report to my management as a reference point. Could you help with this?

 

Thanks
JCK

Ronny_G_Intel
Moderator
445 Views

Hi JCK1,

 

I have conveyed your feedback to the DPDK team for their consideration, and we will be exploring ways to make it easier for customers to replicate the outcomes detailed in the Performance Reports. Achieving results that are extremely close to ours can be challenging, as we cannot ensure identical outcomes if the system differs. It's important to note that only systems that are exactly the same can produce results that are very close to those reported, and even then, they may not be exactly identical.

I regret any inconvenience this may have caused and the delay in resolving this matter. Please inform me if there's anything more I can assist you with. If there are no further issues, I will proceed to close the internal ticket I have opened regarding this concern.

 

Regards,

Ronny G

 

Ronny_G_Intel
Moderator
357 Views

Hi JCK1,


I am concluding the internal case I opened for this matter. Please don't hesitate to initiate a new thread if you require further help.


Thanks,

Ronny G

