Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Performance Issue - IPP 8.2.090 Crypto AES_CTR Mode

Deepak_B_
Beginner
873 Views

Hi,

We have integrated the Intel IPP Crypto library, version 8.2.090, and use it for AES encrypt / decrypt in CTR mode (for SRTP). Our target platform is a VMware system running on an Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz. The application is therefore linked with the single-threaded (st) version of libippcpv8.so.8.2 (optimized for the Xeon platform) and libippcore.so.8.2.

Note: Our application already has thread pool implemented and hence we do not need to use the multi-threaded (mt) version of the libraries.

We compared the performance of Intel IPP crypto implementation of AES 128 bit CTR mode Encrypt / Decrypt with the aes_icm_ctr implementation in libsrtp (https://github.com/cisco/libsrtp/tree/master/crypto/cipher).

We observe that the IPP crypto implementation is **slower** than the open source implementation. The 128-bit ippsAESEncryptCTR() takes ~125% of the time taken by the corresponding routine in the open source libsrtp code base, aes_icm_encrypt_ismacryp(). The 128-bit ippsAESDecryptCTR() takes ~175% of the time taken by that same libsrtp routine, aes_icm_encrypt_ismacryp().

Do you have any suggestions on how we can get the Intel IPP crypto library to perform better on our target platform for AES in CTR mode?

Below is how the routines are being invoked and these are the only methods that we are attempting to profile.

    #define SRTP_IPP_AES_CTR_BIT_LEN 16

    retStatus = ippsAESEncryptCTR(
            (const Ipp8u*)enc_start,
            (Ipp8u*)enc_start,
            enc_octet_len,
            stream->ipp_rtp_cipher_ctxt,
            ctr,
            SRTP_IPP_AES_CTR_BIT_LEN);

    retStatus = ippsAESDecryptCTR(
            (const Ipp8u*)enc_start,
            (Ipp8u*)enc_start,
            enc_octet_len,
            stream->ipp_rtp_cipher_ctxt,
            ctr,
            SRTP_IPP_AES_CTR_BIT_LEN);

 

Thanks.

6 Replies
Igor_A_Intel
Employee

Hi Deepak,

Could you provide absolute numbers in CPU clocks per byte? I don't understand what 125% or 175% means: a 1.25x difference or a 2.25x one? I'm also curious why you see different ratios for encryption and decryption, since in AES CTR mode the same code does both. Also take into account that IPP functions are protected against known kinds of attacks.

regards, Igor.

Deepak_B_
Beginner

Hi Igor,

ippsAESEncryptCTR() takes 3958 usec to encrypt 32000 bytes of payload data while ippsAESDecryptCTR() takes 4888 usec to decrypt 32000 bytes of encrypted data. This is on a platform running VMWare VM configured to use 1 core of Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz.

On the same platform, the open source version (the implementation in libsrtp - https://github.com/cisco/libsrtp/tree/master/crypto/cipher) takes 3465 usec to encrypt 32000 bytes of payload and 3867 usec to decrypt 32000 bytes of encrypted data.

Platform:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
stepping        : 4
microcode       : 0x416
cpu MHz         : 2600.000
cache size      : 20480 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc up arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx hypervisor lahf_lm ida arat epb xsaveopt pln pts dtherm
bogomips        : 5200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

 

Igor_A_Intel
Employee

Hi Deepak,

Did you call ippInit() before calling any other IPP processing function?

regards, Igor

Igor_A_Intel
Employee

Deepak,

Below is output from our performance system (you can find it in the "tools" subfolder of the standard installation: ps_ippcp). As you can see, for a single thread on a 3.5 GHz AVX CPU it takes 14 usec to encrypt/decrypt the 32K vector. Taking into account that you have a 2.6 GHz CPU, I would expect 14*3.5/2.6 = ~19-20 usec, not 4000-5000:

CPU:      Processor supporting Advanced Vector Extensions instruction set, 4x3.49 GHz, max cache size 8192 K
OS:       Linux (2.6.32-279.el6.x86_64 x86_64)
Computer: nnlmdp312
Library:  ippCP AVX (e9) 8.2.1 (r44077) Oct 9 2014
Start:    Thu Oct 9 18:57:30 2014

What kind of linking do you use: static or dynamic? Which kind of library: single- or multi-threaded? Do you call ippInit() if you link statically? Could you provide the output of the ippcpGetLibVersion() function?

regards, Igor

Igor_A_Intel
Employee

attempt #2 to insert the table failed...

Igor_A_Intel
Employee
attempt #3

function           Parm1  Parm2  Parm3  Parm4  Parm5  Comment  Clocks per  Time (usec)
ippsAESDecryptCTR  8u     -      32768  128    128    nLps=4   1.5 e       14.1
ippsAESDecryptCTR  8u     I      32768  128    128    nLps=4   1.5 e       14
ippsAESEncryptCTR  8u     -      32768  128    128    nLps=4   1.5 e       14.1
ippsAESEncryptCTR  8u     I      32768  128    128    nLps=16  1.5 e       14