
HSing52 · ‎04-17-2020

Dear Sir/Madam,

I understand that MD5 is no longer the strongest algorithm and has known attacks. However, I still want to use it, and I found the comment below in the documentation:

"This algorithm is considered weak due to known attacks on it. The functionality remains in the library, but the implementation will no longer be optimized and no security patches will be applied."

Does that mean the MD5 implementation is no longer optimized? Does it mean the optimizations have been removed?

I am using this version: l_ippcp_2019.0.117

I have benchmarked the Intel MD5 implementation against a non-Intel, un-optimized version, and they give the same performance.

Please advise.

Gennady_F_Intel · ‎04-20-2020

That means the existing optimizations will remain in this algorithm, but the IPP team will not add new optimizations for the next versions of the hardware.

HSing52 · ‎04-20-2020

Dear Sir,

Thank you so much for the kind reply and for taking the time. I hope you and your family are safe and sound.

I find that IPP Crypto MD5 is not giving me any performance boost. I compared it against a regular un-optimized MD5 C++ implementation, and the un-optimized code slightly outperformed IPP Crypto MD5 by 20-30 nanoseconds. I was expecting a significant boost from IPP.

Some reference points :

1. I have installed only the IPPCP add-on, not IPP. Will that make any difference?

2. I am using the g++ compiler on Fedora Linux.

3. Hardware: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz, AVX2 enabled

4. I have tried both of the IPP versions below: compilers_and_libraries_2019.5.281, compilers_and_libraries_2019.0.117

5. Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz

==================

ippCP AVX2 (l9) 2019.0.0 (r0x438f167)

Features supported by CPU by Intel(R) Integrated Performance Primitives Cryptography
-----------------------------------------
ippCPUID_MMX = Y Y Intel(R) Architecture MMX technology supported
ippCPUID_SSE = Y Y Intel(R) Streaming SIMD Extensions
ippCPUID_SSE2 = Y Y Intel(R) Streaming SIMD Extensions 2
ippCPUID_SSE3 = Y Y Intel(R) Streaming SIMD Extensions 3
ippCPUID_SSSE3 = Y Y Supplemental Streaming SIMD Extensions 3
ippCPUID_MOVBE = Y Y The processor supports MOVBE instruction
ippCPUID_SSE41 = Y Y Intel(R) Streaming SIMD Extensions 4.1
ippCPUID_SSE42 = Y Y Intel(R) Streaming SIMD Extensions 4.2
ippCPUID_AVX = Y Y Intel(R) Advanced Vector Extensions (Intel(R) AVX) instruction set
ippAVX_ENABLEDBYOS = Y Y The operating system supports Intel(R) AVX
ippCPUID_AES = Y Y Intel(R) AES instruction
ippCPUID_SHA = N N Intel(R) SHA new instructions
ippCPUID_CLMUL = Y Y PCLMULQDQ instruction
ippCPUID_RDRAND = Y Y Read Random Number instructions
ippCPUID_F16C = Y Y Float16 instructions
ippCPUID_AVX2 = Y Y Intel(R) Advanced Vector Extensions 2 instruction set
ippCPUID_AVX512F = N N Intel(R) Advanced Vector Extensions 512 Foundation instruction set
ippCPUID_AVX512CD = N N Intel(R) Advanced Vector Extensions 512 Conflict Detection instruction set
ippCPUID_AVX512ER = N N Intel(R) Advanced Vector Extensions 512 Exponential & Reciprocal instruction set
ippCPUID_ADCOX = Y Y ADCX and ADOX instructions
ippCPUID_RDSEED = Y Y The RDSEED instruction
ippCPUID_PREFETCHW = Y Y The PREFETCHW instruction
ippCPUID_KNC = N N Intel(R) Xeon Phi(TM) Coprocessor instruction set

Code I wrote to use Intel IPP MD5:

1. using ippsMD5MessageDigest.

static Ipp8u digest[32];
ippsMD5MessageDigest( (const Ipp8u *)data1, size , digest);

2. Using ippsHashMessage.

static Ipp8u digest2[16];
ippsHashMessage( (const Ipp8u *)data1, size , digest2, IPP_ALG_HASH_MD5);

3. Using ippsMD5Update & ippsMD5Final:

There is no difference in performance between 1 and 2. Option 3 took 100 nanoseconds more than 1 and 2.

I assume the code does automatic CPU dispatching and I don't have to explicitly initialize anything, since the library initializes itself during the first call.

Compile & Linking: Compiled with dynamic libs.

Linked with all architecture specific libs : -lippcp -lippcpe9 -lippcpk0 -lippcpl9 -lippcpm7 -lippcpn0 -lippcpn8 -lippcpy8

g++ -O3 compare_md5.cpp md5.cpp -fomit-frame-pointer -mavx2 -o compare_md5 -I /home/user9/intel/compilers_and_libraries_2019.5.281/linux/ippcp/include/ -L /home/user9/intel/compilers_and_libraries_2019.5.281/linux/ippcp/lib/intel64 -lippcp -lippcpe9 -lippcpk0 -lippcpl9 -lippcpm7 -lippcpn0 -lippcpn8 -lippcpy8 -pthread

Could you please tell me if I am missing anything?

Gennady_F_Intel · ‎04-20-2020

>> 1. I have only installed IPPCP addon and not installed IPP , will that make any difference?

<< No, it will not make any difference.

>> " an un-optimized md5 c++ code. "

<< Is that open-source code or in-house private code? If it is open source, could you share the link?

I have forwarded the questions to the IPP Crypto experts to look at.

HSing52 · ‎04-20-2020

Dear Gennady,

Please find the link: https://www.nayuki.io/page/fast-md5-hash-implementation-in-x86-assembly . I am using the code from this link.

Thank you, looking forward to your reply.

Sergey_K_Intel4 · ‎04-21-2020

Hello,
I've run a standard IPP build and found that:
ippsMD5MessageDigest - 4.32 cycles/byte
ippsHashMessage - 4.33 cycles/byte
ippsMD5MessageDigest_rmf - 4.38 cycles/byte

ippsMD5Update - 4.02 cycles/byte
ippsHashUpdate - 4.02 cycles/byte
ippsHashUpdate - 4.02 cycles/byte

The data above were obtained with a payload of length 1024, a 2.6 GHz CPU, and "l9" code.
I think the ippcpInit() call should come before the first IPP processing function.

Sergey_K_Intel4 · ‎04-21-2020

Sorry for the misprint:

ippsMD5Update - 4.02 cycles/byte
ippsHashUpdate - 4.02 cycles/byte
ippsHashUpdate_rmf - 4.02 cycles/byte

HSing52 · ‎04-21-2020

Dear Sergey,

1. What is _rmf for?

2. I have now added ippcpInit() to the code, but performance is unchanged. Is calling ippcpInit() required? I think I read that even if we don't call ippcpInit(), the first call to the API does the same thing ippcpInit() does. Please correct me if I am wrong.

3. Considering 4.02 cycles/byte, how many nanoseconds should 300 bytes take?

As per your hardware, 2.6 GHz:

1 cycle = 2.6 nanoseconds
4.02 cycles/byte
1 byte = 10.452 nanoseconds
300 bytes = 3135.6 nanoseconds // I am getting 580 nanoseconds for 300 bytes using the Intel ippsHashMessage API.

4. Could you please review whether there is any mistake in the above calculation?

5. How can I find the cycles/byte figure on my system? Are these stats obtained using ps_ippcp? What command-line parameters give cycles/byte for these MD5 APIs?

Ruqiu_C_Intel · ‎02-02-2021

>> What is _rmf for?

<< It is a kind of implementation; compare the declarations of the Hash APIs.

>> I have used ippcpInit() in code now but performance unchanged. Calling ippcpInit() is required? I think I have read even if we don't call ippcpInit, during the first call to the API it does the same what ippcpInit() does. Please correct me if I am wrong.

<< The latest version of IPP does not require an ippcpInit() call. If the call is omitted, then dispatching happens on the first call of any IPP function.

>> Considering 4.02 cycles/byte, how many nanoseconds should 300 bytes take?
>>
>> As per your hardware 2.6GHz
>>
>> 1 cycle = 2.6 nanoseconds
>> 4.02 cycles/byte
>> 1 byte = 10.452 nanoseconds
>> 300 bytes = 3135.6 nanoseconds // I am getting 580 nanoseconds for 300 bytes using the Intel ippsHashMessage API.
>>
>> Could you please review if there is any mistake in the above calculation?

CPU frequency = 2.6*10^9 Hz and 1 s = 10^9 ns, so 1 ns ~ 2.6 CPU cycles (not 1 cycle = 2.6 ns).

Perf = 4.02 cycles/byte, so 300 bytes take 4.02*300 ~ 1200 cycles.

1200 cycles => 1200/2.6 ns ~ 460 ns, NOT 3135 ns!
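The conversion above can be written as a small helper. This is only a sketch: the function names `expected_ns` and `cycles_per_byte` are made up for illustration, and it assumes the core runs at exactly the stated frequency (no turbo or throttling) and ignores fixed per-call overhead such as MD5 padding.

```cpp
#include <cassert>
#include <cstddef>

// Expected time in nanoseconds for hashing `bytes` bytes, given a throughput
// in cycles/byte and a core frequency in GHz.
// A core at f GHz executes f cycles per nanosecond, so ns = cycles / f.
// (Hypothetical helper, not part of IPP.)
double expected_ns(double cycles_per_byte_value, std::size_t bytes, double ghz) {
    assert(ghz > 0.0);
    const double cycles = cycles_per_byte_value * static_cast<double>(bytes);
    return cycles / ghz;
}

// Inverse direction: turn a measured time per call back into cycles/byte.
// (Hypothetical helper, not part of IPP.)
double cycles_per_byte(double measured_ns, std::size_t bytes, double ghz) {
    assert(bytes > 0);
    return measured_ns * ghz / static_cast<double>(bytes);
}
```

With the numbers from this thread, `expected_ns(4.02, 300, 2.6)` comes out near 464 ns (the ~460 ns above uses the rounded 1200 cycles). Going the other way, `cycles_per_byte(580.0, 300, 3.3)` gives roughly 6.4 cycles/byte for the reported 580 ns, if the i7-5820K really ran at its nominal 3.3 GHz during the measurement.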

>> How should I know cycles/byte on my system? Are these stats obtained using ps_ippcp? What should the command-line parameters be to get cycles/byte for these MD5 APIs?

<< MD5 is deprecated functionality; IPP's perf system does not support it.
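Since the perf system does not cover MD5, one option is to time the call yourself. The following is a minimal sketch, not an Intel tool: `best_ns_per_call`, `dummy_hash`, and `g_sink` are hypothetical names, and `dummy_hash` is a stand-in workload (not MD5) so that the snippet compiles without IPP. Taking the minimum over several repeats is one common way to reduce noise when differences are only tens of nanoseconds.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Time fn(data, len) and return the best (minimum) time per call in ns.
// Each repeat runs many calls so that clock overhead is amortized.
template <typename Fn>
double best_ns_per_call(Fn&& fn, const std::uint8_t* data, std::size_t len,
                        int repeats = 20, int calls_per_repeat = 10000) {
    assert(repeats > 0 && calls_per_repeat > 0);
    using clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::max();
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = clock::now();
        for (int i = 0; i < calls_per_repeat; ++i) fn(data, len);
        const auto t1 = clock::now();
        const double ns =
            std::chrono::duration<double, std::nano>(t1 - t0).count();
        best = std::min(best, ns / calls_per_repeat);
    }
    return best;
}

// Stand-in workload (NOT MD5). The result goes to a volatile sink so the
// compiler cannot drop the call.
volatile std::uint32_t g_sink = 0;

void dummy_hash(const std::uint8_t* data, std::size_t len) {
    std::uint32_t acc = 0;
    for (std::size_t i = 0; i < len; ++i) acc = acc * 31u + data[i];
    g_sink = acc;
}
```

To measure the IPP path, `dummy_hash` would be replaced by a lambda wrapping the `ippsHashMessage` call shown earlier in the thread. Multiplying the returned nanoseconds by the clock rate in GHz and dividing by the message length gives cycles/byte, with the caveat that the actual core frequency may differ from the nominal one.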

Cobler__Justin · ‎05-06-2020

It means that Intel will not deliver improved optimizations to the code.

The patching cycle for it will be stopped.

So you will have to find vulnerabilities in the code and patch them yourself.

Ruqiu_C_Intel · ‎05-11-2021

This algorithm is considered weak due to known attacks on it. The functionality remains in the library, but the implementation will no longer be optimized and no security patches will be applied.

https://software.intel.com/content/www/us/en/develop/documentation/ipp-crypto-reference/top/one-way-hash-primitives/hash-functions-for-non-streaming-messages/md5messagedigest.html

The crypto community does not consider SHA-1 or MD5 algorithms secure anymore.

Recommendation: use a more secure hash algorithm (for example, any algorithm from the SHA-2 family) instead of SHA-1 or MD5.

https://software.intel.com/content/www/us/en/develop/documentation/ipp-crypto-reference/top/one-way-hash-primitives.html

We will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

MD5MessageDigest performance