I was testing the IPP SMS4 functions on an Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz.
ippsSMS4EncryptCBC takes 1.3s to encrypt 100MB of data, while ippsSMS4DecryptCBC takes only 0.25s to decrypt the ciphertext.
SMS4 is a symmetric cipher, so why is encryption so much slower than decryption in IPP crypto?
The source file is compiled with gcc, not icc. Does that matter?
hi,
CBC decryption has no feedback dependency, while CBC encryption does.
This allows several blocks to be decrypted simultaneously.
This property is general to CBC mode.
If one compares AES-CBC encryption and decryption, the general picture looks the same: decryption is several times faster.
regards, Igor
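To make the dependency concrete, here is a minimal sketch of CBC chaining in C. The block "cipher" (toy_encrypt_block / toy_decrypt_block) is a hypothetical, insecure stand-in used only so the example is self-contained; it is not SM4 and not what IPP does internally. The point is the loop structure: each encryption step needs the previous ciphertext block, while each decryption step only reads data that is already available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16 /* block size in bytes, same as SM4 and AES */

static uint8_t rotl8(uint8_t v, unsigned r) { return (uint8_t)((v << r) | (v >> (8 - r))); }
static uint8_t rotr8(uint8_t v, unsigned r) { return (uint8_t)((v >> r) | (v << (8 - r))); }

/* Hypothetical toy block transform: NOT secure, only invertible. */
static void toy_encrypt_block(const uint8_t key[BLK], const uint8_t in[BLK], uint8_t out[BLK])
{
    for (int i = 0; i < BLK; i++)
        out[i] = rotl8((uint8_t)(in[i] ^ key[i]), 3);
}

static void toy_decrypt_block(const uint8_t key[BLK], const uint8_t in[BLK], uint8_t out[BLK])
{
    for (int i = 0; i < BLK; i++)
        out[i] = (uint8_t)(rotr8(in[i], 3) ^ key[i]);
}

/* CBC encryption: block b cannot start until ciphertext block b-1 exists,
   so the loop is a serial dependency chain. */
void cbc_encrypt(const uint8_t key[BLK], const uint8_t iv[BLK],
                 const uint8_t *pt, uint8_t *ct, size_t nblocks)
{
    uint8_t chain[BLK];
    memcpy(chain, iv, BLK);
    for (size_t b = 0; b < nblocks; b++) {
        uint8_t x[BLK];
        for (int i = 0; i < BLK; i++)
            x[i] = (uint8_t)(pt[b * BLK + i] ^ chain[i]);
        toy_encrypt_block(key, x, ct + b * BLK);
        memcpy(chain, ct + b * BLK, BLK); /* feedback into the next block */
    }
}

/* CBC decryption: every iteration reads only ciphertext (already known),
   so iterations are independent and could be run several at a time. */
void cbc_decrypt(const uint8_t key[BLK], const uint8_t iv[BLK],
                 const uint8_t *ct, uint8_t *pt, size_t nblocks)
{
    assert(ct != pt); /* this sketch supports out-of-place decryption only */
    for (size_t b = 0; b < nblocks; b++) {
        const uint8_t *prev = (b == 0) ? iv : ct + (b - 1) * BLK;
        uint8_t x[BLK];
        toy_decrypt_block(key, ct + b * BLK, x);
        for (int i = 0; i < BLK; i++)
            pt[b * BLK + i] = (uint8_t)(x[i] ^ prev[i]);
    }
}
```

Calling cbc_encrypt and then cbc_decrypt with the same key and IV should return the original buffer. A real library can interleave or vectorize the independent iterations of the decryption loop; the encryption loop offers no such freedom.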
hi zhongqiang,
Which IPP version do you use? (Also: which operating system, which architecture (ia32 or Intel64), and is the linking static or dynamic?)
The best reply is to provide the output of ippcpGetLibVersion():
const IppLibraryVersion* lib;
lib = ippcpGetLibVersion();
printf("%s %s %d.%d.%d.%d\n", lib->Name, lib->Version, lib->major, lib->minor, lib->majorBuild, lib->build);
regards, Igor
Hi, Igor
The output is:
ippCP AVX (e9) 2018.0.1 (r57267) 2018.0.1.57267
My OS is
Linux algo 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
The linking arguments are:
cc -I/opt/intel/ippcp/include -O3 -c -o sm4test.o sm4test.c
cc -I/opt/intel/ippcp/include -O3 -g sm4test.o /opt/intel/ippcp/lib/intel64/libippcp.a -o test
Hi,
I just checked the sm4test.c file in "SM4_CBC.7z", and it does not include any IPP calls. Is anything missing there?
Also, could you submit a support ticket to our support site: https://www.intel.com/supporttickets? Our support team can then reproduce the issue with your test code and investigate. Here are some steps: https://software.intel.com/sites/default/files/managed/97/ce/SubmittingS...
Thanks,
Chao
Hi, Chao Y,
My support ticket is 03236416.
"SM4_CBC.7z" is code that I downloaded from the Internet and modified slightly.
The sample code (my code and ipp code) has been uploaded to the support site. Many thanks.
Igor Astakhov (Intel) wrote:
hi,
CBC decryption has no feedback dependency, while CBC encryption does.
This allows several blocks to be decrypted simultaneously.
This property is general to CBC mode.
If one compares AES-CBC encryption and decryption, the general picture looks the same: decryption is several times faster.
regards, Igor
Thank you for your reply.
I downloaded SM4 source code from the Internet and made some modifications. That code takes 0.88s to encrypt 100MB of data on the Intel Xeon E3-1230.
I would like to use IPP Crypto to speed up SM4, but found that IPP is a lot slower. Is there a high-throughput (> 400MB/s on the E3-1230) SM4 encryption function in IPP crypto?
hi Zhongqiang,
The best performance is not the only criterion for crypto functionality. The main criterion, in addition to performance, is that all IPP crypto functions are safe and mitigated against all known attacks (a cache-timing attack with cache-line-size granularity was published in ~2005, and one with 16-bit granularity (MemJam) in 2017). Your implementation is a well-known one, with big pre-calculated tables, and it is not safe against the first kind of attack or any of the later ones.
reading from
uint32_t Sbox_final0_rest[256]
uint32_t Sbox_final1_rest[256]
uint32_t Sbox_final2_rest[256]
uint32_t Sbox_final3_rest[256]
depends directly on the round key, and the access pattern across your tables is not uniform. Therefore the round key can easily be recovered by a cache-timing attack and, as you know, the secret key and the round keys can be derived from each other. Please take a look at the attached doc.
regards, Igor
Hi, Igor
Did you mean that, for security reasons, the non-linear substitution should not be implemented as a fixed lookup table?
However, according to the disassembly of ippsSMS4EncryptCBC, I found that SMS4_Sbox (the original S-box table, of type uint32_t [256]) is defined in IPP crypto.
Sbox_final_res is almost equivalent to SMS4_Sbox, whose reads also depend on the round key, so are the IPP crypto functions not safe either?
a) Exactly, the best way is to avoid lookup operations.
b) The latest IPP implementation of SM4 uses an S-box (if AES-NI is disabled), but provides uniform access to the SM4 S-box that does not depend on the particular input index.
c) The IPP implementation of SM4 does not contain large S-boxes. It uses the "standard" short 256-byte SM4 S-box:
const __ALIGN64 Ipp8u SMS4_Sbox[16*16] = {
0xD6,0x90,0xE9,0xFE,0xCC,0xE1,0x3D,0xB7,0x16,0xB6,0x14,0xC2,0x28,0xFB,0x2C,0x05,
0x2B,0x67,0x9A,0x76,0x2A,0xBE,0x04,0xC3,0xAA,0x44,0x13,0x26,0x49,0x86,0x06,0x99,
0x9C,0x42,0x50,0xF4,0x91,0xEF,0x98,0x7A,0x33,0x54,0x0B,0x43,0xED,0xCF,0xAC,0x62,
0xE4,0xB3,0x1C,0xA9,0xC9,0x08,0xE8,0x95,0x80,0xDF,0x94,0xFA,0x75,0x8F,0x3F,0xA6,
0x47,0x07,0xA7,0xFC,0xF3,0x73,0x17,0xBA,0x83,0x59,0x3C,0x19,0xE6,0x85,0x4F,0xA8,
0x68,0x6B,0x81,0xB2,0x71,0x64,0xDA,0x8B,0xF8,0xEB,0x0F,0x4B,0x70,0x56,0x9D,0x35,
0x1E,0x24,0x0E,0x5E,0x63,0x58,0xD1,0xA2,0x25,0x22,0x7C,0x3B,0x01,0x21,0x78,0x87,
0xD4,0x00,0x46,0x57,0x9F,0xD3,0x27,0x52,0x4C,0x36,0x02,0xE7,0xA0,0xC4,0xC8,0x9E,
0xEA,0xBF,0x8A,0xD2,0x40,0xC7,0x38,0xB5,0xA3,0xF7,0xF2,0xCE,0xF9,0x61,0x15,0xA1,
0xE0,0xAE,0x5D,0xA4,0x9B,0x34,0x1A,0x55,0xAD,0x93,0x32,0x30,0xF5,0x8C,0xB1,0xE3,
0x1D,0xF6,0xE2,0x2E,0x82,0x66,0xCA,0x60,0xC0,0x29,0x23,0xAB,0x0D,0x53,0x4E,0x6F,
0xD5,0xDB,0x37,0x45,0xDE,0xFD,0x8E,0x2F,0x03,0xFF,0x6A,0x72,0x6D,0x6C,0x5B,0x51,
0x8D,0x1B,0xAF,0x92,0xBB,0xDD,0xBC,0x7F,0x11,0xD9,0x5C,0x41,0x1F,0x10,0x5A,0xD8,
0x0A,0xC1,0x31,0x88,0xA5,0xCD,0x7B,0xBD,0x2D,0x74,0xD0,0x12,0xB8,0xE5,0xB4,0xB0,
0x89,0x69,0x97,0x4A,0x0C,0x96,0x77,0x7E,0x65,0xB9,0xF1,0x09,0xC5,0x6E,0xC6,0x84,
0x18,0xF0,0x7D,0xEC,0x3A,0xDC,0x4D,0x20,0x79,0xEE,0x5F,0x3E,0xD7,0xCB,0x39,0x48
};
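One common way to get "uniform access" to a small table is to read every entry on each lookup and select the wanted one with a mask, so the memory access pattern does not depend on the secret index. The sketch below illustrates that general technique only; ct_eq_mask and ct_lookup256 are hypothetical names, and this is not claimed to be how IPP implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch-free equality mask: 0xFF when a == b, 0x00 otherwise.
   Valid for inputs below 2^31, which covers table indices 0..255. */
static uint8_t ct_eq_mask(uint32_t a, uint32_t b)
{
    uint32_t x = a ^ b;                 /* zero only when a == b */
    uint32_t is_zero = (x - 1u) >> 31;  /* 1 when x == 0, else 0 */
    return (uint8_t)(0u - is_zero);     /* 0xFF or 0x00 */
}

/* Table lookup that touches all 256 entries regardless of the index,
   so cache behaviour is the same for every index value. */
uint8_t ct_lookup256(const uint8_t table[256], uint8_t index)
{
    uint8_t result = 0;
    for (uint32_t i = 0; i < 256; i++)
        result |= (uint8_t)(table[i] & ct_eq_mask(i, index));
    return result;
}
```

For any index, ct_lookup256(SMS4_Sbox, index) should return the same value as SMS4_Sbox[index], at the cost of 256 reads instead of one. That cost is part of why a side-channel-hardened implementation is slower than one built on large pre-calculated tables. Note that an optimizing compiler may transform such code, so the generated assembly should be inspected in practice.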
Not sure it's possible. Let us convert your requirement (400MB/s at 3.3GHz) into other units. It corresponds to 3.3e9 / (400 * 1e6) ≈ 8 cycles/byte. That is your goal.
Imagine you have an AES128-CBC cipher instead of SM4-CBC. What performance do you expect from AES128-CBC encryption based on an AES-NI implementation? Suppose it is about 3-4 cycles/byte. (Recall that CBC encryption allows block-by-block processing only.)
Both AES and SM4 have a 16-byte block. But AES128 takes 11 round-key steps per block encryption (10 rounds plus the initial key addition), whereas SM4 takes 32 rounds. From my point of view, this means that SM4-CBC encryption could not perform better than 3*(32/11) ≈ 9 cycles/byte. This estimate is based on the assumption that both AES and SM4 have similarly efficient implementations (that is, both map directly onto AES-NI). Unfortunately, that is not true. AES-NI was designed specifically for AES, not for SM4. Although AES-NI can be applied to improve SM4 performance (recall that IPP SM4-CBC decryption takes 0.25s per 100MB), it can't change the situation dramatically.
That is why I think that SM4-CBC encryption at 400MB/s on a 3.3GHz CPU is not realistic.
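The unit conversion used in this estimate can be written as a small helper. This is only a sketch of the arithmetic, assuming 1MB = 1e6 bytes and a constant 3.3GHz clock (no turbo); the function names are hypothetical.

```c
#include <assert.h>

/* Throughput in MB/s for a given data size and elapsed time. */
double throughput_mb_per_s(double megabytes, double seconds)
{
    assert(seconds > 0.0);
    return megabytes / seconds;
}

/* Cycles spent per byte at a given clock rate and throughput.
   Assumes 1 MB = 1e6 bytes. */
double cycles_per_byte(double clock_hz, double mb_per_s)
{
    assert(mb_per_s > 0.0);
    return clock_hz / (mb_per_s * 1e6);
}

/* Rough lower bound for SM4-CBC obtained by scaling an AES-CBC figure
   by the ratio of round counts, as in the estimate above. */
double scaled_estimate(double aes_cycles_per_byte, double sm4_rounds, double aes_rounds)
{
    assert(aes_rounds > 0.0);
    return aes_cycles_per_byte * (sm4_rounds / aes_rounds);
}
```

With the numbers from this thread, cycles_per_byte(3.3e9, 400.0) gives 8.25 for the 400MB/s goal, and scaled_estimate(3.0, 32.0, 11.0) gives about 8.7. The measured 1.3s per 100MB for IPP encryption works out to about 42.9 cycles/byte, the 0.88s table-based code to about 29 cycles/byte, and the 0.25s decryption to 8.25 cycles/byte.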
Sergey Kirillov (Intel) wrote:
Not sure it's possible. Let us convert your requirement (400MB/s at 3.3GHz) into other units. It corresponds to 3.3e9 / (400 * 1e6) ≈ 8 cycles/byte. That is your goal.
Imagine you have an AES128-CBC cipher instead of SM4-CBC. What performance do you expect from AES128-CBC encryption based on an AES-NI implementation? Suppose it is about 3-4 cycles/byte. (Recall that CBC encryption allows block-by-block processing only.)
Both AES and SM4 have a 16-byte block. But AES128 takes 11 round-key steps per block encryption (10 rounds plus the initial key addition), whereas SM4 takes 32 rounds. From my point of view, this means that SM4-CBC encryption could not perform better than 3*(32/11) ≈ 9 cycles/byte. This estimate is based on the assumption that both AES and SM4 have similarly efficient implementations (that is, both map directly onto AES-NI). Unfortunately, that is not true. AES-NI was designed specifically for AES, not for SM4. Although AES-NI can be applied to improve SM4 performance (recall that IPP SM4-CBC decryption takes 0.25s per 100MB), it can't change the situation dramatically.
That is why I think that SM4-CBC encryption at 400MB/s on a 3.3GHz CPU is not realistic.
You may be right. My idea was to shorten the critical dependency path of SM4-CBC, but I made no progress.
Anyway, thank you all for your patience and help. I've learned something from this post.
I'll cancel the support request and get in touch with you if I have any further questions :)