Intel® ISA Extensions

Check arrays for equality with SIMD

DLake1
New Contributor I
1,152 Views

What's the fastest way to check 2 unsigned char arrays of indeterminate size for equality in C++?

I'm using Visual Studio 2017 and Intel Compiler 2017.

4 Replies
TimP
Honored Contributor III
This seems to be a case for the Intel C++ memcmp().
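For illustration, here is a minimal sketch of the memcmp() approach. The helper name arrays_equal_memcmp and its signature are made up for this example; the only library call is std::memcmp from the cstring header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>   // declares std::memcmp

// Hypothetical helper (not from any library): true when both arrays
// have the same length and the same contents.
bool arrays_equal_memcmp(const unsigned char* a, std::size_t a_len,
                         const unsigned char* b, std::size_t b_len)
{
    if (a_len != b_len) return false;   // different sizes can never be equal
    if (a_len == 0) return true;        // two empty arrays are equal
    assert(a != nullptr && b != nullptr);
    // memcmp returns 0 when the first a_len bytes of both buffers match.
    return std::memcmp(a, b, a_len) == 0;
}
```

Whether the compiler expands the call inline or calls an optimized library routine depends on the compiler and its options, so it is worth checking the generated code.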
DLake1
New Contributor I

Thanks, how am I supposed to know what header to include for these things?

TimP
Honored Contributor III
memcmp() is declared in cstring (C++) or string.h (C). Intel C++ or C will check its own built-ins and its own cstring or string.h (according to your #include) before dropping down to the Visual Studio or Linux headers. If my crude description is misleading, I hope a compiler expert will weigh in.
Anil_M_
Beginner

Hi Commander Lake, 

You could use _mm256_mpsadbw_epu8 to compute the sums of absolute differences of 8 quadruplets, then use _mm256_testz_si256 to see whether all the elements are zero. If the result is nonzero, the strings are not the same.

However, in my experience, the performance of this kind of code depends almost entirely on memory access, so try the memory streaming operations (I did get some performance boost this way). Secondly, the Visual Studio 2017 compiler is very efficient at vectorizing this type of code, so check the disassembly of your non-vectorized code; it may already be vectorized for you.
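As a rough sketch of the kind of plain loop whose disassembly could be checked, the version below ORs together the XOR of every byte pair instead of exiting at the first mismatch. The helper name arrays_equal_loop is hypothetical, and whether a particular compiler actually vectorizes this loop is an assumption to verify in the disassembly, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: branch-free scalar comparison of n bytes.
// There is no early exit, so every byte is read even after a mismatch;
// that is the price of keeping the loop body simple for the vectorizer.
bool arrays_equal_loop(const unsigned char* a, const unsigned char* b, std::size_t n)
{
    assert(n == 0 || (a != nullptr && b != nullptr));
    unsigned char diff = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // a[i] ^ b[i] is zero only when the two bytes match;
        // OR-ing keeps any nonzero bit seen so far.
        diff |= static_cast<unsigned char>(a[i] ^ b[i]);
    }
    return diff == 0;
}
```

For arrays that usually differ near the start, an early-exit loop or memcmp() may well be faster, since this version always scans the full length.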

You could also try the string comparison instructions listed in the intrinsics guide, but I do not know how they perform, and remember that they only have 128-bit variants.

https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3643,5519&cats=String%252520Compare.

- Anil

Edit: I realize you could just read 256-bit integers and compare them instead of using those sum-of-absolute-differences instructions.
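A minimal sketch of that idea, assuming an x86 CPU with AVX2: load 32 bytes from each array, XOR them, and use _mm256_testz_si256 to check that the result is all zeros. The names arrays_equal_avx2, arrays_equal_avx2_impl and ARRAYS_EQUAL_AVX2_TARGET are hypothetical. The per-function target attribute and the __builtin_cpu_supports check are GCC/Clang features; with Visual Studio or the Intel compiler on Windows, the sketch assumes AVX2 code generation is enabled in the build options instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <immintrin.h>

// GCC and Clang can compile one function for AVX2 without a global flag.
// Other compilers are assumed to be building with AVX2 enabled.
#if defined(__GNUC__) || defined(__clang__)
#define ARRAYS_EQUAL_AVX2_TARGET __attribute__((target("avx2")))
#else
#define ARRAYS_EQUAL_AVX2_TARGET
#endif

// Hypothetical helper: compares n bytes, 32 at a time.
ARRAYS_EQUAL_AVX2_TARGET
static bool arrays_equal_avx2_impl(const unsigned char* a, const unsigned char* b,
                                   std::size_t n)
{
    std::size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        // Unaligned loads, since the arrays may have any alignment.
        const __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
        const __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i));
        // The XOR is all zeros only when the two 32-byte blocks are identical.
        const __m256i diff = _mm256_xor_si256(va, vb);
        // _mm256_testz_si256(x, x) returns 1 when x is all zeros.
        if (!_mm256_testz_si256(diff, diff)) return false;
    }
    // Fewer than 32 bytes remain; let memcmp handle the tail.
    return std::memcmp(a + i, b + i, n - i) == 0;
}

// Hypothetical entry point: uses the AVX2 path when available.
bool arrays_equal_avx2(const unsigned char* a, const unsigned char* b, std::size_t n)
{
    if (n == 0) return true;
    assert(a != nullptr && b != nullptr);
#if defined(__GNUC__) || defined(__clang__)
    // Runtime check so the code still runs on CPUs without AVX2.
    if (!__builtin_cpu_supports("avx2")) return std::memcmp(a, b, n) == 0;
#endif
    return arrays_equal_avx2_impl(a, b, n);
}
```

It is worth timing this against plain memcmp() on the target machine before adopting it, since library implementations are often vectorized already.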

The comparison and zero-checking instructions together take around 7-10 cycles, whereas the string compare instructions take 10-15 cycles even though they work on only 128 bits of data, as seen in the instruction tables in

https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
 
