Hi Commander Lake,
You could use _mm256_mpsadbw_epu8 to compute the sums of absolute differences of 8 quadruplets, then use the _mm256_testz_si256 intrinsic to check whether all the elements are zero. If the result is nonzero, the strings are not the same.
However, in my experience this type of code depends almost entirely on memory access performance, so try the memory streaming operations for better performance (I did get some performance boost this way). Secondly, the Visual Studio 2017 compiler is very efficient at vectorizing this type of code, so check the disassembly of your non-vectorized code; maybe it is already being vectorized for you.
You could also try the string comparison instructions listed in the intrinsics guide, but I do not know how they perform, and remember that they only have 128-bit variants.
Edit: I realize you could just load 256-bit integers and compare them directly instead of using those sum-of-absolute-differences instructions.
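Something along these lines is what I mean by comparing 256 bits at a time. Treat it as a rough sketch, not tuned code: the function name equal_bytes_avx2 and the TARGET_AVX2 macro are just names I made up, I assume unaligned loads are acceptable, and I XOR the two blocks so that _mm256_testz_si256 can check for "all zero" in one step. The leftover bytes at the end fall back to memcmp.

```c
#include <immintrin.h>
#include <stddef.h>
#include <string.h>

/* On GCC/Clang, enable AVX2 for this one function only.
   On MSVC no attribute is needed (assumption: build with /arch:AVX2). */
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_AVX2 __attribute__((target("avx2")))
#else
#define TARGET_AVX2
#endif

/* Hypothetical helper: returns 1 if the first n bytes of a and b are equal,
   0 otherwise. */
TARGET_AVX2
static int equal_bytes_avx2(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    size_t i = 0;

    /* Compare 32 bytes (256 bits) per iteration. */
    for (; i + 32 <= n; i += 32) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(pa + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(pb + i));

        /* XOR is zero in every bit exactly when the two blocks match. */
        __m256i diff = _mm256_xor_si256(va, vb);

        /* testz(diff, diff) returns 1 only if diff is all zeros. */
        if (!_mm256_testz_si256(diff, diff))
            return 0;
    }

    /* Remaining tail of fewer than 32 bytes. */
    return memcmp(pa + i, pb + i, n - i) == 0;
}
```

It only tells you equal or not equal; if you need the position of the first mismatch you would need _mm256_cmpeq_epi8 plus _mm256_movemask_epi8 instead of the XOR and testz pair.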
Together, the comparison and zero-checking instructions take around 7-10 cycles, whereas the string compare instructions take 10-15 cycles even though they work on only 128 bits of data, as seen in the instruction tables in