I am trying to vectorise code that is using mainly integer instructions (add,rol,xor). I cannot get the compiler to vectorise this.
My understanding is there is no vector version of rol. Will this be supported in the future?
I have tried on Westmere, Sandy Bridge and Haswell with both SSE and AVX. With AVX the rol is replaced by shld, but there is no gain.
I can get the code to unroll, but no vector instructions are generated (according to the disassembler). There is a slight speedup (~10%), but I believe this comes from better use of the multiple ALUs, since unrolling exposes more independent instructions.
Any guidance would be welcome.
Note: using Intel icc 16.0 on Linux, targeting Sandy Bridge and Haswell.
What size integer? Signed or unsigned? Where is the rol placed, and what does the code do with the result afterwards? Is there anything else that may be pertinent?
Can you show your code, including representative numbers for loop counts, array sizes, etc.? A complete working (dummied-up) example would be good.
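For reference, a dummied-up example of the kind requested might look like the sketch below. It is not the original poster's code: the kernel, the names `mix_scalar` and `mix_sse2`, the 32-bit unsigned element type and the rotate count of 7 are all assumptions made for illustration. It shows the usual workaround for the missing packed rotate in SSE2/AVX2, which is to build the rotate from two shifts and an OR (AVX-512 and AMD XOP do provide packed rotates). Whether icc 16.0 auto-vectorises the scalar loop is not something this sketch can promise.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar rotate-left; only valid for 0 < r < 32. */
static inline uint32_t rotl32(uint32_t x, unsigned r)
{
    return (x << r) | (x >> (32 - r));
}

/* Hypothetical kernel: out[i] = rotl(a[i] + b[i], 7) ^ b[i]. */
void mix_scalar(uint32_t *out, const uint32_t *a, const uint32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = rotl32(a[i] + b[i], 7) ^ b[i];
}

/* Same kernel, four 32-bit lanes at a time when SSE2 is available. */
void mix_sse2(uint32_t *out, const uint32_t *a, const uint32_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i s  = _mm_add_epi32(va, vb);
        /* Rotate left by 7: (s << 7) | (s >> 25), per 32-bit lane. */
        __m128i r  = _mm_or_si128(_mm_slli_epi32(s, 7),
                                  _mm_srli_epi32(s, 25));
        _mm_storeu_si128((__m128i *)(out + i), _mm_xor_si128(r, vb));
    }
#endif
    /* Scalar tail (or the whole loop when SSE2 is not available). */
    for (; i < n; i++)
        out[i] = rotl32(a[i] + b[i], 7) ^ b[i];
}
```

The emulated rotate costs three vector instructions instead of one, so any gain depends on how much other work surrounds it and on whether the loop iterations are truly independent.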