Will a vector version of rol be supported in the future?



I am trying to vectorise code that mainly uses integer instructions (add, rol, xor), but I cannot get the compiler to vectorise it.

My understanding is there is no vector version of rol. Will this be supported in the future?
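To make the question concrete, here is a minimal sketch of the kind of loop I mean, together with a hand-written SSE2 version that emulates the rotate with two shifts and an OR. The kernel, the function names (`rotl32`, `mix_scalar`, `mix_sse2`) and the choice of 32-bit unsigned data are assumptions for illustration only, not my actual code.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Scalar rotate-left idiom; valid for 0 < r < 32.
   Compilers typically recognise this pattern and emit a single rol. */
static inline uint32_t rotl32(uint32_t x, unsigned r)
{
    assert(r > 0 && r < 32);
    return (x << r) | (x >> (32 - r));
}

/* Hypothetical kernel: out[i] = rotl(a[i] + b[i], r) ^ b[i]. */
void mix_scalar(uint32_t *out, const uint32_t *a, const uint32_t *b,
                size_t n, unsigned r)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = rotl32(a[i] + b[i], r) ^ b[i];
}

/* Same kernel, four 32-bit lanes per iteration. SSE2 has no packed
   rotate, so the rotate is built from shift-left, shift-right and OR. */
void mix_sse2(uint32_t *out, const uint32_t *a, const uint32_t *b,
              size_t n, unsigned r)
{
    assert(r > 0 && r < 32);
    const __m128i left  = _mm_cvtsi32_si128((int)r);
    const __m128i right = _mm_cvtsi32_si128((int)(32 - r));
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128i va  = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb  = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i sum = _mm_add_epi32(va, vb);
        __m128i rot = _mm_or_si128(_mm_sll_epi32(sum, left),
                                   _mm_srl_epi32(sum, right));
        _mm_storeu_si128((__m128i *)(out + i), _mm_xor_si128(rot, vb));
    }

    /* Scalar tail for the remaining 0..3 elements. */
    for (; i < n; ++i)
        out[i] = rotl32(a[i] + b[i], r) ^ b[i];
}
```

The emulated rotate costs three vector instructions instead of one scalar rol, so whether it pays off presumably depends on how much other work each lane does.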

I have tried on Westmere, Sandy Bridge and Haswell with both SSE and AVX. With AVX the rol is replaced by shld, but there is no gain.

I seem to be able to get the code to unroll, but no vector instructions are inserted (according to the disassembler). There is a slight speedup (~10%), but I believe this is due to better use of multiple ALUs, since unrolling exposes more independent instructions.

Any guidance would be welcome.


Note: using Intel icc 16.0 on Linux, SB/HSW.

What size integer? Signed or unsigned? Placement? Flow after rol? Anything else that may be pertinent?

Can you show your code, including representative numbers for loop counts, array sizes, etc.? A complete working (dummied-up) example would be good.

Jim Dempsey

