Hi everyone,
I tried the following simple for-loop, which has a loop-carried data dependency:
#pragma omp simd
for (i = 1; i < 256; ++i) a[i] = 3.125 * a[i-1];
Using icc with the options -xCORE-AVX512 -qopt-zmm-usage=high -qopenmp-simd on a Skylake-SP CPU, the loop appears to be vectorized: the generated code uses vmovups for the loads/stores and vmulps for the multiplication.
So vectorization may still be possible for some loops with data dependencies. Am I correct?
Thank you in advance!
I found the problem.
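The compiler may generate vector instructions (e.g. vmovups and vmulps) for a loop with a data dependency, but the computed numerical results are completely wrong. #pragma omp simd tells the compiler to vectorize the loop without checking that it is safe, so the dependency on a[i-1] is simply ignored.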
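
For anyone who wants to see why the results differ, here is a minimal sketch in plain C. It is not the code icc actually emits; it only emulates the load-then-store order of a 16-lane vector loop, assuming a is a float array (vmulps works on single precision). The function names scale_serial and scale_simd_emulated and the LANES constant are hypothetical, made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 16  /* assumption: 16 floats fit in one 512-bit register */

/* Scalar reference: each iteration reads the value the previous one wrote. */
void scale_serial(float *a, size_t n)
{
    assert(n >= 1);
    for (size_t i = 1; i < n; ++i)
        a[i] = 3.125f * a[i - 1];
}

/* Emulation of a forced SIMD loop: a whole block of OLD values is read
 * and multiplied before any result of that block is written back. */
void scale_simd_emulated(float *a, size_t n)
{
    assert(n >= 1);
    for (size_t i = 1; i < n; i += LANES) {
        float tmp[LANES];
        size_t w = (n - i < LANES) ? (n - i) : LANES;  /* remainder block */

        for (size_t k = 0; k < w; ++k)   /* "vector load + multiply" of a[i-1 .. i+w-2] */
            tmp[k] = 3.125f * a[i - 1 + k];
        for (size_t k = 0; k < w; ++k)   /* "vector store" to a[i .. i+w-1] */
            a[i + k] = tmp[k];
    }
}
```

If you start with a[0] = 1 and all other elements 0, the serial version gives a[2] = 9.765625, while the emulated vector version gives a[2] = 0, because a[1] had not been updated yet when the block was loaded.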