I tried the following simple for-loop with a loop-carried data dependency (each iteration reads the value written by the previous one):
#pragma omp simd
for (i = 1; i < 256; ++i) a[i] = 3.125 * a[i-1];
Using icc with the options -xCORE-AVX512 -qopt-zmm-usage=high -qopenmp-simd on a Skylake-SP CPU, the loop appears to be vectorized: the generated code uses vmovups for the loads/stores and vmulps for the multiplication.
So vectorization may still be possible for some loops with data dependencies. Am I correct?
Thank you in advance!
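I found the problem.
The compiler may generate vector instructions (e.g. vmovups and vmulps) for a loop with a data dependency, but the computed results are completely wrong. As far as I understand, #pragma omp simd is a promise from the programmer that the iterations can safely run in SIMD lanes, so the compiler vectorizes without proving independence. Seeing vector instructions in the output therefore does not mean the loop was safe to vectorize.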
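For anyone who wants to see the effect without reading assembly, here is a minimal sketch in plain C. It does not reproduce what icc actually emits. It only emulates one plausible vectorized execution, in which a whole chunk of a[i-1] values is loaded before any result is stored. The function names, the LANES value of 16 (assuming 16 floats per 512-bit register) and the float element type (assumed because vmulps operates on packed single precision) are my own choices for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 16 /* assumption: 16 floats fit in one 512-bit zmm register */

/* Scalar reference: iteration i reads the value iteration i-1 just wrote. */
static void recurrence_scalar(float *a, size_t n)
{
    assert(n > 0); /* a[0] seeds the recurrence */
    for (size_t i = 1; i < n; ++i)
        a[i] = 3.125f * a[i - 1];
}

/* Emulated SIMD execution (hypothetical): each chunk loads and multiplies
   up to LANES old values of a[i-1] first, and only then stores the results.
   Within a chunk, lanes therefore see stale inputs. */
static void recurrence_simd_emulated(float *a, size_t n)
{
    assert(n > 0);
    for (size_t i = 1; i < n; i += LANES) {
        float tmp[LANES];
        size_t len = (n - i < LANES) ? n - i : LANES;
        for (size_t k = 0; k < len; ++k) /* "vector" load + multiply */
            tmp[k] = 3.125f * a[i + k - 1];
        for (size_t k = 0; k < len; ++k) /* "vector" store */
            a[i + k] = tmp[k];
    }
}

/* Number of positions at which the two arrays differ. */
static size_t count_mismatches(const float *x, const float *y, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i)
        if (x[i] != y[i])
            ++bad;
    return bad;
}
```

Calling both functions on a 32-element array with a[0] = 1.0f and all other elements 0 gives matching values only at a[0] and a[1]. From a[2] onward the emulated version has multiplied stale zeros, so the other 30 elements differ from the scalar result.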