For example, there is an array A of length length_A, and I use the AVX2 gather intrinsic (_mm256_i32gather_epi32) to read it. There are two possible memory access patterns:
1. Consecutive elements:
mm256 register = (A[0], A[1], ..., A[7])
mm256 register = (A[8], A[9], ..., A[15]), and so on
2. Strided elements:
stride = length_A / 8;
mm256 register = (A[0], A[stride+0], ..., A[7*stride+0])
mm256 register = (A[1], A[stride+1], ..., A[7*stride+1]), and so on
Which pattern is better when length_A is very large?
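To make the two patterns concrete, here is a minimal C sketch (an illustration, not a benchmark) that builds the eight lane indices for each pattern and loads them with _mm256_i32gather_epi32. The helper names load_pattern1, load_pattern2 and their scalar fallback are hypothetical. It assumes GCC or Clang (for __attribute__((target("avx2"))) and __builtin_cpu_supports), 32-bit int elements, and a length_A that is a multiple of 8 and below 2^31, since the gather indices are signed 32-bit integers.

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

#if HAVE_X86
/* Pattern 1 with AVX2: lane k reads A[8*i + k] (eight consecutive ints). */
__attribute__((target("avx2")))
static void gather_pattern1_avx2(const int *A, int i, int out[8])
{
    int b = 8 * i;
    __m256i idx = _mm256_setr_epi32(b, b + 1, b + 2, b + 3,
                                    b + 4, b + 5, b + 6, b + 7);
    /* scale = 4: indices count int elements, the gather needs byte offsets. */
    __m256i v = _mm256_i32gather_epi32(A, idx, 4);
    _mm256_storeu_si256((__m256i *)out, v);
}

/* Pattern 2 with AVX2: lane k reads A[k*stride + i] (eight ints, stride apart). */
__attribute__((target("avx2")))
static void gather_pattern2_avx2(const int *A, int stride, int i, int out[8])
{
    __m256i idx = _mm256_setr_epi32(i, stride + i, 2 * stride + i,
                                    3 * stride + i, 4 * stride + i,
                                    5 * stride + i, 6 * stride + i,
                                    7 * stride + i);
    __m256i v = _mm256_i32gather_epi32(A, idx, 4);
    _mm256_storeu_si256((__m256i *)out, v);
}
#endif

/* Hypothetical helper: out = (A[8*i], A[8*i+1], ..., A[8*i+7]). */
void load_pattern1(const int *A, int i, int out[8])
{
    assert(i >= 0);
#if HAVE_X86
    if (__builtin_cpu_supports("avx2")) {   /* runtime check before using AVX2 */
        gather_pattern1_avx2(A, i, out);
        return;
    }
#endif
    for (int k = 0; k < 8; k++)             /* scalar fallback, same elements */
        out[k] = A[8 * i + k];
}

/* Hypothetical helper: out = (A[i], A[stride+i], ..., A[7*stride+i]),
   where stride = length_A / 8 and 0 <= i < stride. */
void load_pattern2(const int *A, int stride, int i, int out[8])
{
    assert(i >= 0 && i < stride);
#if HAVE_X86
    if (__builtin_cpu_supports("avx2")) {
        gather_pattern2_avx2(A, stride, i, out);
        return;
    }
#endif
    for (int k = 0; k < 8; k++)             /* scalar fallback, same elements */
        out[k] = A[k * stride + i];
}
```

With int A[64] and stride = 64 / 8 = 8, load_pattern1(A, 1, out) fills out with A[8] through A[15], while load_pattern2(A, 8, 1, out) fills it with A[1], A[9], ..., A[57]. In pattern 1 the eight indices are consecutive, so the same register could also be filled by a single _mm256_loadu_si256((const __m256i *)&A[8 * i]); only pattern 2 actually requires a gather.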