Suppose there is an array A of length length_A, and I read it with the AVX2 gather intrinsic _mm256_i32gather_epi32. There are two possible memory access patterns.
1. Contiguous: each gather reads 8 consecutive elements.
__m256i register = (A[0], A[1], ..., A[7])
__m256i register = (A[8], A[9], ..., A[15]), and so on
2. Strided: each gather reads 8 elements that are stride apart.
stride = length_A / 8;
__m256i register = (A[0], A[stride+0], ..., A[7*stride+0])
__m256i register = (A[1], A[stride+1], ..., A[7*stride+1]), and so on
Which pattern is faster when length_A is very large?
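To make the two patterns concrete, here is a minimal sketch in C that sums the array both ways so they can be timed against each other. The function names sum_contiguous, sum_strided and hsum_epi32 are made up for illustration. The code assumes GCC or Clang (it uses the target("avx2") function attribute), an array of 32-bit int, a sum that does not overflow, and a 7*stride that fits in an int. It is not a benchmark and does not settle which pattern wins.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: horizontal sum of the eight 32-bit lanes.
   Assumes the total does not overflow an int. */
__attribute__((target("avx2")))
static int hsum_epi32(__m256i v)
{
    int lanes[8];
    _mm256_storeu_si256((__m256i *)lanes, v);
    int s = 0;
    for (int i = 0; i < 8; ++i)
        s += lanes[i];
    return s;
}

/* Pattern 1: each gather reads 8 consecutive elements A[k .. k+7]. */
__attribute__((target("avx2")))
int sum_contiguous(const int *A, size_t length_A)
{
    const __m256i idx = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    __m256i acc = _mm256_setzero_si256();
    size_t k = 0;
    for (; k + 8 <= length_A; k += 8) {
        /* scale = 4 because each index counts 4-byte elements */
        __m256i v = _mm256_i32gather_epi32(A + k, idx, 4);
        acc = _mm256_add_epi32(acc, v);
    }
    int sum = hsum_epi32(acc);
    for (; k < length_A; ++k)   /* scalar tail: fewer than 8 elements left */
        sum += A[k];
    return sum;
}

/* Pattern 2: each gather reads A[j], A[stride+j], ..., A[7*stride+j]. */
__attribute__((target("avx2")))
int sum_strided(const int *A, size_t length_A)
{
    const size_t stride = length_A / 8;
    const int s = (int)stride;  /* assumption: 7 * stride fits in an int */
    const __m256i idx =
        _mm256_setr_epi32(0, s, 2 * s, 3 * s, 4 * s, 5 * s, 6 * s, 7 * s);
    __m256i acc = _mm256_setzero_si256();
    for (size_t j = 0; j < stride; ++j) {
        __m256i v = _mm256_i32gather_epi32(A + j, idx, 4);
        acc = _mm256_add_epi32(acc, v);
    }
    int sum = hsum_epi32(acc);
    for (size_t i = 8 * stride; i < length_A; ++i)  /* length_A % 8 leftovers */
        sum += A[i];
    return sum;
}
```

In pattern 1 each gather touches one 32-byte span, so at most two 64-byte cache lines, and the same data could be read with a plain _mm256_loadu_si256 instead of a gather. In pattern 2, once stride is at least 16 elements, each gather touches eight different cache lines, although consecutive values of j reuse those lines. Which one is faster for a very large length_A probably depends on the CPU and the prefetchers, so it is worth timing both functions on the target machine.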
- Tags:
- Parallel Computing