Hi,
I need to load strided data into an AVX-512 vector register. What is the best way to do this?
Suppose the stride is 1000 and I need to load the elements at indices 0, 1*1000, 2*1000, 3*1000, 4*1000, 5*1000, 6*1000 and 7*1000 into one AVX-512 vector register.
What is the fastest way to do this, and which intrinsic should I use? The data are double-precision floating-point numbers.
Hi,
Thanks for reaching out to us.
Could you please try the _mm512_i32gather_pd intrinsic to load the data into AVX-512 registers? It takes a vector of indices and a scale factor (8 when the indices count elements of an array of doubles).
Please refer to below link for more details:
>>What is the fastest way to do this.
For efficiency, do not update the indices between loads. Keep the index vector fixed and instead advance the base address to the first element of the next strided group to load (this conserves a vector register).
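As a rough illustration of that advice, here is a minimal sketch, not a tuned implementation. The function name gather_columns, the parameters stride and ncols, and the output layout are hypothetical and only serve the example. It assumes that 7*stride fits in a 32-bit signed integer and that src holds at least 7*stride + ncols doubles. The index vector is built once and only the base pointer moves. When the file is compiled without AVX-512F support, a plain scalar loop with the same result is used instead.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical helper: for each column j in [0, ncols), gather the 8 doubles
   src[0*stride + j], src[1*stride + j], ..., src[7*stride + j]
   and store them contiguously at dst[8*j .. 8*j + 7]. */
static void gather_columns(const double *src, int stride, size_t ncols, double *dst)
{
    assert(stride > 0);
#ifdef __AVX512F__
    /* Element indices 0, stride, ..., 7*stride: built once, never updated. */
    const __m256i vidx = _mm256_mullo_epi32(
        _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7),
        _mm256_set1_epi32(stride));
    for (size_t j = 0; j < ncols; ++j) {
        /* Only the base address advances; scale 8 = sizeof(double). */
        __m512d v = _mm512_i32gather_pd(vidx, src + j, 8);
        _mm512_storeu_pd(dst + 8 * j, v);
    }
#else
    /* Scalar fallback with the same semantics (assumption: no AVX-512F). */
    for (size_t j = 0; j < ncols; ++j)
        for (int k = 0; k < 8; ++k)
            dst[8 * j + k] = src[(size_t)k * (size_t)stride + j];
#endif
}
```

With the stride of 1000 from the question, a call such as gather_columns(src, 1000, 1, dst) would place src[0], src[1000], ..., src[7000] in dst[0..7].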
Thanks & Regards,
Noorjahan.
Hi,
We haven't heard back from you. Could you please provide an update on your issue?
Thanks & Regards,
Noorjahan.
Thanks. These intrinsics do what I need, but the performance is not what I was expecting. Copying the strided data using Cilk array notation is faster than using AVX-512.
According to the Intel Intrinsics Guide, the latency of a gather is high compared to that of a load or store. However, in my code the store operation appears to perform worse than the gather.
__m512d _A0 = _mm512_i64gather_pd(vidx , &AS[source_location], 8);
_mm512_storeu_pd(&AD[destination_location], _A0);
These copy instructions are inside nested loops that are parallelized using OpenMP. The source and destination locations change in every iteration. I observed that the store operation takes most of the time. Is there a good way to optimize it?
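For reference, the two lines above could be placed in a complete loop roughly as follows. This is only a sketch that could serve as a starting point for a reproducer, not the actual code from my application: the function name strided_copy, the parameters stride and ncols, the single (non-nested) loop, and the way source and destination locations are computed are all assumptions. The OpenMP pragma takes effect only when OpenMP is enabled at compile time, and a scalar fallback is used when AVX-512F is not available.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical copy: AD[8*j + k] = AS[k*stride + j] for k in 0..7,
   j in 0..ncols-1. AS and AD follow the names used in the post;
   everything else is assumed for illustration. */
static void strided_copy(const double *AS, long long stride,
                         long long ncols, double *AD)
{
    assert(stride > 0 && ncols >= 0);
#ifdef __AVX512F__
    /* 64-bit element indices, built once outside the loop. */
    const __m512i vidx = _mm512_setr_epi64(
        0, stride, 2 * stride, 3 * stride,
        4 * stride, 5 * stride, 6 * stride, 7 * stride);
#endif
    #pragma omp parallel for
    for (long long j = 0; j < ncols; ++j) {
#ifdef __AVX512F__
        /* Gather 8 strided doubles starting at AS[j]; scale 8 = sizeof(double). */
        __m512d a0 = _mm512_i64gather_pd(vidx, &AS[j], 8);
        /* Unaligned contiguous store of the 8 gathered values. */
        _mm512_storeu_pd(&AD[8 * j], a0);
#else
        /* Scalar fallback with the same result (assumption: no AVX-512F). */
        for (int k = 0; k < 8; ++k)
            AD[8 * j + k] = AS[k * stride + j];
#endif
    }
}
```

Timing this function for different values of stride and ncols would make it possible to compare the gather-and-store loop against the Cilk array version on equal terms.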
Thanks
Hi,
Could you please provide us with a complete reproducer so that we can investigate your issue further?
Thanks & Regards
Noorjahan.
Hi,
We haven't heard back from you. Could you please provide an update on your issue along with the above-requested details?
Thanks & Regards,
Noorjahan.
Hi,
I have not heard back from you, so I will close this inquiry now. If you need further assistance, please post a new question.
Thanks & Regards,
Noorjahan.
