The documentation at http://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-254C3F9D-5DDD-4B27-95E2-B6986B4A852B.htm indicates that "Only the lower eight elements are used as indices. The upper eight elements are not used." Since this is a single-precision gather, shouldn't all 16 elements be used as indices? Is this a documentation error, or does this prefetch really only operate on half of the elements? (Perhaps the prefetch unit is limited to 8 addresses?)
What is the purpose of the conv argument to the prefetch instructions? Presumably the data isn't actually being converted yet. Is this just a hint about how many bytes will be read from each address?
The instruction is documented to prefetch a float32 vector. I assume that it's equally effective to prefetch an int32 vector (or, in fact, a number of int32s which will be read using legacy x86 instructions). Can someone please confirm this?
I asked Development about your questions.
The statement cited from the User Guide is a documentation error. It was apparently a mistaken copy-and-paste from a 64-bit indices variant, such as _mm512_i32lo[ext]gather_pd. I have notified our Documentation team (internal tracking id below) and will update this post once it is corrected.
Regarding conv, they said "yes, it is a hint about how many bytes will be read from each address."
Regarding prefetching an int32, they concur, "I believe this is true - it's equally effective to prefetch an int32 vector."
(Internal tracking id: DPD200248812)
Hi Peter and Kevin,
Sorry to bump this thread after so long but I had a related question that doesn't seem to be addressed elsewhere on the forum.
Can the _mm512_prefetch_i32[ext]gather_ps intrinsics be used to prefetch doubles?
My understanding was that each index would prefetch at least one 64-byte cache line. Is that correct?
E.g. if I want to prefetch doubles at indices {0,1,2,3,100,101,102,103 etc..}, would I need to create an index vector containing each 32-bit portion of each double, or is it sufficient to prefetch each unique cache line?
At the moment I am trying to prefetch each unique cache line (by doing a modulus operation on the gather indices and scaling appropriately), but without success: the performance of the sparse matrix operation is actually degrading.
I can't find any reference elsewhere on how to properly use the prefetch gather intrinsics on 64 bit types.
Best regards,
Alastair
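For readers following along, here is a minimal sketch in plain C of the "one index per unique cache line" reduction described above. It is not the poster's actual code. The helper name `unique_line_starts` is hypothetical, and the sketch assumes the base array is 64-byte aligned, the indices are non-negative element indices of doubles, and they are sorted in ascending order so that indices on the same line are adjacent. It only computes the reduced index list; it does not call any prefetch intrinsic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64
#define DOUBLES_PER_LINE ((int32_t)(CACHE_LINE_BYTES / sizeof(double)))

/* Hypothetical helper (not an Intel API).
 * For each gather index, round down to the first double of its cache line
 * and keep that line start only if it differs from the previous one.
 * Assumes: base array is 64-byte aligned, idx[] is non-negative and sorted
 * ascending, and out[] has room for n entries.
 * Returns the number of distinct cache-line starts written to out[]. */
size_t unique_line_starts(const int32_t *idx, size_t n, int32_t *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        assert(idx[i] >= 0);
        /* The modulus step: drop the offset within the cache line. */
        int32_t start = idx[i] - idx[i] % DOUBLES_PER_LINE;
        if (count == 0 || out[count - 1] != start) {
            out[count++] = start;
        }
    }
    return count;
}
```

With the indices from the question, {0,1,2,3,100,101,102,103}, this reduces to the two line starts 0 and 96 (in units of doubles). Whether feeding such a reduced list to the gather prefetch actually helps is exactly the open question in this post; the sketch only illustrates the index arithmetic.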
