Hello,
I have some code where I iterate over an array in reverse order. I already use SSE or AVX, depending on what the CPU supports. Normally the CPU's hardware prefetching should be fine if I iterate over an array from beginning to end. But what about end to beginning, i.e. in reverse? Does the CPU recognize this pattern?
Or should I give hints with _mm_prefetch? If so, how do I use this intrinsic? Should I always specify L1 as the cache level? And how many iterations ahead should I prefetch data?
- Tags:
- Intel® Advanced Vector Extensions (Intel® AVX)
- Intel® Streaming SIMD Extensions
- Parallel Computing
The last CPU I'm aware of which didn't have backward hardware prefetch was the Athlon. Usually the hardware can track a few backward streams in addition to the larger number available for forward streams.
As far as I know, the situation for hints would be similar in either direction. Many CPUs will ignore L1 hints; this may be covered in the architecture documentation.
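To make the question about _mm_prefetch concrete, here is a minimal sketch of a reverse SSE loop that issues a software prefetch hint a fixed distance ahead of the current position (towards lower addresses). The function name `sum_reverse_prefetch` and the `PREFETCH_DISTANCE` value are assumptions for illustration, not taken from the original code, and the right distance would have to be found by measurement. Given the point above about hardware backward prefetchers, the hint may well make no difference.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: _mm_prefetch, _mm_loadu_ps, _mm_add_ps */

/* Hypothetical tuning knob: how many floats ahead (towards lower addresses)
   to prefetch. 64 floats = 256 bytes. */
#define PREFETCH_DISTANCE 64

/* Sums n floats, walking from the end of the array to the beginning.
   n must be a multiple of 4. */
float sum_reverse_prefetch(const float *data, size_t n)
{
    assert(n % 4 == 0);
    __m128 acc = _mm_setzero_ps();
    for (size_t i = n; i >= 4; i -= 4) {
        /* Hint only: request the block needed a few iterations from now.
           Skipped when the target would lie before the start of the array. */
        if (i >= 4 + PREFETCH_DISTANCE) {
            _mm_prefetch((const char *)(data + i - 4 - PREFETCH_DISTANCE),
                         _MM_HINT_T0);
        }
        acc = _mm_add_ps(acc, _mm_loadu_ps(data + i - 4));
    }
    /* Horizontal sum of the four lanes. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

`_MM_HINT_T0` asks for the line in all cache levels; `_MM_HINT_T1`, `_MM_HINT_T2` and `_MM_HINT_NTA` are the other hint constants defined in `xmmintrin.h`. How a given CPU treats each hint is implementation-specific.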
Hello,
OK, if the CPU has at least one backward prefetcher, that's fine for me. Profiling seems to confirm this: if I change the indexing to forward, performance changes by 1-2%, which might just be normal measurement noise.
I did some research and you are right, the prefetch instruction does not seem to be used anymore.
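A forward-versus-reverse comparison like the one described can be sketched roughly as follows. This is not the original benchmark; the function names and the use of `clock()` are assumptions chosen to keep the example portable, and a serious measurement would use a higher-resolution timer and several runs.

```c
#include <stddef.h>
#include <time.h>

/* Plain scalar sums; only the traversal direction differs. */
double sum_forward(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += a[i];
    return s;
}

double sum_reverse(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = n; i > 0; --i) s += a[i - 1];
    return s;
}

/* Runs f `repeats` times and returns the elapsed CPU time in seconds.
   The last result is stored in *result; the volatile sink keeps the
   calls from being optimized away. */
double time_sum(double (*f)(const double *, size_t),
                const double *a, size_t n, int repeats, double *result)
{
    volatile double sink = 0.0;
    clock_t start = clock();
    for (int r = 0; r < repeats; ++r) sink = f(a, n);
    clock_t end = clock();
    *result = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

For the prefetchers to matter at all, the array should be considerably larger than the last-level cache; otherwise both directions are served from cache and the comparison says little.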
>>>But what about end to begin, so reverse? Does the CPU realize this pattern?>>>
This has already been answered, but I would like to add that, for backward prefetching to kick in, the index pattern probably needs to be linear and decreasing.
iliyapolak wrote:
>>>But what about end to begin, so reverse? Does the CPU realize this pattern?>>>
This has already been answered, but I would like to add that, for backward prefetching to kick in, the index pattern probably needs to be linear and decreasing.
I am sure this requirement is fulfilled: I have a loop running from maxsize down to zero, and the decrement is the vector size, so 128 bits for the SSE version and 256 bits for the AVX version. So this is a fixed-step linear decrement.
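For readers following along, the access pattern described here looks roughly like the sketch below (SSE variant only). The function `scale_reverse_sse` and the in-place multiply are placeholders, since the thread does not show the actual loop body; an AVX version would step by 8 floats with the corresponding `_mm256_*` intrinsics.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

#define SSE_FLOATS 4  /* 128 bits / 32-bit float */

/* Multiplies every element by `factor`, walking from the end of the array
   to the beginning in fixed steps of one SSE vector.
   maxsize must be a non-negative multiple of SSE_FLOATS. */
void scale_reverse_sse(float *data, ptrdiff_t maxsize, float factor)
{
    assert(maxsize >= 0 && maxsize % SSE_FLOATS == 0);
    const __m128 f = _mm_set1_ps(factor);
    /* Signed index so the loop can terminate cleanly once i drops below 0. */
    for (ptrdiff_t i = maxsize - SSE_FLOATS; i >= 0; i -= SSE_FLOATS) {
        __m128 v = _mm_loadu_ps(data + i);
        _mm_storeu_ps(data + i, _mm_mul_ps(v, f));
    }
}
```

Each iteration touches the 16 bytes immediately below the previous ones, which is the constant negative stride discussed above.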
Christian,
You can find the answer in "7.2 HARDWARE PREFETCHING OF DATA" of the Intel® 64 and IA-32 Architectures Optimization Reference Manual (google "Order Number: 248966").
I suspect you have at least 4 backward stream prefetchers.