Beginner
Cache and _mm_prefetch

Hello,

I have some code where I iterate over an array in reverse order. I already use SSE or AVX (depending on what the CPU supports). Normally the CPU's hardware prefetching should be fine if I iterate over an array from beginning to end. But what about end to beginning, i.e. in reverse? Does the CPU realize this pattern?

Or should I give hints with _mm_prefetch? If so, how do I use this intrinsic? Should I always give L1 as the cache level? And how many iterations ahead should I prefetch the data?

Accepted Solutions
Black Belt

The last CPU I'm aware of that didn't have backward hardware prefetch was the Athlon. Usually the hardware can track a few backward streams, in addition to the larger number of forward streams it supports.

As far as I know, the situation for software prefetch hints is similar in either direction. Many CPUs will ignore L1 hints; this may be covered in the architecture documentation.
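For illustration only, here is a minimal sketch of what a software prefetch hint in a backward loop could look like, using the SSE intrinsic `_mm_prefetch` with `_MM_HINT_T0` (the other hint values are `_MM_HINT_T1`, `_MM_HINT_T2` and `_MM_HINT_NTA`). The function name `sum_backward_prefetch`, the 64-byte cache line and the distance of 8 lines ahead are all assumptions, not recommendations: the useful distance depends on the CPU and on the work per iteration, and has to be measured. x86 only.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 (SSE) */

/* Assumed values, to be tuned by measurement on the target CPU:
   64-byte cache lines (16 floats) and a prefetch distance of 8 lines. */
#define FLOATS_PER_LINE 16
#define PREFETCH_LINES  8

/* Hypothetical example: sums a[n-1] down to a[0] with a software prefetch. */
float sum_backward_prefetch(const float *a, size_t n)
{
    const size_t dist = (size_t)FLOATS_PER_LINE * PREFETCH_LINES;
    float s = 0.0f;
    for (size_t i = n; i-- > 0; ) {
        /* Roughly once per cache line (exactly once if a is 64-byte aligned),
           hint the data we will reach `dist` elements from now. Walking
           backward, that is the LOWER address a + i - dist. The i >= dist
           check keeps the pointer inside the array. */
        if (i % FLOATS_PER_LINE == 0 && i >= dist)
            _mm_prefetch((const char *)(a + i - dist), _MM_HINT_T0);
        s += a[i];
    }
    return s;
}
```

Because the hardware prefetcher already follows backward streams, a hint like this may make no measurable difference; compare it against the same loop with the `_mm_prefetch` line removed.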

Beginner

Hello,

OK, if the CPU has at least one backward prefetcher, that's fine for me. Profiling seems to confirm this: if I change the indexing to forward, performance changes by only 1-2%, which might be normal run-to-run variation in the tests.

I did some research, and you are right: the prefetch instruction does not seem to be used anymore.
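A forward-vs-reverse comparison like the one described above could be run with a small timing harness along these lines. This is a sketch under stated assumptions: the function names are hypothetical, it measures CPU time with the standard C `clock()`, and the array should be much larger than the last-level cache, otherwise prefetching barely matters.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Forward traversal: index increases by 1 each iteration. */
double sum_forward(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Same work in reverse: index decreases linearly from n-1 to 0. */
double sum_backward(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = n; i-- > 0; )
        s += a[i];
    return s;
}

/* Runs f over the array `reps` times and returns the CPU seconds used.
   The sums are accumulated into *result so the work is not optimized away. */
double time_traversal(double (*f)(const double *, size_t),
                      const double *a, size_t n, int reps, double *result)
{
    assert(reps > 0);
    double s = 0.0;
    clock_t t0 = clock();
    for (int r = 0; r < reps; ++r)
        s += f(a, n);
    clock_t t1 = clock();
    *result = s;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Repeating each measurement several times shows how large the normal run-to-run variation is, which is the baseline a 1-2% difference has to be judged against.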

Black Belt

>>>But what about end to begin, so reverse? Does the CPU realize this pattern?>>>

It has already been answered, but I would like to add that, for the backward prefetcher to recognize it, the index pattern probably needs to be linear and decreasing.

Beginner

iliyapolak wrote:

>>>But what about end to begin, so reverse? Does the CPU realize this pattern?>>>

It has already been answered, but I would like to add that, for the backward prefetcher to recognize it, the index pattern probably needs to be linear and decreasing.

I am sure this requirement is fulfilled: my loop runs from maxsize down to zero, and the decrement is the vector size, i.e. 128 bits for the SSE version and 256 bits for the AVX version. So this is a fixed-step linear decrement.
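A loop of the kind described above (a fixed 128-bit step from the end toward the start) might look like the following SSE sketch. The name `sum_reverse_sse` is hypothetical, `n` is assumed to be a multiple of 4 to keep it short, and unaligned loads are used so no alignment is assumed. An AVX version would step by 8 floats with `_mm256_loadu_ps` and needs AVX enabled at compile time (e.g. `-mavx` with GCC or Clang).

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_add_ps */

/* Sums a float array from the end toward the start, one 128-bit vector
   (4 floats) per step, so the load address decreases by a fixed 16 bytes
   per iteration: a linear, decreasing pattern.
   Assumes n is a multiple of 4 (simplification; real code would also
   handle the remainder). */
float sum_reverse_sse(const float *a, size_t n)
{
    assert(n % 4 == 0);
    __m128 acc = _mm_setzero_ps();
    /* i is one past the 4-float block being loaded; checking i >= 4 before
       subtracting avoids size_t wrap-around. */
    for (size_t i = n; i >= 4; i -= 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(&a[i - 4]));
    /* Horizontal sum of the 4 lanes. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}
```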

New Contributor I

Christian,

You can find the answer in "7.2 HARDWARE PREFETCHING OF DATA" of the Intel® 64 and IA-32 Architectures Optimization Reference Manual (search for "Order Number: 248966").

I suspect you have at least 4 backward stream prefetchers.