Intel® ISA Extensions

Cache and _mm_prefetch

Christian_M_2
Beginner

Hello,

I have some code where I iterate over an array in reverse order. I already use SSE or AVX (depending on what the CPU supports). Normally the CPU's hardware prefetching should be fine if I iterate over an array from beginning to end. But what about end to beginning, i.e. in reverse? Does the CPU recognize this pattern?

Or should I give hints with _mm_prefetch? If so, how do I use this intrinsic? Should I always specify L1 as the cache level? And how many iterations ahead should I prefetch the data?
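For reference, here is a minimal sketch (not from the thread) of how _mm_prefetch is typically called inside a reverse SSE loop. It assumes float data and a hypothetical PREFETCH_DISTANCE of 64 elements (four 64-byte cache lines); the right distance depends on the CPU and on the work done per iteration, so it has to be found by measurement. Because the loop walks backwards, the data needed next sits at lower addresses, so the prefetch target is a + i - PREFETCH_DISTANCE rather than a + i + PREFETCH_DISTANCE.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE: __m128, _mm_prefetch, _MM_HINT_T0 */

/* Hypothetical tuning knob (an assumption, not a recommended value):
   how many floats ahead of the current position to prefetch.
   64 floats = 256 bytes = 4 cache lines of 64 bytes. */
#define PREFETCH_DISTANCE 64

/* Sums a[0..n-1] walking from the end towards the beginning,
   4 floats per step, with an explicit software prefetch. */
float sum_reverse_prefetch(const float *a, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    size_t i = n;

    /* Peel scalar elements off the end so the vector loop
       always processes a multiple of 4 elements. */
    float tail = 0.0f;
    while (i % 4 != 0) {
        --i;
        tail += a[i];
    }

    while (i >= 4) {
        i -= 4;
        /* Walking backwards, "future" data lives at LOWER addresses.
           Only form the address if it stays inside the array. */
        if (i >= PREFETCH_DISTANCE)
            _mm_prefetch((const char *)(a + i - PREFETCH_DISTANCE), _MM_HINT_T0);
        acc = _mm_add_ps(acc, _mm_loadu_ps(a + i));
    }

    /* Horizontal sum of the four lanes. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return tail + lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

The guard on the prefetch is there because computing a pointer before the start of the array is undefined behaviour in C, even though the prefetch itself is only a hint.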

1 Solution
TimP
Honored Contributor III

The last CPU I'm aware of that didn't have backward hardware prefetch was the Athlon. Usually the hardware can track a few backward streams in addition to the larger number available for forward streams.

As far as I know, the situation for hints would be similar in either direction. Many CPUs will ignore L1 hints; this may be covered in the architecture documentation.
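To see which cache levels the hints refer to, the sketch below maps the four standard hints (_MM_HINT_T0, _MM_HINT_T1, _MM_HINT_T2, _MM_HINT_NTA) onto a small hypothetical enum and prefetches a byte range one cache line at a time. It assumes 64-byte cache lines. With common compilers the hint has to be a compile-time constant, which is why it is dispatched through a switch instead of being passed straight through. Whether a given CPU honours a particular hint is implementation-specific, so treat this as a way to experiment, not as a guaranteed cache placement.

```c
#include <stddef.h>
#include <xmmintrin.h> /* _mm_prefetch and the _MM_HINT_* constants */

#define CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* Hypothetical names for the four hint levels. */
enum prefetch_level { PF_ALL_LEVELS, PF_L2_UP, PF_L3_UP, PF_NON_TEMPORAL };

/* The hint argument must be a constant, so each level gets its own call. */
static inline void prefetch_line(const char *p, enum prefetch_level lvl)
{
    switch (lvl) {
    case PF_ALL_LEVELS:   _mm_prefetch(p, _MM_HINT_T0);  break; /* into all levels, incl. L1 */
    case PF_L2_UP:        _mm_prefetch(p, _MM_HINT_T1);  break; /* into L2 and higher */
    case PF_L3_UP:        _mm_prefetch(p, _MM_HINT_T2);  break; /* into L3 and higher (impl.-specific) */
    case PF_NON_TEMPORAL: _mm_prefetch(p, _MM_HINT_NTA); break; /* minimize cache pollution */
    }
}

/* Issues one prefetch per cache line in [base, base + bytes) and
   returns how many prefetches were issued. */
size_t prefetch_range(const void *base, size_t bytes, enum prefetch_level lvl)
{
    const char *p = base;
    size_t lines = 0;
    for (size_t off = 0; off < bytes; off += CACHE_LINE, ++lines)
        prefetch_line(p + off, lvl);
    return lines;
}
```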

Christian_M_2
Beginner

Hello,

OK, if the CPU has at least one backward prefetcher, that's fine for me. Profiling seems to confirm this: if I change the indexing to forward, performance changes by only 1-2%, which may just be normal measurement noise.
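A 1-2% difference is easy to lose in noise, so it can help to time both directions in the same process on the same buffer. This is a minimal sketch, assuming float data whose length is a multiple of 4 and using the standard clock() (CPU time) for timing; sum_forward, sum_reverse and seconds_per_call are illustrative names, not code from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#include <xmmintrin.h>

/* Forward SSE sum; assumes n is a multiple of 4. */
float sum_forward(const float *a, size_t n)
{
    assert(n % 4 == 0);
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(a + i));
    float l[4];
    _mm_storeu_ps(l, acc);
    return l[0] + l[1] + l[2] + l[3];
}

/* Same sum, walking from the end to the beginning in steps of 4. */
float sum_reverse(const float *a, size_t n)
{
    assert(n % 4 == 0);
    __m128 acc = _mm_setzero_ps();
    for (size_t i = n; i >= 4; i -= 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(a + i - 4));
    float l[4];
    _mm_storeu_ps(l, acc);
    return l[0] + l[1] + l[2] + l[3];
}

/* Average CPU seconds per call of f over reps calls. The accumulated
   result is written to *sink so the calls cannot be optimized away. */
double seconds_per_call(float (*f)(const float *, size_t),
                        const float *a, size_t n, int reps, float *sink)
{
    clock_t t0 = clock();
    float s = 0.0f;
    for (int r = 0; r < reps; ++r)
        s += f(a, n);
    clock_t t1 = clock();
    *sink = s;
    return (double)(t1 - t0) / CLOCKS_PER_SEC / reps;
}
```

Running both directions several times on a buffer much larger than the last-level cache and comparing the spread of the timings gives a better sense of whether a 1-2% gap is real.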

I did some research, and you are right: the prefetch instruction does not seem to be used anymore.

Bernard
Valued Contributor I

>>>But what about end to begin, so reverse? Does the CPU realize this pattern?>>>

It has already been answered, but I would like to add that, for backward prefetching to kick in, the index pattern should probably be linear and decreasing.

Christian_M_2
Beginner

I am sure this requirement is fulfilled: I have a loop running from maxsize down to zero, and the decrement is the vector size, i.e. 128 bits for the SSE version and 256 bits for the AVX version. So this is a fixed-step linear decrement.
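A loop of that shape (from maxsize down to zero, one vector per step) might look like the sketch below. It is a hypothetical in-place scaling kernel, assuming float data and a length that is a multiple of the vector width; the width is chosen at compile time, 8 floats (256 bits) when the compiler targets AVX and 4 floats (128 bits) otherwise. Each iteration moves the address exactly one vector lower, which is the linear, decreasing pattern described earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

#ifdef __AVX__
#define VEC_FLOATS 8            /* 256-bit vector = 8 floats */
typedef __m256 vecf;
#define VEC_LOAD  _mm256_loadu_ps
#define VEC_STORE _mm256_storeu_ps
#define VEC_MUL   _mm256_mul_ps
#define VEC_SET1  _mm256_set1_ps
#else
#define VEC_FLOATS 4            /* 128-bit vector = 4 floats */
typedef __m128 vecf;
#define VEC_LOAD  _mm_loadu_ps
#define VEC_STORE _mm_storeu_ps
#define VEC_MUL   _mm_mul_ps
#define VEC_SET1  _mm_set1_ps
#endif

/* Multiplies a[0..n-1] by s, walking from the end to the beginning.
   Every iteration steps exactly one vector (VEC_FLOATS elements) lower,
   so the address stream is linear and decreasing. */
void scale_reverse(float *a, size_t n, float s)
{
    assert(n % VEC_FLOATS == 0); /* sketch: no scalar tail handling */
    const vecf vs = VEC_SET1(s);
    for (size_t i = n; i > 0; i -= VEC_FLOATS) {
        float *p = a + i - VEC_FLOATS;
        VEC_STORE(p, VEC_MUL(VEC_LOAD(p), vs));
    }
}
```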

Vladimir_Sedach
New Contributor I

Christian,

You can find the answer in section "7.2 HARDWARE PREFETCHING OF DATA" of the Intel® 64 and IA-32 Architectures Optimization Reference Manual (search for "Order Number: 248966").

I suspect you have at least 4 backward stream prefetchers.
