Analyzers
Community support for Analyzers (Intel VTune™ Profiler, Intel Advisor, Intel Inspector)

PREFETCHNTA causes L1D evictions (L1D.REPLACEMENT)

Alexander_Alexeev

Hello, it seems I have some kind of misunderstanding. I expected PREFETCHNTA to prefetch data into the second-level cache without evicting anything from L1D. But in VTune I can clearly see that, in a function that contains only PREFETCHNTA instructions (as a microbenchmark), many L1D.REPLACEMENT events are attributed to every non-temporal prefetch instruction. So the prefetched data actually does reach the L1D cache, right?
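For reference, a prefetch-only microbenchmark of the kind described above might look like the minimal sketch below. It assumes 64-byte cache lines and the `_mm_prefetch` intrinsic from `<xmmintrin.h>` (GCC, Clang, MSVC, ICC); the function name `prefetch_nta_only` is hypothetical and is not the original poster's code.

```c
#include <stddef.h>
#include <xmmintrin.h> /* _mm_prefetch, _MM_HINT_NTA */

/* Assumed cache-line size (64 bytes on Sandy Bridge-era Intel cores). */
#define CACHE_LINE 64

/* Hypothetical kernel: issue one PREFETCHNTA per cache line of buf and do
 * nothing else, so any L1D.REPLACEMENT samples attributed to this loop
 * come from the prefetches themselves. Returns the number of prefetches. */
size_t prefetch_nta_only(const char *buf, size_t len)
{
    size_t issued = 0;
    for (size_t off = 0; off < len; off += CACHE_LINE) {
        _mm_prefetch(buf + off, _MM_HINT_NTA); /* emits PREFETCHNTA on x86 */
        issued++;
    }
    return issued;
}
```

Running such a loop over a buffer much larger than L1D under VTune, and comparing L1D.REPLACEMENT per instruction against a variant that uses `_MM_HINT_T1` or `_MM_HINT_T2`, is one way to see where each hint places the data on a given core.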

What is wrong in my understanding, or what did I miss? My intention is to process a block of data where every piece is needed only once, which is why it would be better to avoid bringing it into L1D and to use non-temporal operations instead.
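The read-once pattern described above is often written as a single pass that issues PREFETCHNTA a fixed distance ahead of the data being consumed. The sketch below assumes 64-byte cache lines; `sum_once_nta` and the 512-byte `PREFETCH_DISTANCE` are hypothetical placeholders, and a useful distance has to be measured on the target platform rather than taken from here.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h> /* _mm_prefetch, _MM_HINT_NTA */

#define CACHE_LINE 64          /* assumed line size in bytes */
#define PREFETCH_DISTANCE 512  /* hypothetical look-ahead in bytes; tune per CPU */

/* Sketch: sum data[0..n) in one pass, prefetching with the NTA hint
 * PREFETCH_DISTANCE bytes ahead. A prefetch is issued once every
 * CACHE_LINE / 8 elements (exactly one per line if data is 64-byte
 * aligned), and never for addresses past the end of the array. */
uint64_t sum_once_nta(const uint64_t *data, size_t n)
{
    const size_t per_line = CACHE_LINE / sizeof(uint64_t);
    const size_t ahead = PREFETCH_DISTANCE / sizeof(uint64_t);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i % per_line == 0 && i + ahead < n)
            _mm_prefetch((const char *)(data + i + ahead), _MM_HINT_NTA);
        sum += data[i]; /* each element is read exactly once */
    }
    return sum;
}
```

Issuing one prefetch per line instead of one per element avoids redundant prefetch instructions; whether the NTA hint actually keeps the data out of L1D is exactly the microarchitecture-dependent question this thread asks.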

Any recommendations for Sandy Bridge and newer Intel platforms?

BTW, is a non-temporal load into an AVX register available on SB (something like MOVNTDQA)?
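In C, MOVNTDQA is exposed as the SSE4.1 intrinsic `_mm_stream_load_si128` from `<smmintrin.h>`. The sketch below sticks to that 128-bit form; as far as we know, the 256-bit VMOVNTDQA form arrived only with AVX2, and Intel's SDM describes the non-temporal hint as being used for WC (write-combining) memory, so on ordinary write-back memory it may behave like a normal load. The function name `sum_stream_load` is hypothetical, and the `target("sse4.1")` attribute assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>
#include <smmintrin.h> /* SSE4.1: _mm_stream_load_si128 (MOVNTDQA) */

/* Sketch: sum n uint32_t values, using MOVNTDQA for the 16-byte loads.
 * data must be 16-byte aligned, otherwise MOVNTDQA faults.
 * target("sse4.1") lets GCC/Clang compile this without -msse4.1;
 * the CPU must still support SSE4.1 at run time. */
__attribute__((target("sse4.1")))
uint32_t sum_stream_load(const uint32_t *data, size_t n)
{
    __m128i acc = _mm_setzero_si128();
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* Explicit cast: some headers declare the parameter as non-const __m128i *. */
        __m128i v = _mm_stream_load_si128((__m128i *)(data + i));
        acc = _mm_add_epi32(acc, v); /* four 32-bit lane sums */
    }
    uint32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    uint32_t sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++) /* scalar tail for n not divisible by 4 */
        sum += data[i];
    return sum;
}
```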

Thanks in advance.

The AORM (Intel 64 and IA-32 Architectures Optimization Reference Manual) says:

"The non-temporal instruction is: PREFETCHNTA— Fetch the data into the second-level cache, minimizing cache pollution."

and 

L1D.REPLACEMENT - Replacements in the 1st level data cache.
