Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

General Exploration--LLC

Asimina_M_
Beginner
389 Views

Hi,

I would like some help with the following. I am running a General Exploration analysis on a small test case on 24 cores on Linux. In the hardware issues view I see CPI over 1, and quite high on some particular threads (2.57 etc.), while LLC, Contested Access, Branch prediction and data sharing all show 0. Since each thread works on an array (sized to the number of threads), I would actually expect to see data sharing. Instead I see nothing, so there is no explanation for why CPI is high. My run is quite small. Is it possible that I am not hitting a hardware count limit, and so I see nothing? Is it possible for me to adjust this?

Thanks

2 Replies
Peter_W_Intel
Employee

...I see CPI being over 1 and quite high on some particular threads (2.57 etc.); LLC, Contested Access, Branch prediction and data sharing show as 0.

I can think of two possible reasons:

1. Did you use SSE/AVX instructions? SIMD code can show a CPI value above 1, because each instruction does more work.

2. Did you have I/O waits, or threads stalling or being suspended? You can use the Locks and Waits analysis to inspect this.

If the result directory is not sensitive, it would help if you could share it.

TimP
Honored Contributor III

As Peter hinted, a CPI such as you quoted is normal for efficiently vectorized code. It doesn't make sense to sacrifice SIMD performance, or to favor spin-wait loops (which retire many cheap instructions and so lower CPI without doing useful work), just for the sake of a lower CPI.
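As a rough illustration of why a higher CPI need not mean slower code, the sketch below compares two sets of made-up counter readings for the same loop. The numbers and the names `Sample`, `kScalar` and `kVector` are hypothetical, not measurements from this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counter readings for one run of a loop.
struct Sample {
    std::uint64_t cycles;        // unhalted clock cycles
    std::uint64_t instructions;  // retired instructions
};

// CPI = clock cycles / retired instructions.
double cpi(const Sample& s) {
    assert(s.instructions > 0);
    return static_cast<double>(s.cycles) / static_cast<double>(s.instructions);
}

// Made-up numbers: the scalar build handles one element per instruction,
// the vector build handles four, so it retires a quarter of the instructions.
constexpr Sample kScalar{8000, 10000};
constexpr Sample kVector{5000, 2500};
```

With these numbers `cpi(kScalar)` is 0.8 and `cpi(kVector)` is 2.0, yet the vector build finishes in 5000 cycles instead of 8000. Elapsed time, not CPI alone, is what tells you which build is better.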

The type of cache sharing you would want to avoid is the one where write misses hit in the caches of other cores, which is characteristic of false sharing. You would also want to keep threads local to one CPU as much as possible, to minimize duplicating cache lines on both CPUs. In many cases, normal application of thread affinity is sufficient to keep these from posing difficulties.
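To make the false-sharing pattern concrete for an array with one element per thread, as in the question, here is a minimal sketch. All names (`PackedSlot`, `PaddedSlot`, `run`) are made up for illustration, and the 64-byte cache line is an assumption that is typical for x86 but not guaranteed. It assumes C++17 so that `std::vector` honors the over-aligned type.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache line size in bytes (typical for x86, not guaranteed).
constexpr std::size_t kCacheLine = 64;

// Packed layout: neighbouring per-thread slots share a cache line,
// so writes from different threads keep invalidating each other's copy.
struct PackedSlot {
    std::atomic<long> value{0};
};

// Padded layout: each slot is aligned to, and fills, its own cache line.
struct alignas(kCacheLine) PaddedSlot {
    std::atomic<long> value{0};
};

// Every thread increments only its own slot; returns the sum of all slots.
template <typename Slot>
long run(unsigned threads, long iterations) {
    assert(threads > 0);
    std::vector<Slot> slots(threads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&slots, t, iterations] {
            for (long i = 0; i < iterations; ++i) {
                // Relaxed atomic add: a real memory write on every iteration.
                slots[t].value.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    long total = 0;
    for (const auto& s : slots) {
        total += s.value.load(std::memory_order_relaxed);
    }
    return total;
}
```

Calling `run<PackedSlot>(24, n)` and `run<PaddedSlot>(24, n)` from `main` with a large `n` and timing both is one way to check whether the packed layout costs anything on a given machine. Both variants compute the same sum; only the memory layout differs.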
