Does the new VTune include the “LOAD_DISPATCH.ANY” event?

liu__kevin

Hi,

Does anyone know whether the new VTune 2016 has the event “LOAD_DISPATCH.ANY”? I could not find it in the event list when I tried to add it. I think my CPU supports it, and I have attached the CPU information below.

processor    : 7
vendor_id    : GenuineIntel
cpu family    : 6
model        : 94
model name    : Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
stepping    : 3
microcode    : 0x8a
cpu MHz        : 899.937
cache size    : 8192 KB
physical id    : 0
siblings    : 8
core id        : 3
cpu cores    : 4
apicid        : 7
initial apicid    : 7
fpu        : yes
fpu_exception    : yes
cpuid level    : 22
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs        :
bogomips    : 6815.88
clflush size    : 64
cache_alignment    : 64
address sizes    : 39 bits physical, 48 bits virtual
power management:

Thank you.

Alexandra_S_Intel

Hi, Kevin,

It looks like your CPU is a Skylake, which does not have the LOAD_DISPATCH.ANY event. You can find a list of events for Skylake here: https://software.intel.com/en-us/node/589938

liu__kevin

Hi Alexandra,

Thank you so much for your reply. So the events depend on the CPU, not on VTune? But the manual for the i7 lists that event, so I am not clear about this. There is also another event counter that I found in VTune 2011 but that is missing in 2017; admittedly, the machines are different. Can I still use an event even though it is not in the list? Is there a way, such as writing something or using a command, to use an event that is not in the list but that I saw in VTune 2011?

Another question: do VTune 2017 and 2011 have the same event list? Does the list differ depending on the machine?

Thank you.

Alexandra_S_Intel

Yes, events are tied to the CPU architecture. This is because events are generated by the hardware itself. VTune just receives that information and does neat things with the numbers.

As for "i7," be careful. Not all Core(TM) i7s are the same! There are several generations of them, which use different microarchitectures. https://en.wikipedia.org/wiki/List_of_Intel_Core_i7_microprocessors
Yours is a 6700, so it is a Skylake: https://en.wikipedia.org/wiki/List_of_Intel_Core_i7_microprocessors#Skylake_microarchitecture_.286th_generation.29
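
As a rough illustration of the point above, the microarchitecture can be read off the "cpu family" and "model" fields of /proc/cpuinfo rather than the "i7" brand name. The sketch below is only an assumption-laden example: the lookup table is a small, partial, hand-written list (not an official Intel table), and the function names are made up for this post.

```python
# Illustrative, partial table (an assumption of this sketch, not an official
# list): (cpu family, model) -> microarchitecture. Model numbers are decimal,
# exactly as /proc/cpuinfo prints them.
MICROARCH_BY_FAMILY_MODEL = {
    (6, 26): "Nehalem",
    (6, 42): "Sandy Bridge",
    (6, 58): "Ivy Bridge",
    (6, 60): "Haswell",
    (6, 61): "Broadwell",
    (6, 94): "Skylake",
}


def parse_cpuinfo(text):
    """Return the key/value fields of the first processor block in cpuinfo text."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def microarchitecture(text):
    """Look up the microarchitecture name, or 'unknown' if not in the table."""
    fields = parse_cpuinfo(text)
    key = (int(fields["cpu family"]), int(fields["model"]))
    return MICROARCH_BY_FAMILY_MODEL.get(key, "unknown")


# Fields copied from the cpuinfo output posted earlier in this thread.
SAMPLE = """\
vendor_id    : GenuineIntel
cpu family    : 6
model        : 94
model name    : Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
"""

print(microarchitecture(SAMPLE))  # Skylake
```

On a live Linux machine the same function could be fed the contents of /proc/cpuinfo instead of SAMPLE.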

Because the events are generated by the hardware, you cannot use an event that is not supported by your CPU - there is physically nothing to count/generate that event. That's why that event reference I linked you to earlier is so helpful; it tells you all the events for a given microarchitecture.
However, if an event is supported by your CPU, but is not "in the list" for a particular analysis type, you should be able to create a custom analysis that tracks that event. Please note that this is not the same thing as when the event is not "in the list" of events for that CPU!

As far as I know, every version of VTune is capable of using all the events of microarchitectures that existed at the time - but it can only work with what the CPU actually gives it. If a CPU is not capable of generating a particular event, VTune can't tell you anything about that event, since it's not getting any information about it from the hardware.
For the record, I don't think it's true the other way around. That is, I don't think older versions of VTune can understand new events that were added in later microarchitectures. So technically, different versions of VTune would have different event lists - but only because more events were added, not because any were removed.

Does that answer your question?

liu__kevin

Hi Alexandra,

Thank you so much for this. I think it is very helpful.

 

liu__kevin

Hi Alexandra,
 
I am testing an i7-6700 with VTune 2016 on SPEC 2006. From the definitions, I believe the following should hold:
 
MEM_INST_RETIRED.ALL_LOADS = MEM_LOAD_RETIRED.L1_HIT + MEM_LOAD_RETIRED.L1_MISS
 
MEM_INST_RETIRED.ALL_LOADS
All retired load instructions.
 
MEM_LOAD_RETIRED.L1_HIT
Retired load instructions with L1 cache hits as data sources.
 
MEM_LOAD_RETIRED.L1_MISS
Retired load instructions that missed the L1 cache as data sources.
 
However, for some programs (LIBQUANTUM and MCF) where the L1D cache miss rate is high, the three counts do not add up:
 
Program       MEM_INST_RETIRED.ALL_LOADS   MEM_LOAD_RETIRED.L1_HIT   MEM_LOAD_RETIRED.L1_MISS
LIBQUANTUM    2.47E+10                     1.40E+10                  2.67E+09
MCF           1.15E+11                     7.57E+10                  2.43E+10
 
As you can see, there are gaps for these programs. Can you please explain?
 
Thank you.

 

 

Alexandra_S_Intel

Hello, Kevin,

I apologize for the delay; I've been busy.

It seems likely that the missing term here is MEM_LOAD_RETIRED.FB_HIT.
Sometimes a load misses L1 but hits a fill buffer (FB) because a preceding miss to the same cache line is still outstanding. As I understand it, these loads are not counted in the MEM_LOAD_RETIRED.L1_MISS event counter. Instead they are recorded in MEM_LOAD_RETIRED.FB_HIT.

So your equation should look like so, approximately:
MEM_INST_RETIRED.ALL_LOADS = MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.L1_HIT + MEM_LOAD_RETIRED.FB_HIT
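
To get a feel for how large the missing term would have to be, here is a small sketch of the arithmetic using the counts reported earlier in this thread. The function name is made up for illustration, and the computed gap is only what is left unexplained by L1_HIT and L1_MISS. It is not a measurement of MEM_LOAD_RETIRED.FB_HIT, which was not collected in this thread.

```python
# Counts reported earlier in the thread, as (ALL_LOADS, L1_HIT, L1_MISS).
REPORTED = {
    "LIBQUANTUM": (2.47e10, 1.40e10, 2.67e9),
    "MCF": (1.15e11, 7.57e10, 2.43e10),
}


def unexplained_loads(all_loads, l1_hit, l1_miss):
    """Loads not accounted for by L1_HIT + L1_MISS (hypothetical helper)."""
    return all_loads - (l1_hit + l1_miss)


for name, (all_loads, l1_hit, l1_miss) in REPORTED.items():
    gap = unexplained_loads(all_loads, l1_hit, l1_miss)
    # Print the gap both as a count and as a share of all retired loads.
    print(f"{name}: gap = {gap:.2e} ({gap / all_loads:.1%} of all loads)")
```

For these numbers the gap comes out at about 8.0E+09 loads for LIBQUANTUM (roughly a third of all loads) and 1.5E+10 for MCF (about 13%), which is far too large to be counting noise. If FB_HIT is the explanation, its measured value should land close to these figures.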

That said, this will still not be exact. You can expect a small, statistically insignificant difference in results, due to the way event counting works.
Multiplexing and sampling, among other things, can cause you to miss a few events. It's a complicated topic but basically, if we just took note of every event that occurred, it would slow everything down to the point that collecting useful data would be impossible. Instead we have a few hardware counters that count a certain number of events, and every once in a while we take note of how many events occurred in that time block. Making things even more complicated, there are only a few hardware counters, so we have to have them switch out between different events to count. You can get more information here: https://software.intel.com/en-us/articles/understanding-how-general-exploration-works-in-intel-vtune-amplifier-xe
There are numerous other things that can cause very small differences, and I won't list them all - I don't even claim to know them all, because there are a lot of them, and as I said, they produce such small, insignificant mismatches that they can be more or less ignored.
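
To make the multiplexing point a bit more concrete, here is a minimal sketch of the usual way a multiplexed count is extrapolated: the raw count is scaled by the fraction of time the event was actually on a hardware counter. This is the scaling that Linux perf documents for its time_enabled and time_running values. Whether VTune uses exactly this formula is an assumption on my part, and the function name is invented for this example.

```python
def scale_multiplexed_count(raw_count, time_enabled, time_running):
    """Estimate the full-run count of an event that was only counted part-time.

    raw_count    -- events counted while the event occupied a hardware counter
    time_enabled -- total time the event was requested
    time_running -- time the event was actually scheduled on a counter
    """
    if time_running <= 0:
        raise ValueError("event was never scheduled on a counter")
    # Assumes the event rate while uncounted matches the rate while counted,
    # which is why the result is an estimate rather than an exact count.
    return raw_count * time_enabled / time_running


# Example: an event that was on a counter for a quarter of the run.
estimate = scale_multiplexed_count(raw_count=2.5e9, time_enabled=100.0, time_running=25.0)
print(f"{estimate:.2e}")
```

Because each event in a sum like the one above is extrapolated from a different set of time slices, the scaled values need not add up exactly even when the underlying hardware counts would.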

Does this answer your question?

liu__kevin

Hi Alexandra,

Thank you so much for your reply; it helps a lot.

I will change the configuration and check the results.

Thank you for your time.

Have a nice day.