
Why do different assigned SAVs not affect the hardware event count?


I am using Intel VTune on Ubuntu to collect hardware counters, for example CPU_CLK_UNHALTED.THREAD and INST_RETIRED.ANY.
I am using this command:

"/opt/intel/vtune_amplifier_xe_2013/bin64/amplxe-cl -collect-with runsa -knob event-config=CPU_CLK_UNHALTED.THREAD,INST_RETIRED.ANY:sa=100000-target-duration-type=long -knob enable-stack-collection=true -app-working-dir /spec2006/bin -- /spec2006/bin/runspec --config=defaultconfig.cfg --size=ref --noreportable --iterations=1 bzip2"

The result is 

"Operating System          3.8.0-19-generic DISTRIB_ID=Ubuntu

DISTRIB_DESCRIPTION="Ubuntu 13.04"                                
Computer Name             HH-Xeon                                                                                                                                            
Result Size               2898412                                                                                                                                            

Parameter          r005runsa                    
-----------------  -----------------------------
Name               Intel(R) Xeon(R) E5 processor
Frequency          1900000000                   
Logical CPU Count  24                           

Elapsed Time:  680.540

Event summary
Hardware Event Type               Hardware Event Count:Self  Hardware Event Sample Count:Self  Events Per Sample
--------------------------------  -------------------------  --------------------------------  -----------------
INST_RETIRED.ANY                              2384309384270                               941  3567587328       
CPU_CLK_UNHALTED.THREAD                       1577945739860                               667  200000000        
Synchronization Context Switches                        142                               128  0                
Preemption Context Switches                             820                               820  0                
Wait Time                                     3777908607648                               128  0                
Inactive Time                                      55432358                               820  0                
Energy Core                                      9557412800                               754  0                
Energy Pack                                     18843398368                               754  0                
Energy DRAM                                      3607192000                               759   "

Now if I change the SAV for INST_RETIRED.ANY to 400000, the result remains the same. I was expecting the value of INST_RETIRED.ANY to equal what I had assigned on the command line, e.g. 100000 or 400000.

Can you please explain this behavior?


Hi Maria:

The event count *should* be roughly the same.  The sample count will change, however: a larger SAV means fewer samples, and a smaller SAV means more.  The events should be happening at approximately the same rate for the same workload.  All you are doing by changing the SAV is affecting how many samples are collected.

Note that decreasing the SAV, which increases the sample rate, will increase the overhead of sampling.  This could skew the results slightly since more samples may be collected in the profiling code rather than your code.
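
To make the relationship concrete, here is a rough sketch in Python. It is not how VTune computes its numbers internally; it only assumes the simple model "one sample per SAV events", and the workload size is a made-up figure, not taken from the result above.

```python
def expected_samples(total_events: int, sav: int) -> int:
    """Approximate sample count under the model: one sample per `sav` events."""
    return total_events // sav


def estimated_event_count(samples: int, sav: int) -> int:
    """Rebuild the event count from the samples, as a sampling profiler would."""
    return samples * sav


# Hypothetical workload size (an assumption for illustration only).
TOTAL_INSTRUCTIONS = 2_000_000_000

for sav in (100_000, 400_000):
    samples = expected_samples(TOTAL_INSTRUCTIONS, sav)
    estimate = estimated_event_count(samples, sav)
    print(f"SAV={sav}: samples={samples}, estimated events={estimate}")
```

Under this model, quadrupling the SAV cuts the number of samples to a quarter, while the estimated event count stays the same, because the workload itself has not changed.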

Does that help?


I thought that by changing the SAV, I was controlling the number of instructions retired in an application. If the application has 2384309384270 instructions in total, I could restrict it to execute only 100000 of those 2384309384270 by using the parameter INST_RETIRED.ANY:sa=100000.

According to your explanation, VTune does not work like this. Can you give me any idea how I can achieve the above-mentioned task?


You can't "restrict" the number of instructions retired by the application using VTune Amplifier. :\  The code is going to execute as many instructions as the processor can schedule and execute.  What you can do is determine *when* a sample is taken.  For example, you *could* cause a sample to be taken every 100000 instructions retired, but that sample will record the instruction the processor is executing regardless of what module or thread or function it is executing.  There is no way with the VTune Amplifier for you to guarantee a sample be taken every 100000 instructions retired by *your* application, because the processor is executing instructions in all the code executing on the system.
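
A toy model of that last point, sketched in Python. Everything here is hypothetical: the owner names "app" and "other", the time-slice lengths, and the fixed round-robin schedule are assumptions for illustration, not a description of how the processor or VTune actually schedules or samples.

```python
from collections import Counter
from itertools import cycle


def sample_system_wide(schedule, total_instructions, sav):
    """Take one sample every `sav` retired instructions, counted system-wide.

    `schedule` is a repeating list of (owner, slice_length) pairs. Each sample
    is attributed to whichever owner is executing when the counter overflows.
    """
    samples = Counter()
    retired = 0          # instructions retired so far, across all owners
    next_sample = sav    # instruction count at which the next sample fires
    for owner, length in cycle(schedule):
        if retired >= total_instructions:
            break
        end = min(retired + length, total_instructions)
        # Every overflow that falls inside this slice lands on this owner.
        while next_sample <= end:
            samples[owner] += 1
            next_sample += sav
        retired = end
    return samples


# Hypothetical mix: the application runs 300000 instructions, then other
# code on the system runs 100000, and the pattern repeats.
result = sample_system_wide([("app", 300_000), ("other", 100_000)],
                            total_instructions=4_000_000, sav=100_000)
print(result["app"], result["other"])
```

In this toy schedule a quarter of the samples land in code that is not the application, even though the SAV was chosen with the application in mind.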

One option I can think of is to use the PIN toolkit to write your own Pin tool that collects a sample every 100000 instructions - but it would be SLOW!  Basically, you would have to instrument every single instruction and count every instruction.  The overhead would be prohibitive, I suspect. :(

What is it, exactly, you are trying to accomplish - at a high level?  Don't focus on the instruction level, if you can avoid it.