
Extracting results based on time ranges

Justin_H_
Beginner
1,374 Views

Hello, I am interested in using VTune to profile a system. I have run a project and gathered the results. I am looking at the hardware event samples for a specific CPU, e.g. all hardware events for CPU 0. The problem I am having is that I want to look at the results in small time intervals. Basically, I want to see the results for every 15 ms.

The problem seems to be that the normal view takes too long: I have to set up the time range and filter the results for each time interval. Is there a way of looking at the results based on time? Maybe a way of setting a time range and seeing every 15 ms time range from start to finish?

If VTune can't do it, does anyone know if any of the other tools in this suite can?

 

Thanks!

16 Replies
Peter_W_Intel
Employee

Here is a simple example using advanced-hotspots analysis. My box has a 3.2 GHz CPU, so a sample-after value (SAV) of 3,200,000 for the clock event gives 1,000 samples per second (1 sample per ms). With an SAV of 320,000 you get 10 samples per ms, or 150 samples in 15 ms. However, if you only want samples on CPU 0, the total is 150/cores during 15 ms. You can lower the SAV below 320,000 to capture more samples, but note that 100,000 is the minimum. Don't set the SAV too small; it will cause unexpected results.
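The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `expected_samples` is made up for illustration, and it assumes the core stays unhalted for the whole interval, so the result is an upper bound for the clock event rather than a guaranteed count.

```python
def expected_samples(cpu_hz: int, sav: int, interval_ms: int) -> int:
    """Upper bound on clock-event samples in one interval on one busy core.

    Assumes the core is unhalted for the whole interval, so the clock
    event fires cpu_hz times per second and one sample is taken every
    `sav` events.
    """
    return (cpu_hz * interval_ms) // (sav * 1000)


CPU_HZ = 3_200_000_000  # the 3.2 GHz box from the post

# SAV values mentioned in the post: default, 10x smaller, and the minimum.
for sav in (3_200_000, 320_000, 100_000):
    print(sav, expected_samples(CPU_HZ, sav, 15))
```

An idle core halts its clock, so real counts per 15 ms window can be much lower than this bound.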

My example: (collect performance data, duration is 10s)

> amplxe-cl -collect-with runsa -cpu-mask 0 -knob event-config=CPU_CLK_UNHALTED.THREAD:sa=100000,CPU_CLK_UNHALTED.REF:sa=100000,INST_RETIRED.ANY_P:sa=100000 -d 10

 

Summary
-------
Elapsed Time:  10.001

Event summary
-------------
Hardware Event Type      Hardware Event Count:Self  Hardware Event Sample Count:Self  Events Per Sample
-----------------------  -------------------------  --------------------------------  -----------------
CPU_CLK_UNHALTED.THREAD                   14600000                               146  100000           
CPU_CLK_UNHALTED.REF                      14100000                               141  100000           
INST_RETIRED.ANY_P                         5000000                                50  100000           

You can then review the performance data from 5.1 s to 5.3 s on the command line. You can also open the result in amplxe-gui and filter it there.

> amplxe-cl -R hw-events -time-filter=5.1:5.3
amplxe: Using result path `/home/peter/problem_report/vtune_mic_offload/r014runsa'
amplxe: Executing actions 50 % Generating a report                             
Function            Module   Hardware Event Count:CPU_CLK_UNHALTED.THREAD (K)  Hardware Event Count:CPU_CLK_UNHALTED.REF (K)  Hardware Event Count:INST_RETIRED.ANY_P (K)
------------------  -------  ------------------------------------------------  ---------------------------------------------  -------------------------------------------
finish_task_switch  vmlinux                                               100                                              0                                            0
intel_idle          vmlinux                                                 0                                            100                                            0
trace_clock_local   vmlinux                                                 0                                            100                                            0
try_to_wake_up      vmlinux                                                 0                                              0                                          100

 

Justin_H_
Beginner

It's a little difficult to understand what you're saying, so let me just ask, and you can tell me what the output shows. You might not be showing it for brevity, or maybe because it's not there. I am not sure, so I want to clarify.

Using system profiling in VTune, I can see about 63 different hardware event counters. You can select a time range and filter in by that time range. The GUI then shows a table where the row is cpu_0 and the columns are the events, and I am looking for something different. The GUI table looks like this:

         Event1   Event2   Event3   ...etc
cpu_0    0        0        0

As an example, those are the results from the selected time frame. So for each time you want to look at, you have to hand-select the time range and filter in by selection. This would need to be done for thousands of time intervals for one sample set. I want something like this:

         Event1   Event2   Event3   ...etc
Time1    0        0        0
Time2    0        0        0
Time3    0        0        0
etc.

 

Where I specify the time frame I want to see. Are you saying that doing it the way you mentioned will output a result like this?

 

Thanks,

Justin

Peter_W_Intel
Employee

1. The event I used is the CPU clock, which always occurs within a 15 ms interval. If you use other events (for example, cache misses in your code), you need to make sure they actually occur within each 15 ms interval. Usually you need to increase the workload for these events or reduce the SAV value (as I showed above).

2. The "-time-filter" report option restricts the report to the data in one time range, for example [5.1 s to 5.2 s], [5.2 s to 5.3 s], etc. There is no single command line that displays data for several time ranges, so you need to run "amplxe-cl -R" several times and then write the data (time1, time2, time3) into your own report.
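Running the report once per window is tedious by hand but easy to script. The sketch below only builds the command lines; the helper names `time_windows` and `report_command` are made up for illustration, and the only options used are the `-R hw-events` and `-time-filter` forms shown in this thread.

```python
def time_windows(start_s: float, end_s: float, step_ms: int = 15):
    """Yield (begin, end) pairs in seconds covering [start_s, end_s)."""
    # Work in integer milliseconds so the boundaries do not drift.
    begin_ms = round(start_s * 1000)
    end_ms = round(end_s * 1000)
    for t in range(begin_ms, end_ms, step_ms):
        yield t / 1000, min(t + step_ms, end_ms) / 1000


def report_command(begin_s: float, end_s: float, exe: str = "amplxe-cl"):
    """Build the argument list for one time-filtered hw-events report."""
    return [exe, "-R", "hw-events", f"-time-filter={begin_s:.3f}:{end_s:.3f}"]


if __name__ == "__main__":
    # Three consecutive 15 ms windows starting at 5.0 s.
    for begin, end in time_windows(5.0, 5.045):
        print(" ".join(report_command(begin, end)))
```

Each argument list could be passed to `subprocess.run(cmd, capture_output=True, text=True)` to capture the report text, assuming the report picks up the intended result as it does in the posts above.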

Justin_H_
Beginner

This kind of puts me back to square one. Having to regenerate the data by hand for each time frame would take years. I need 15 ms samples for several minutes. Is there maybe a non-GUI version that could just output that information to a command window? Maybe I could generate the data that way if I could write a script that runs in a terminal. Is this something the GUI doesn't allow, or is it not possible because of the table structure? I see the database that VTune creates for the file, but I am not an SQL expert and don't know how to extract data from it. Maybe if you could tell me the SQL query to get:

         Event1   Event2   Event3   ...etc
cpu_0    0        0        0

between Time A and time B, I can write a python script and just automate the query and get the thousands of results I need.

Thanks!

Peter_W_Intel
Employee

Direct user access to the SQL data is not supported; it is VTune internal data.

You can run the VTune report several times and save the outputs, then analyze them and generate a report in the format you want, all from your Python script.

# amplxe-cl -collect-with runsa -cpu-mask 0 -knob event-config=CPU_CLK_UNHALTED.THREAD:sa=100000,CPU_CLK_UNHALTED.REF:sa=100000,INST_RETIRED.ANY_P:sa=100000 -d 10

 

# amplxe-cl -R hw-events -time-filter=5.000:5.015
amplxe: Using result path `/home/peter/problem_report/vtune_mic_offload/r020runsa'
amplxe: Executing actions 50 % Generating a report                             
Function            Module   Hardware Event Count:CPU_CLK_UNHALTED.THREAD (K)  Hardware Event Count:CPU_CLK_UNHALTED.REF (K)  Hardware Event Count:INST_RETIRED.ANY_P (K)
------------------  -------  ------------------------------------------------  ---------------------------------------------  -------------------------------------------
intel_idle          vmlinux                                               100                                            300                                            0

# amplxe-cl -R hw-events -time-filter=5.015:5.030
amplxe: Using result path `/home/peter/problem_report/vtune_mic_offload/r020runsa'
amplxe: Executing actions 50 % Generating a report                             
Function               Module   Hardware Event Count:CPU_CLK_UNHALTED.THREAD (K)  Hardware Event Count:CPU_CLK_UNHALTED.REF (K)  Hardware Event Count:INST_RETIRED.ANY_P (K)
---------------------  -------  ------------------------------------------------  ---------------------------------------------  -------------------------------------------
__do_softirq           vmlinux                                               200                                            100                                            0
intel_idle             vmlinux                                                 0                                            100                                            0
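The per-window reports above could be collapsed into one row per window by summing each event column over all function rows. The following is a minimal sketch, assuming the plain-text layout shown in this thread (an "amplxe:" prefix on progress lines, a row of dashes under the header, and integer counts in the trailing columns); the function name `sum_events` is hypothetical.

```python
import re


def sum_events(report_text: str) -> dict:
    """Sum each event column over all function rows of one text report."""
    # Drop blank lines and the "amplxe: ..." progress messages.
    lines = [ln for ln in report_text.splitlines()
             if ln.strip() and not ln.startswith("amplxe:")]
    # The row of dashes separates the header from the data rows.
    sep = next(i for i, ln in enumerate(lines) if set(ln.strip()) <= {"-", " "})
    # Header cells are separated by two or more spaces.
    header = re.split(r"\s{2,}", lines[sep - 1].strip())
    events = header[2:]  # skip the Function and Module columns
    totals = dict.fromkeys(events, 0)
    for row in lines[sep + 1:]:
        # Counts are the last len(events) tokens, so function names
        # containing spaces do not shift the columns.
        values = row.split()[-len(events):]
        for name, value in zip(events, values):
            totals[name] += int(value)
    return totals


# Second report from the post, with the column padding shortened.
SAMPLE = """\
amplxe: Executing actions 50 % Generating a report
Function      Module   CPU_CLK_UNHALTED.THREAD (K)  CPU_CLK_UNHALTED.REF (K)
------------  -------  ---------------------------  ------------------------
__do_softirq  vmlinux                          200                       100
intel_idle    vmlinux                            0                       100
"""

if __name__ == "__main__":
    print(sum_events(SAMPLE))
```

Calling this once per time window and writing the totals out as rows gives the Time1/Time2/Time3 table asked for earlier in the thread.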

                 

 

Justin_H_
Beginner

So it appears this is my command:

 "C:\Program Files (x86)\Intel\VTune Amplifier XE 2015\bin32\amplxe-cl" -collect-with runsa -cpu-mask 0 -knob event-config=BACLEARS.ANY:sa=100003,BR_MISP_RETIRED.ALL_BRANCHES_PS:sa=400009,CPU_CLK_UNHALTED.REF_TSC:sa=2000003,CPU_CLK_UNHALTED.THREAD:sa=2000003,CPU_CLK_UNHALTED.THREAD_P:sa=2000003,CYCLE_ACTIVITY.CYCLES_NO_EXECUTE:sa=2000003,CYCLE_ACTIVITY.STALLS_L1D_PENDING:sa=2000003,CYCLE_ACTIVITY.STALLS_L2_PENDING:sa=2000003,CYCLE_ACTIVITY.STALLS_LDM_PENDING:sa=2000003,DSB2MITE_SWITCHES.PENALTY_CYCLES:sa=2000003,DTLB_LOAD_MISSES.STLB_HIT_2M:sa=2000003,DTLB_LOAD_MISSES.STLB_HIT_4K:sa=2000003,DTLB_LOAD_MISSES.WALK_DURATION:sa=2000003,DTLB_STORE_MISSES.STLB_HIT_2M:sa=100003,DTLB_STORE_MISSES.STLB_HIT_4K:sa=100003,DTLB_STORE_MISSES.WALK_DURATION:sa=100003,ICACHE.IFETCH_STALL:sa=2000003,ICACHE.MISSES:sa=200003,IDQ.ALL_DSB_CYCLES_4_UOPS:sa=2000003,IDQ.ALL_DSB_CYCLES_ANY_UOPS:sa=2000003,IDQ.ALL_MITE_CYCLES_4_UOPS:sa=2000003,IDQ.ALL_MITE_CYCLES_ANY_UOPS:sa=2000003,IDQ.MS_SWITCHES:sa=2000003,IDQ.MS_UOPS:sa=2000003,IDQ_UOPS_NOT_DELIVERED.CORE:sa=2000003,IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE:sa=2000003,ILD_STALL.LCP:sa=2000003,INST_RETIRED.ANY:sa=2000003,INT_MISC.RECOVERY_CYCLES:sa=2000003,ITLB_MISSES.STLB_HIT:sa=100003,ITLB_MISSES.WALK_DURATION:sa=100003,L1D.REPLACEMENT:sa=2000003,L1D_PEND_MISS.PENDING:sa=2000003,L2_LINES_IN.ALL:sa=100003,LD_BLOCKS.NO_SR:sa=100003,LD_BLOCKS.STORE_FORWARD:sa=100003,LD_BLOCKS_PARTIAL.ADDRESS_ALIAS:sa=100003,MACHINE_CLEARS.COUNT:sa=100003,MACHINE_CLEARS.MASKMOV:sa=100003,MACHINE_CLEARS.MEMORY_ORDERING:sa=100003,MACHINE_CLEARS.SMC:sa=100003,MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HITM_PS:sa=20011,MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HIT_PS:sa=20011,MEM_LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM:sa=100007,MEM_LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM_PS:sa=100007,MEM_LOAD_UOPS_RETIRED.HIT_LFB:sa=100003,MEM_LOAD_UOPS_RETIRED.L1_MISS:sa=100003,MEM_LOAD_UOPS_RETIRED.L3_HIT_PS:sa=50021,MEM_LOAD_UOPS_RETIRED.L3_MISS_PS:sa=100007,MEM_UOPS_RETIRED.ALL_STORES_PS:sa
=2000003,MEM_UOPS_RETIRED.SPLIT_LOADS_PS:sa=100003,MEM_UOPS_RETIRED.SPLIT_STORES_PS:sa=100003,OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_DATA_RD:sa=2000003,OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD:sa=2000003,OFFCORE_RESPONSE:request=ALL_DATA_RD:response=L3_MISS.LOCAL_DRAM:sa=100003,RESOURCE_STALLS.SB:sa=2000003,RS_EVENTS.EMPTY_CYCLES:sa=2000003,RS_EVENTS.EMPTY_END:sa=200003,UOPS_EXECUTED.CYCLES_GE_1_UOPS_EXEC:sa=2000003,UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC:sa=2000003,UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC:sa=2000003,UOPS_ISSUED.ANY:sa=2000003,UOPS_RETIRED.RETIRE_SLOTS:sa=2000003 -d 10

 

 

How do I read all those events for a single CPU? It looks like the command you gave returns functions for that time frame. What if I just want to see cpu_0? Also, it looks like it's only collecting for a set amount of time. Which switch is making it do this? I need it to run for MUCH longer. If I use:

"C:\Program Files (x86)\Intel\VTune Amplifier XE 2015\bin32\amplxe-cl" -R hw-events -time-filter=5.1:5.3

It gives all the hardware counters for that period of time, which is awesome, but it does it by function. Also, if I want to run the collection in the GUI and then read the results from the command window, can I do that? Or do I have to collect from the command window if I want to read from the command window? If I can, how do I read from already generated collections?

Peter_W_Intel
Employee

It seems that you are using many events, so please add the "-allow-multiple-runs" option so that the collection runs as multiple sessions.

>How do I read with all those events for a single cpu? It looks like the command you give returns functions for that time frame. What If I just want to see cpu_0? 

All the data is for CPU 0, since you used "-cpu-mask 0". Your script should be able to summarize (analyze) the data across all functions in VTune's report.

>If I want to run in GUI then read with command window, can I do that?

To do that in the GUI: open the result in amplxe-gui and, in the bottom-up report, zoom in and filter on the selected range. The event totals in the hot functions pane are updated, and you can select multiple lines (all of them) to see the total. (This is done in the GUI, not read from the command window.)

 

 

 

Carlos_P_1
Beginner

I am commenting for replies.

Justin_H_
Beginner

"C:\Program Files (x86)\Intel\VTune Amplifier XE 2015\bin32\amplxe-cl" -collect-with runsa -cpu-mask 0 -knob event-config=BACLEARS.ANY:sa=100003,BR_MISP_RETIRED.ALL_BRANCHES_PS:sa=400009,CPU_CLK_UNHALTED.REF_TSC:sa=2000003,CPU_CLK_UNHALTED.THREAD:sa=2000003,CPU_CLK_UNHALTED.THREAD_P:sa=2000003,CYCLE_ACTIVITY.CYCLES_NO_EXECUTE:sa=2000003,CYCLE_ACTIVITY.STALLS_L1D_PENDING:sa=2000003,CYCLE_ACTIVITY.STALLS_L2_PENDING:sa=2000003,CYCLE_ACTIVITY.STALLS_LDM_PENDING:sa=2000003,DSB2MITE_SWITCHES.PENALTY_CYCLES:sa=2000003,DTLB_LOAD_MISSES.STLB_HIT_2M:sa=2000003,DTLB_LOAD_MISSES.STLB_HIT_4K:sa=2000003,DTLB_LOAD_MISSES.WALK_DURATION:sa=2000003,DTLB_STORE_MISSES.STLB_HIT_2M:sa=100003,DTLB_STORE_MISSES.STLB_HIT_4K:sa=100003,DTLB_STORE_MISSES.WALK_DURATION:sa=100003,ICACHE.IFETCH_STALL:sa=2000003,ICACHE.MISSES:sa=200003,IDQ.ALL_DSB_CYCLES_4_UOPS:sa=2000003,IDQ.ALL_DSB_CYCLES_ANY_UOPS:sa=2000003,IDQ.ALL_MITE_CYCLES_4_UOPS:sa=2000003,IDQ.ALL_MITE_CYCLES_ANY_UOPS:sa=2000003,IDQ.MS_SWITCHES:sa=2000003,IDQ.MS_UOPS:sa=2000003,IDQ_UOPS_NOT_DELIVERED.CORE:sa=2000003,IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE:sa=2000003,ILD_STALL.LCP:sa=2000003,INST_RETIRED.ANY:sa=2000003,INT_MISC.RECOVERY_CYCLES:sa=2000003,ITLB_MISSES.STLB_HIT:sa=100003,ITLB_MISSES.WALK_DURATION:sa=100003,L1D.REPLACEMENT:sa=2000003,L1D_PEND_MISS.PENDING:sa=2000003,L2_LINES_IN.ALL:sa=100003,LD_BLOCKS.NO_SR:sa=100003,LD_BLOCKS.STORE_FORWARD:sa=100003,LD_BLOCKS_PARTIAL.ADDRESS_ALIAS:sa=100003,MACHINE_CLEARS.COUNT:sa=100003,MACHINE_CLEARS.MASKMOV:sa=100003,MACHINE_CLEARS.MEMORY_ORDERING:sa=100003,MACHINE_CLEARS.SMC:sa=100003,MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HITM_PS:sa=20011,MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HIT_PS:sa=20011,MEM_LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM:sa=100007,MEM_LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM_PS:sa=100007,MEM_LOAD_UOPS_RETIRED.HIT_LFB:sa=100003,MEM_LOAD_UOPS_RETIRED.L1_MISS:sa=100003,MEM_LOAD_UOPS_RETIRED.L3_HIT_PS:sa=50021,MEM_LOAD_UOPS_RETIRED.L3_MISS_PS:sa=100007,MEM_UOPS_RETIRED.ALL_STORES_PS:sa=
2000003,MEM_UOPS_RETIRED.SPLIT_LOADS_PS:sa=100003,MEM_UOPS_RETIRED.SPLIT_STORES_PS:sa=100003,OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_DATA_RD:sa=2000003,OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD:sa=2000003,OFFCORE_RESPONSE:request=ALL_DATA_RD:response=L3_MISS.LOCAL_DRAM:sa=100003,RESOURCE_STALLS.SB:sa=2000003,RS_EVENTS.EMPTY_CYCLES:sa=2000003,RS_EVENTS.EMPTY_END:sa=200003,UOPS_EXECUTED.CYCLES_GE_1_UOPS_EXEC:sa=2000003,UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC:sa=2000003,UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC:sa=2000003,UOPS_ISSUED.ANY:sa=2000003,UOPS_RETIRED.RETIRE_SLOTS:sa=2000003 -allow-multiple-runs -d 10

 

Is there a page for the command definitions? I need to be able to run the command for longer, but it seems like it's using a set time. Is that because of -d 10? Alright, I guess I'll just have to collect the data from all the functions myself.

Peter_W_Intel
Employee

The "-allow-multiple-runs" option is ONLY for application-launch mode, not for system-wide profiling. I wrote your command into a batch file. Sorry.

It doesn't make sense to use more than 8 metrics (events) in one session; you can put them in different groups for different runs.

You can change "-d 10" to "-d 600" (10 minutes) if you want a long run.
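Splitting a long event list into groups for separate runs can be scripted. This is a sketch only: the helper name `split_event_config` is made up, the group size of 8 is taken from the advice above, and each event specification is treated as an opaque comma-separated item (some, such as the OFFCORE_RESPONSE entry, contain extra colons).

```python
def split_event_config(event_config: str, group_size: int = 8):
    """Split a comma-separated event-config value into smaller groups."""
    events = [e for e in event_config.split(",") if e]
    return [",".join(events[i:i + group_size])
            for i in range(0, len(events), group_size)]


if __name__ == "__main__":
    # Shortened stand-in for the full event list in the command above.
    config = ("BACLEARS.ANY:sa=100003,"
              "ICACHE.MISSES:sa=200003,"
              "INST_RETIRED.ANY:sa=2000003")
    for group in split_event_config(config, group_size=2):
        # One collection run per group, same options as in the thread.
        print(f"amplxe-cl -collect-with runsa -cpu-mask 0 "
              f"-knob event-config={group} -d 10")
```

Each group is then collected in its own run, so the groups cover different stretches of wall-clock time and their per-interval numbers are not directly comparable.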

Justin_H_
Beginner

Why can I not run more than 8 at a time? When you run in the GUI you get all of those events in a single run. Thanks, I will remove the -allow-multiple-runs. So -d is the run time in seconds for the program.

Is there a document that explains all the switches that can be used here?

Thanks!

David_A_Intel1
Employee

Yes, it's called the product help. ;) See the "Command Line Reference" topic in the help files (press F1 in the GUI to open the help, or use the Help -> VTune Amplifier XE 2015 Help menu item). Here is an online version.

Justin_H_
Beginner

If I wanted to get CPU utilization percentages, what would I need to add to my command?

 

Thanks!

Peter_W_Intel
Employee

Justin H. wrote:

If I wanted to get CPU utilization percentages, what would I need to add to my command?

 

Thanks!

 

That is another topic:

1. Don't use many PMU events; use only clocks and instructions retired (advanced-hotspots analysis) to get the active CPU time.

2. VTune's bottom-up report usually shows CPU utilization data on its timeline panel.

3. CPU utilization for a specific function is not meaningful. If you really want to know how busy a function is, use: utilization for function = 100% - (inactive time + wait time) / (CPU time + inactive time + wait time), with the fraction expressed as a percentage.
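The per-function formula above is simple enough to write as a small helper. This is only a sketch: the function name `function_utilization` is made up, and the three inputs are assumed to be in the same time unit, taken from whatever report provides CPU, inactive, and wait time.

```python
def function_utilization(cpu_time: float, inactive_time: float,
                         wait_time: float) -> float:
    """Return how busy a function was, as a percentage from 0 to 100.

    Implements: 100% - (inactive + wait) / (CPU + inactive + wait),
    with the fraction expressed as a percentage.
    """
    total = cpu_time + inactive_time + wait_time
    if total <= 0:
        raise ValueError("total time must be positive")
    return 100.0 * (1.0 - (inactive_time + wait_time) / total)


if __name__ == "__main__":
    # 3 s of CPU time and 1 s inactive, no wait time.
    print(function_utilization(3.0, 1.0, 0.0))
```

The expression reduces to CPU time divided by the total of the three times, which is a convenient sanity check on the inputs.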

Justin_H_
Beginner

I was looking to be able to sample and see the cpu_0 utilization for the time sampled as a 0-100% number. Yes, in the GUI you see utilization as the graph at the bottom, but can you see utilization as a percentage from the command line?

Peter_W_Intel
Employee

The simple answer is no. The amplxe-cl command doesn't support directly reporting the CPU utilization (percent) data shown in the GUI's timeline panel.

As a workaround, work out whether your top hot functions are "busy" or "not busy" (see my previous post), then use that to estimate CPU utilization (overall data, not per time stamp).
