Hello, I am interested in using VTune to profile a system. I have run a project and gathered the results. I am looking at the hardware event samples for a specific CPU, e.g. all hardware events for CPU 0. The problem is that I want to look at the results in small time intervals. Basically, I want to see the results for every 15 ms.
The normal view takes too long for this: I have to set up the time range and filter the results for each time interval. Is there a way of looking at the results based on time? Maybe a way of setting a time range and seeing every 15 ms interval from start to finish?
If VTune can't do it, does anyone know whether any of the other tools in this suite can?
Thanks!
Here is a simple example using the advanced-hotspots analysis. My machine's CPU runs at 3.2 GHz, so with the default sample-after value (SAV) of 3,200,000 you get 1000 samples per second (1 sample per ms). With an SAV of 320,000 you get 10 samples per ms, i.e. 150 samples in 15 ms. However, if you only want samples from CPU 0, the total is 150/cores samples per 15 ms. You can lower the SAV below 320,000 to capture more samples; note that 100,000 is the minimum. Don't make the SAV too small, because that will cause unexpected results.
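The sampling arithmetic here can be checked with a few lines of Python. This is only a sketch: it assumes the clock event ticks once per cycle on a fully busy core, so the result is an upper bound (an idle core produces far fewer samples), and the function name `samples_per_interval` is made up for illustration.

```python
def samples_per_interval(freq_hz: int, sav: int, interval_ms: int) -> float:
    """Upper bound on clock-event samples in one interval on a fully busy core."""
    cycles = freq_hz * interval_ms // 1000  # cycles elapsed during the interval
    return cycles / sav                     # one sample is taken every `sav` events


if __name__ == "__main__":
    freq = 3_200_000_000  # 3.2 GHz, the clock rate quoted in this post
    for sav in (3_200_000, 320_000, 100_000):
        print(f"SAV={sav}: {samples_per_interval(freq, sav, 15)} samples per 15 ms")
```

With the minimum SAV of 100,000 the same calculation gives at most 480 samples per 15 ms window.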
My example: (collect performance data, duration is 10s)
> amplxe-cl -collect-with runsa -cpu-mask 0 -knob event-config=CPU_CLK_UNHALTED.THREAD:sa=100000,CPU_CLK_UNHALTED.REF:sa=100000,INST_RETIRED.ANY_P:sa=100000 -d 10
Summary
-------
Elapsed Time: 10.001
Event summary
-------------
Hardware Event Type      Hardware Event Count:Self  Hardware Event Sample Count:Self  Events Per Sample
-----------------------  -------------------------  --------------------------------  -----------------
CPU_CLK_UNHALTED.THREAD  14600000                   146                               100000
CPU_CLK_UNHALTED.REF     14100000                   141                               100000
INST_RETIRED.ANY_P       5000000                    50                                100000
You can review the performance data from 5.1 s to 5.3 s with the command below; you can also open the result in amplxe-gui and filter it there.
> amplxe-cl -R hw-events -time-filter=5.1:5.3
amplxe: Using result path `/home/peter/problem_report/vtune_mic_offload/r014runsa'
amplxe: Executing actions 50 % Generating a report
Function            Module   Hardware Event Count:CPU_CLK_UNHALTED.THREAD (K)  Hardware Event Count:CPU_CLK_UNHALTED.REF (K)  Hardware Event Count:INST_RETIRED.ANY_P (K)
------------------  -------  ------------------------------------------------  ---------------------------------------------  -------------------------------------------
finish_task_switch  vmlinux  100                                               0                                              0
intel_idle          vmlinux  0                                                 100                                            0
trace_clock_local   vmlinux  0                                                 100                                            0
try_to_wake_up      vmlinux  0                                                 0                                              100
It's a little difficult to understand what you're saying, so let me just ask and you can tell me. You might not be showing it for brevity, or maybe because it's not there. I am not sure, so I want to clarify.
Using system profiling in VTune, I can see something like 63 different hardware event counters. You can select a time range and filter in by that time range. The GUI then shows a table where the row is cpu_0 and the columns are the events, which is not quite what I am looking for. It looks like this:
Event1 Event2 Event3 .....etc
cpu_0 0 0 0
As an example, those are the results from the selected time frame. So for each time you want to look at, you have to hand-select the time range and filter in by selection. This would need to be done for thousands of time intervals for one sample set. I want something like this:
Event1 Event2 Event3 ....etc
Time1 0 0 0
Time2 0 0 0
Time3 0 0 0
etc
Where I specify the time frame I want to see. Are you saying that doing it the way you mentioned will output a result like this?
Thanks,
Justin
1. The event I used is the CPU clock, which always occurs within any 15 ms interval. If you use other events (for example, cache misses in your code), you need to make sure they actually occur within each 15 ms interval. Usually that means increasing the workload for those events or reducing the SAV (as I showed above).
2. The "-time-filter" report option shows the data for one time range of interest, for example [5.1 s to 5.2 s], [5.2 s to 5.3 s], and so on. There is no single command line that displays data for several time ranges, so you will need to run "amplxe-cl -R" several times and THEN write the data (time1, time2, time3) into your own report.
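Running the report once per window, as suggested in point 2, is easy to script. The following is a minimal Python sketch, not a tested VTune workflow: it reuses only the `amplxe-cl -R hw-events -time-filter=begin:end` command shown earlier in this thread and assumes it is run from a directory where amplxe-cl picks up the result, as in the example output above. The helper names (`time_windows`, `report_command`, `run_reports`) are made up for illustration.

```python
import subprocess
from typing import Iterator, List, Tuple


def time_windows(start_ms: int, end_ms: int, step_ms: int = 15) -> Iterator[Tuple[str, str]]:
    """Yield (begin, end) strings in seconds for consecutive windows."""
    def fmt(ms: int) -> str:
        return f"{ms // 1000}.{ms % 1000:03d}"  # integer math avoids float rounding

    for t in range(start_ms, end_ms, step_ms):
        yield fmt(t), fmt(min(t + step_ms, end_ms))


def report_command(begin: str, end: str, exe: str = "amplxe-cl") -> List[str]:
    """Build the report command from the thread for a single window."""
    return [exe, "-R", "hw-events", f"-time-filter={begin}:{end}"]


def run_reports(start_ms: int, end_ms: int, step_ms: int = 15) -> List[Tuple[str, str]]:
    """Run one report per window and return (window, raw report text) pairs."""
    results = []
    for begin, end in time_windows(start_ms, end_ms, step_ms):
        done = subprocess.run(report_command(begin, end),
                              capture_output=True, text=True, check=True)
        results.append((f"{begin}:{end}", done.stdout))
    return results
```

For example, `run_reports(5000, 5045)` would launch three reports covering 5.000:5.015, 5.015:5.030 and 5.030:5.045. Ten minutes of data at 15 ms is 40,000 windows, so expect such a loop to take a while.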
This kind of puts me back to square one. Having to regenerate the data by hand for each time frame would take years. I need 15 ms samples for several minutes. Is there maybe a non-GUI version that could just output that information to a command window? If I could write a script to generate the data in a terminal, maybe I could figure out a way of producing it that way. Is this something the GUI doesn't allow, or is it not possible because of the table structure? I can see the database that VTune creates for the result, but I am not an SQL expert and don't know how to extract data from it. Maybe if you could tell me the SQL query to get:
Event1 Event2 Event3 .....etc
cpu_0 0 0 0
between time A and time B, I could write a Python script to automate the query and get the thousands of results I need.
Thanks!
Direct user access to the SQL data is not supported; it is VTune internal data.
You can run the VTune report several times and save the outputs, then analyze them in your Python script and generate a report in the format you expect.
# amplxe-cl -collect-with runsa -cpu-mask 0 -knob event-config=CPU_CLK_UNHALTED.THREAD:sa=100000,CPU_CLK_UNHALTED.REF:sa=100000,INST_RETIRED.ANY_P:sa=100000 -d 10
# amplxe-cl -R hw-events -time-filter=5.000:5.015
amplxe: Using result path `/home/peter/problem_report/vtune_mic_offload/r020runsa'
amplxe: Executing actions 50 % Generating a report
Function            Module   Hardware Event Count:CPU_CLK_UNHALTED.THREAD (K)  Hardware Event Count:CPU_CLK_UNHALTED.REF (K)  Hardware Event Count:INST_RETIRED.ANY_P (K)
------------------  -------  ------------------------------------------------  ---------------------------------------------  -------------------------------------------
intel_idle          vmlinux  100                                               300                                            0
# amplxe-cl -R hw-events -time-filter=5.015:5.030
amplxe: Using result path `/home/peter/problem_report/vtune_mic_offload/r020runsa'
amplxe: Executing actions 50 % Generating a report
Function               Module   Hardware Event Count:CPU_CLK_UNHALTED.THREAD (K)  Hardware Event Count:CPU_CLK_UNHALTED.REF (K)  Hardware Event Count:INST_RETIRED.ANY_P (K)
---------------------  -------  ------------------------------------------------  ---------------------------------------------  -------------------------------------------
__do_softirq           vmlinux  200                                               100                                            0
intel_idle             vmlinux  0                                                 100                                            0
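Collapsing each per-function report into one row of event totals for its time window, which is the time-by-event table asked for earlier in the thread, could be scripted along these lines. This is a sketch that assumes the report text looks like the output above (a header line naming each `Hardware Event Count:` column, a dashed separator, then one row per function with the counts as the last fields). `sum_events` is a hypothetical helper, and other VTune versions may format the report differently.

```python
import re
from typing import Dict, List

EVENT_RE = re.compile(r"Hardware Event Count:(\S+)")

# Report text copied from the 5.015:5.030 example above.
SAMPLE = """amplxe: Executing actions 50 % Generating a report
Function Module Hardware Event Count:CPU_CLK_UNHALTED.THREAD (K) Hardware Event Count:CPU_CLK_UNHALTED.REF (K) Hardware Event Count:INST_RETIRED.ANY_P (K)
--------------------- ------- ---------- ---------- ----------
__do_softirq vmlinux 200 100 0
intel_idle vmlinux 0 100 0
"""


def sum_events(report_text: str) -> Dict[str, int]:
    """Sum each event column (in thousands) over all function rows of one report."""
    events: List[str] = []
    totals: Dict[str, int] = {}
    in_rows = False
    for line in report_text.splitlines():
        if not events:
            events = EVENT_RE.findall(line)       # header line names the event columns
            totals = {name: 0 for name in events}
        elif not in_rows:
            in_rows = line.startswith("-")        # dashed separator precedes the rows
        elif line.strip():
            counts = line.split()[-len(events):]  # trailing fields are the counts
            for name, value in zip(events, counts):
                totals[name] += int(value)
    return totals


if __name__ == "__main__":
    print(sum_events(SAMPLE))
```

For the sample above, the totals are 200 for CPU_CLK_UNHALTED.THREAD, 200 for CPU_CLK_UNHALTED.REF and 0 for INST_RETIRED.ANY_P, all in thousands. Writing one such row per time window gives the table with times as rows and events as columns.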
So it appears this is my command:
"C:\Program Files (x86)\Intel\VTune Amplifier XE 2015\bin32\amplxe-cl" -collect-with runsa -cpu-mask 0 -knob event-config=BACLEARS.ANY:sa=100003,BR_MISP_RETIRED.ALL_BRANCHES_PS:sa=400009,CPU_CLK_UNHALTED.REF_TSC:sa=2000003,CPU_CLK_UNHALTED.THREAD:sa=2000003,CPU_CLK_UNHALTED.THREAD_P:sa=2000003,CYCLE_ACTIVITY.CYCLES_NO_EXECUTE:sa=2000003,CYCLE_ACTIVITY.STALLS_L1D_PENDING:sa=2000003,CYCLE_ACTIVITY.STALLS_L2_PENDING:sa=2000003,CYCLE_ACTIVITY.STALLS_LDM_PENDING:sa=2000003,DSB2MITE_SWITCHES.PENALTY_CYCLES:sa=2000003,DTLB_LOAD_MISSES.STLB_HIT_2M:sa=2000003,DTLB_LOAD_MISSES.STLB_HIT_4K:sa=2000003,DTLB_LOAD_MISSES.WALK_DURATION:sa=2000003,DTLB_STORE_MISSES.STLB_HIT_2M:sa=100003,DTLB_STORE_MISSES.STLB_HIT_4K:sa=100003,DTLB_STORE_MISSES.WALK_DURATION:sa=100003,ICACHE.IFETCH_STALL:sa=2000003,ICACHE.MISSES:sa=200003,IDQ.ALL_DSB_CYCLES_4_UOPS:sa=2000003,IDQ.ALL_DSB_CYCLES_ANY_UOPS:sa=2000003,IDQ.ALL_MITE_CYCLES_4_UOPS:sa=2000003,IDQ.ALL_MITE_CYCLES_ANY_UOPS:sa=2000003,IDQ.MS_SWITCHES:sa=2000003,IDQ.MS_UOPS:sa=2000003,IDQ_UOPS_NOT_DELIVERED.CORE:sa=2000003,IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE:sa=2000003,ILD_STALL.LCP:sa=2000003,INST_RETIRED.ANY:sa=2000003,INT_MISC.RECOVERY_CYCLES:sa=2000003,ITLB_MISSES.STLB_HIT:sa=100003,ITLB_MISSES.WALK_DURATION:sa=100003,L1D.REPLACEMENT:sa=2000003,L1D_PEND_MISS.PENDING:sa=2000003,L2_LINES_IN.ALL:sa=100003,LD_BLOCKS.NO_SR:sa=100003,LD_BLOCKS.STORE_FORWARD:sa=100003,LD_BLOCKS_PARTIAL.ADDRESS_ALIAS:sa=100003,MACHINE_CLEARS.COUNT:sa=100003,MACHINE_CLEARS.MASKMOV:sa=100003,MACHINE_CLEARS.MEMORY_ORDERING:sa=100003,MACHINE_CLEARS.SMC:sa=100003,MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HITM_PS:sa=20011,MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HIT_PS:sa=20011,MEM_LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM:sa=100007,MEM_LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM_PS:sa=100007,MEM_LOAD_UOPS_RETIRED.HIT_LFB:sa=100003,MEM_LOAD_UOPS_RETIRED.L1_MISS:sa=100003,MEM_LOAD_UOPS_RETIRED.L3_HIT_PS:sa=50021,MEM_LOAD_UOPS_RETIRED.L3_MISS_PS:sa=100007,MEM_UOPS_RETIRED.ALL_STORES_PS:sa=
2000003,MEM_UOPS_RETIRED.SPLIT_LOADS_PS:sa=100003,MEM_UOPS_RETIRED.SPLIT_STORES_PS:sa=100003,OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_DATA_RD:sa=2000003,OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD:sa=2000003,OFFCORE_RESPONSE:request=ALL_DATA_RD:response=L3_MISS.LOCAL_DRAM:sa=100003,RESOURCE_STALLS.SB:sa=2000003,RS_EVENTS.EMPTY_CYCLES:sa=2000003,RS_EVENTS.EMPTY_END:sa=200003,UOPS_EXECUTED.CYCLES_GE_1_UOPS_EXEC:sa=2000003,UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC:sa=2000003,UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC:sa=2000003,UOPS_ISSUED.ANY:sa=2000003,UOPS_RETIRED.RETIRE_SLOTS:sa=2000003 -d 10
How do I read all those events for a single CPU? It looks like the command you gave returns functions for that time frame. What if I just want to see cpu_0? Also, it looks like it only collects for a set amount of time. Which switch makes it do this? I need it to run for MUCH longer. If I use:
"C:\Program Files (x86)\Intel\VTune Amplifier XE 2015\bin32\amplxe-cl" -R hw-events -time-filter=5.1:5.3
It gives all the hardware counters for that period of time, which is awesome, but it does it by function. Also, if I run the collection in the GUI, can I then read the results from the command window? Or do I have to collect from the command window if I want to read from the command window? If I can, how do I read from already generated collections?
It seems that you are using many events, so please add the "-allow-multiple-runs" option so that they are collected in multiple sessions.
>How do I read with all those events for a single cpu? It looks like the command you give returns functions for that time frame. What If I just want to see cpu_0?
All data are for CPU 0, since you used "-cpu-mask 0". Your script should be able to summarize (analyze) the data for all functions from VTune's report.
>If I want to run in GUI then read with command window, can I do that?
That is, use amplxe-gui to open the result. In the bottom-up report, zoom in and filter on the selected range; the total events will be updated in the hot function window, and you can select multiple lines (all of them) to see the totals. (This does not use or read from the command window.)
I am commenting for replies.
"C:\Program Files (x86)\Intel\VTune Amplifier XE 2015\bin32\amplxe-cl" -collect-with runsa -cpu-mask 0 -knob event-config=BACLEARS.ANY:sa=100003,BR_MISP_RETIRED.ALL_BRANCHES_PS:sa=400009,CPU_CLK_UNHALTED.REF_TSC:sa=2000003,CPU_CLK_UNHALTED.THREAD:sa=2000003,CPU_CLK_UNHALTED.THREAD_P:sa=2000003,CYCLE_ACTIVITY.CYCLES_NO_EXECUTE:sa=2000003,CYCLE_ACTIVITY.STALLS_L1D_PENDING:sa=2000003,CYCLE_ACTIVITY.STALLS_L2_PENDING:sa=2000003,CYCLE_ACTIVITY.STALLS_LDM_PENDING:sa=2000003,DSB2MITE_SWITCHES.PENALTY_CYCLES:sa=2000003,DTLB_LOAD_MISSES.STLB_HIT_2M:sa=2000003,DTLB_LOAD_MISSES.STLB_HIT_4K:sa=2000003,DTLB_LOAD_MISSES.WALK_DURATION:sa=2000003,DTLB_STORE_MISSES.STLB_HIT_2M:sa=100003,DTLB_STORE_MISSES.STLB_HIT_4K:sa=100003,DTLB_STORE_MISSES.WALK_DURATION:sa=100003,ICACHE.IFETCH_STALL:sa=2000003,ICACHE.MISSES:sa=200003,IDQ.ALL_DSB_CYCLES_4_UOPS:sa=2000003,IDQ.ALL_DSB_CYCLES_ANY_UOPS:sa=2000003,IDQ.ALL_MITE_CYCLES_4_UOPS:sa=2000003,IDQ.ALL_MITE_CYCLES_ANY_UOPS:sa=2000003,IDQ.MS_SWITCHES:sa=2000003,IDQ.MS_UOPS:sa=2000003,IDQ_UOPS_NOT_DELIVERED.CORE:sa=2000003,IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE:sa=2000003,ILD_STALL.LCP:sa=2000003,INST_RETIRED.ANY:sa=2000003,INT_MISC.RECOVERY_CYCLES:sa=2000003,ITLB_MISSES.STLB_HIT:sa=100003,ITLB_MISSES.WALK_DURATION:sa=100003,L1D.REPLACEMENT:sa=2000003,L1D_PEND_MISS.PENDING:sa=2000003,L2_LINES_IN.ALL:sa=100003,LD_BLOCKS.NO_SR:sa=100003,LD_BLOCKS.STORE_FORWARD:sa=100003,LD_BLOCKS_PARTIAL.ADDRESS_ALIAS:sa=100003,MACHINE_CLEARS.COUNT:sa=100003,MACHINE_CLEARS.MASKMOV:sa=100003,MACHINE_CLEARS.MEMORY_ORDERING:sa=100003,MACHINE_CLEARS.SMC:sa=100003,MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HITM_PS:sa=20011,MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HIT_PS:sa=20011,MEM_LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM:sa=100007,MEM_LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM_PS:sa=100007,MEM_LOAD_UOPS_RETIRED.HIT_LFB:sa=100003,MEM_LOAD_UOPS_RETIRED.L1_MISS:sa=100003,MEM_LOAD_UOPS_RETIRED.L3_HIT_PS:sa=50021,MEM_LOAD_UOPS_RETIRED.L3_MISS_PS:sa=100007,MEM_UOPS_RETIRED.ALL_STORES_PS:sa=
2000003,MEM_UOPS_RETIRED.SPLIT_LOADS_PS:sa=100003,MEM_UOPS_RETIRED.SPLIT_STORES_PS:sa=100003,OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_DATA_RD:sa=2000003,OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD:sa=2000003,OFFCORE_RESPONSE:request=ALL_DATA_RD:response=L3_MISS.LOCAL_DRAM:sa=100003,RESOURCE_STALLS.SB:sa=2000003,RS_EVENTS.EMPTY_CYCLES:sa=2000003,RS_EVENTS.EMPTY_END:sa=200003,UOPS_EXECUTED.CYCLES_GE_1_UOPS_EXEC:sa=2000003,UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC:sa=2000003,UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC:sa=2000003,UOPS_ISSUED.ANY:sa=2000003,UOPS_RETIRED.RETIRE_SLOTS:sa=2000003 -allow-multiple-runs -d 10
Is there a page for the command definitions? I need to be able to run the command for longer, but it seems like it's using a set time. Is that because of -d 10? All right, I'll just have to collect the data from all the functions myself, I guess.
The "-allow-multiple-runs" option works ONLY in application-launch mode, not for system-wide profiling. I wrote your command in a batch file. Sorry.
It doesn't make sense to use more than 8 metrics (events) in one session; you can put them in different groups for different runs.
You can change "-d 10" to "-d 600" (10 minutes) if you want a long run.
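Splitting a long event list into groups of at most 8 and formatting each group as an `event-config` value can be automated. The sketch below assumes the `NAME:sa=VALUE` syntax used in the commands in this thread; the helper name `event_groups` is hypothetical, and the limit of 8 is simply the figure given above, not a value checked against any particular CPU.

```python
from typing import Dict, List


def event_groups(sav_by_event: Dict[str, int], max_events: int = 8) -> List[str]:
    """Split events into groups and format each as an event-config knob value."""
    items = [f"{name}:sa={sav}" for name, sav in sav_by_event.items()]
    return [",".join(items[i:i + max_events])
            for i in range(0, len(items), max_events)]


if __name__ == "__main__":
    # A few events and SAVs taken from the command earlier in the thread.
    events = {
        "BACLEARS.ANY": 100003,
        "CPU_CLK_UNHALTED.THREAD": 2000003,
        "INST_RETIRED.ANY": 2000003,
    }
    for group in event_groups(events, max_events=2):
        print(f"-knob event-config={group}")
```

Each group then needs its own collection run, so counts from different groups come from different time spans and cannot be lined up window by window.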
Why can I not run more than 8 at a time? When you run in the GUI you get all of those events in a single run. Thanks, I will remove the -allow-multiple-runs. So -d is the run time in seconds for the program.
Is there a document that explains all the switches that can be used here?
Thanks!
Yes, it's called the product help. ;) See "Command Line Reference" topic in the help files (press F1 in the GUI to open the help - or Help -> VTune Amplifier XE 2015 Help menu item). Here is an online version.
If I wanted to get CPU utilization percentages, what would I need to add to my command?
Thanks!
Justin H. wrote:
If I wanted to get CPU utilization percentages, what would I need to add to my command?
Thanks!
That is another topic:
1. Don't use many PMU events; use only clocks and instructions retired (advanced-hotspots analysis) to get the active CPU time.
2. VTune's bottom-up report usually shows CPU utilization data on its timeline panel.
3. CPU utilization for a specific function is not very meaningful. If you really want to know how busy a function is, use: utilization for function = 100% - (inactive time + wait time) / (CPU time + inactive time + wait time)
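The formula in point 3 can be written as a small Python function. This is a sketch under one assumption: the fraction is read as a share of the total time and scaled to a percentage. The name `function_utilization` is made up for illustration, and all three inputs must be in the same time unit.

```python
def function_utilization(cpu_time: float, inactive_time: float, wait_time: float) -> float:
    """Percentage of a function's total time that was spent running on the CPU."""
    total = cpu_time + inactive_time + wait_time
    if total <= 0:
        raise ValueError("total time must be positive")
    return 100.0 * (1.0 - (inactive_time + wait_time) / total)


if __name__ == "__main__":
    # 3 s of CPU time, 1 s inactive and no wait time gives 75 %.
    print(function_utilization(cpu_time=3.0, inactive_time=1.0, wait_time=0.0))
```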
I was looking to be able to sample and see the cpu_0 utilization for the sampled time as a 0-100% number. Yes, in the GUI you see utilization as the graph at the bottom, but can you see utilization as a percentage from the command line?
The simple answer is no. The amplxe-cl command doesn't support directly reporting the CPU utilization (percent) data shown in the GUI's timeline panel.
As a workaround, analyze whether your top hot functions are "busy" or "not busy" (see my previous post), then estimate the CPU utilization from that (overall data, not per time stamp).