I am new to VTune; many of the performance counters are new to me, and manual 3B is huge :(
Is there a way to get a rough split between the types of cycles executed, i.e. the ratio between integer, memory, and floating-point cycles per unit time (it is fine if SSE* is lumped in with floating point)? For example, for a process A running at 100% on a core: 30% integer, 30% floating point, and 40% memory.
This is on an i7, and I am currently collecting INST_RETIRED.ANY, MEM_INST_RETIRED.LOADS/STORES, and FP_COMP_OPS_EXE.*. I am guessing that summing FP_COMP_OPS_EXE.* should give me some kind of floating-point metric, and that MEM_INST_RETIRED.LOADS/STORES should give some metric for memory. I am unsure what to normalize these metrics by, and whether there is a better way (using performance counters) to get this information.
The nearest thread I found mentioned using SIMD_INST_RETIRED, but that seems to be absent on i7 (vtl does nothing when this event is tried): http://software.intel.com/en-us/forums/showthread.php?t=65527
Thanks!
1 Solution
Hi,
Please refer to VTuneHelppmn.chm for Intel Core i7 processors.
Use the FP_ASSIST.ALL event to count all x87 FP instructions in your code.
Use FP_MMX_TRANS.ANY to count all FP instructions executed by SSE* in your code.
Use MEM_INST_RETIRED.LOADS and MEM_INST_RETIRED.STORES to count all memory access instructions in your code.
The rest of your code should be integer instructions with no memory access.
Hope it helps.
Regards, Peter
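The recipe above can be sketched in a few lines of Python. This is only a rough illustration, assuming the event totals have already been collected and copied into a dictionary by hand. The function name `instruction_mix` and the choice to normalize by INST_RETIRED.ANY are assumptions made for this sketch and are not part of VTune. The categories can overlap (for example, an FP instruction with a memory operand), so the remainder is clamped at zero.

```python
def instruction_mix(counts):
    """Return a rough memory / FP / other split as fractions of retired instructions.

    `counts` maps event names to totals. Only INST_RETIRED.ANY is required;
    events that were not collected are treated as zero.
    """
    total = counts["INST_RETIRED.ANY"]
    if total <= 0:
        raise ValueError("INST_RETIRED.ANY must be positive")

    # Memory share: retired loads plus retired stores.
    mem = (counts.get("MEM_INST_RETIRED.LOADS", 0)
           + counts.get("MEM_INST_RETIRED.STORES", 0))

    # FP share as suggested in the reply above (x87 event plus SSE* event).
    fp = counts.get("FP_ASSIST.ALL", 0) + counts.get("FP_MMX_TRANS.ANY", 0)

    # Whatever is left is attributed to integer / other work.
    other = max(total - mem - fp, 0)

    return {"mem": mem / total, "fp": fp / total, "other": other / total}
```

Calling `instruction_mix({"INST_RETIRED.ANY": n, ...})` with your own totals returns three fractions that can be read as percentages of retired instructions.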
6 Replies
Thank you so much, this information is most useful!
BTW, where can I find the .chm documentation, since I am using Linux?
Quoting - Vladimir Tsymbal (Intel)
Run the Eclipse framework and find the online help there. The help content on counters is much the same as in the Windows .chm.
Quoting - titanius.anglesmith
Thanks! Though it is deep down in the Eclipse help. It would be great to have a directly indexed HTML version of the help file :)
Eclipse uses the indexed HTML files. You can find them in the documentation directory.
On Linux, man pages are a more "direct" way to get help. VTune provides man pages as well, although the information in the HTML sources is more complete.
Thanks to all the help I got, I am now able to collect event statistics for various workloads (e.g. LAPACK via MATLAB).
Sample run:
MEM_INST_RETIRED.LOADS 300
MEM_INST_RETIRED.STORES 197
FP_MMX_TRANS.ANY 0
FP_ASSIST.ALL 0
INST_RETIRED.ANY 1622
FP_COMP_OPS_EXE.SSE_FP 753
FP_COMP_OPS_EXE.X87 4
FP_COMP_OPS_EXE.MMX 0
The memory share can be counted, but I am stumped on how to find the floating-point share. FP_ASSIST.ALL is usually 0 and only FP_COMP_OPS_EXE.* is non-zero, so should I use FP_COMP_OPS_EXE.* instead of FP_ASSIST.ALL? Also, this post says that compilers use SSE2 instead of x87 on i7, so FP_ASSIST might not be useful: http://software.intel.com/en-us/forums/showthread.php?t=65548
Also, if I understand correctly, FP_COMP_OPS_EXE.* counts micro-ops, so how do I relate it to the number of floating-point instructions (I am guessing SSE*) executed?
Thanks for all the help.
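As a worked illustration, the sample run above can be normalized by INST_RETIRED.ANY. This is a sketch under stated assumptions rather than a validated method: it treats the sum of FP_COMP_OPS_EXE.* as a proxy for floating-point work even though that event counts micro-ops, not retired instructions, so the FP figure is only a rough indicator. The variable names are made up for this example.

```python
# Event totals copied from the sample run in this post.
sample = {
    "MEM_INST_RETIRED.LOADS": 300,
    "MEM_INST_RETIRED.STORES": 197,
    "FP_MMX_TRANS.ANY": 0,
    "FP_ASSIST.ALL": 0,
    "INST_RETIRED.ANY": 1622,
    "FP_COMP_OPS_EXE.SSE_FP": 753,
    "FP_COMP_OPS_EXE.X87": 4,
    "FP_COMP_OPS_EXE.MMX": 0,
}

total = sample["INST_RETIRED.ANY"]

# Memory share: retired loads + stores over retired instructions.
mem_share = (sample["MEM_INST_RETIRED.LOADS"]
             + sample["MEM_INST_RETIRED.STORES"]) / total

# FP proxy (assumption): sum every FP_COMP_OPS_EXE.* sub-event.
# These are micro-ops, so this is not an exact instruction share.
fp_uops = sum(v for k, v in sample.items() if k.startswith("FP_COMP_OPS_EXE."))
fp_share = fp_uops / total

# What remains after subtracting both; categories may overlap in reality.
rest_share = 1.0 - mem_share - fp_share

print(f"memory: {mem_share:.1%}")   # 497 / 1622
print(f"fp:     {fp_share:.1%}")    # 757 / 1622
print(f"rest:   {rest_share:.1%}")  # 368 / 1622
```

For this run the split comes out at roughly 30.6% memory, 46.7% FP (micro-op based), and 22.7% for everything else, whereas a split based on FP_ASSIST.ALL and FP_MMX_TRANS.ANY would report 0% FP.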
