coarse grain split of the types of cycles executed

I am new to VTune; many of the performance counters are new to me, and manual 3B is huge :(

Is there a way to get a rough split between the types of cycles executed, i.e. just the ratio between int, mem and floating-point cycles executed per unit time (it's fine if SSE* is lumped in with floats, etc.)? For example, for a process A running at 100% on a core: 30% for int, 30% for float and 40% for memory.

This is on an i7, and I am currently collecting INST_RETIRED.ANY, MEM_INST_RETIRED.LOADS/STORES and FP_COMP_OPS_EXE.*. I am guessing that summing FP_COMP_OPS_EXE.* should give me some kind of float metric, and MEM_INST_RETIRED.LOADS/STORES some metric for memory. I am confused about what to normalize these metrics with, and whether there is a better way (other perf counters) to get this information.

The nearest thread I found mentioned using SIMD_INST_RETIRED, but that seems to be absent on i7 (vtl doesn't do anything when this event is tried): http://software.intel.com/en-us/forums/showthread.php?t=65527


Thanks!
Peter_W_Intel
Employee

Hi,

Please refer to VTuneHelppmn.chm for Intel Core 2 i7 processors.

Use FP_ASSIST.ALL event to know all FP x87 instructions in your code.

Use FP_MMX_TRANS.ANY to know all FP instructions executed by SSE* in your code.

Use MEM_INST_RETIRED.LOADS & MEM_INST_RETIRED.STORES to know all memory access instructions in your code.

The rest of your code should be integer instructions with no memory access.

Hope it helps.

Regards, Peter

Thank you so much, this information is most useful!

BTW, where can I find the chm documentation, as I am using Linux?
Vladimir_T_Intel
Moderator
Run the Eclipse framework and open the online help there. The help content on counters is pretty much the same as in the Windows chm.
Thanks! It's buried deep in the Eclipse help, though. It would be great to have a direct, HTML-indexed version of the help file :)
Vladimir_T_Intel
Moderator
Eclipse uses the HTML-indexed files. You can find them in the documentation directory.

On Linux, man pages are a more "direct" way to get help. VTune provides man pages as well, although the information in the HTML sources is more complete.
Thanks to all the help I got, I am now able to collect statistics for these events on different workloads (e.g. LAPACK via MATLAB).

Sample run:
MEM_INST_RETIRED.LOADS 300
MEM_INST_RETIRED.STORES 197
FP_MMX_TRANS.ANY 0
FP_ASSIST.ALL 0
INST_RETIRED.ANY 1622
FP_COMP_OPS_EXE.SSE_FP 753
FP_COMP_OPS_EXE.X87 4
FP_COMP_OPS_EXE.MMX 0

Memory cycles can be counted, but I am stumped on how to find floating-point cycles. FP_ASSIST is usually 0 and only FP_COMP_OPS_EXE.* is non-zero. So should I use FP_COMP_OPS_EXE instead of FP_ASSIST? Also, this post says that compilers use SSE2 instead of x87 on i7, so FP_ASSIST might not be useful: http://software.intel.com/en-us/forums/showthread.php?t=65548


Also, if I understand correctly, FP_COMP_OPS_EXE.* counts micro-ops, so how do I relate it to the number of float instructions (I am guessing SSE*) executed?

Thanks for all the help.
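
For what it's worth, here is a rough sketch of how the sample run above could be turned into the kind of coarse split asked about at the start of the thread. It is only an illustration under stated assumptions, not a VTune feature: the helper name `coarse_split` is made up, the FP_COMP_OPS_EXE.* counts (micro-ops executed) are treated as if they were retired instructions, everything that is neither FP nor memory is assumed to be integer work as suggested earlier in the thread, and any overlap between the FP and memory categories is ignored.

```python
# Counter values copied from the sample run posted above.
counts = {
    "MEM_INST_RETIRED.LOADS": 300,
    "MEM_INST_RETIRED.STORES": 197,
    "FP_MMX_TRANS.ANY": 0,
    "FP_ASSIST.ALL": 0,
    "INST_RETIRED.ANY": 1622,
    "FP_COMP_OPS_EXE.SSE_FP": 753,
    "FP_COMP_OPS_EXE.X87": 4,
    "FP_COMP_OPS_EXE.MMX": 0,
}

# Assumption: these three events together stand in for "float" work.
FP_EVENTS = [
    "FP_COMP_OPS_EXE.SSE_FP",
    "FP_COMP_OPS_EXE.X87",
    "FP_COMP_OPS_EXE.MMX",
]


def coarse_split(counts, fp_events):
    """Hypothetical helper: approximate int/fp/mem shares of retired instructions.

    Everything is normalized by INST_RETIRED.ANY. The FP events count
    micro-ops rather than instructions, so the result is only a rough ratio.
    """
    total = counts["INST_RETIRED.ANY"]
    mem = counts["MEM_INST_RETIRED.LOADS"] + counts["MEM_INST_RETIRED.STORES"]
    fp = sum(counts[event] for event in fp_events)
    # Whatever is left is assumed to be integer work; clamp at zero in case
    # the mixed units (uops vs. instructions) push fp + mem above the total.
    other = max(total - mem - fp, 0)
    return {"int": other / total, "fp": fp / total, "mem": mem / total}


split = coarse_split(counts, FP_EVENTS)
for name, share in split.items():
    print(f"{name}: {share:.1%}")
```

With the numbers from the sample run, this works out to roughly 23% int, 47% float and 31% memory. Because the FP events count micro-ops and a single instruction can fall into more than one category, treat the result as an estimate only.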