Dny
Beginner

Incl Instruction, L1 cache misses

Hello,

I am analyzing one of my test programs with VTune.
I'm using icc 11.0 with optimizations disabled via the -O0 flag.

VTune shows that 45% of the cache misses are attributed to an incl instruction.

I'm not able to find the incl instruction in the Intel instruction manual.
Does anybody know what the incl instruction does, and why VTune shows so many cache misses for this instruction?

Thanking you,

Regards,
Dny
TimP
Black Belt

Except for precise events (more of those on Nehalem family CPUs), VTune usually shows event accounting on a line with an instruction which is executed later. Cache misses would be "caused" by memory access. VTune is excellent for showing which loop produces the events, but not at showing which instruction.
I'm having difficulty reading your .png; it seems that your incl instruction (which simply increments a counter) is a branch target, so the events reported there would have been initiated prior to branching to that instruction.
srimks
New Contributor II

Actually, this is how sampling works in VTune: samples for the MEM_LOAD_RETIRED.L1D_LINE_MISS event are attributed to the instruction _next_ to the one that actually takes longer to execute. The interrupt service routine captures CS:EIP from the interrupt stack, and at that point the captured instruction pointer (EIP) already points to the next instruction. So the issue is not the increment (incl) but the memory reference of the mov, movl -20(%rbp), %eax.
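To make the attribution shift concrete, here is a minimal toy model in C. It is only a sketch of the idea described above, not how VTune is implemented: the function name attribute_with_skid and the assumption that every sample lands exactly one instruction later (wrapping around at the loop's back edge) are mine, for illustration only.

```c
#include <stddef.h>

/* Toy model of sample attribution (hypothetical, not VTune code).
 * events[i]     : events really caused by instruction i of a loop body
 * attributed[i] : what a sampler reports if it always records the
 *                 instruction pointer of the *next* instruction
 * n             : number of instructions in the loop body */
void attribute_with_skid(const unsigned *events, unsigned *attributed, size_t n)
{
    for (size_t i = 0; i < n; i++)
        attributed[i] = 0;

    for (size_t i = 0; i < n; i++) {
        /* After the last instruction the loop branches back to the first,
         * so the "next" instruction wraps around to index 0. */
        size_t next = (i + 1) % n;
        attributed[next] += events[i];
    }
}
```

With a three-instruction loop body where the load at index 0 causes most of the misses, the model reports those misses on index 1, which is where an incl following the load would sit.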

In "incl", the l suffix means a long (32-bit) operand. The basic instruction is "inc", and in AT&T assembler syntax it takes an operand-size suffix: byte (b), word (w), long (l) or quad (q). The primary use of "inc" is to implement counters, by adding 1 to the destination operand (here most likely a stack location addressed relative to the base pointer register %rbp, rather than %rbp itself).
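As a rough illustration of where such an incl typically comes from, consider the small C loop below. This is a sketch and not your test code: the function name sum_array is made up, and the exact assembly depends on the compiler and its flags. With optimization disabled, compilers usually keep the loop counter in a stack slot, so the i++ tends to become an incl on a location addressed relative to %rbp.

```c
/* Hypothetical example: sums the first n elements of data.
 * Built without optimization, the counter i normally lives on the stack,
 * so "i++" is usually emitted as an incl on an %rbp-relative address. */
long sum_array(const int *data, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++) {  /* i++ is the counter increment */
        total += data[i];          /* this load is what can miss in the cache */
    }
    return total;
}
```

The increment itself only touches the counter, which is accessed on every iteration and is therefore very likely to stay in cache; the array load is the more plausible source of L1 misses.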

In the "Intel-64 and IA-32 Arch. Software Developer's Manual" you will only find the basic instruction, which means "inc"; the size-suffixed form incl is an assembler syntax convention and is not listed there.

Could you quote the SAV (Sample After Value) chosen for the MEM_LOAD_RETIRED.L1D_LINE_MISS event?

Use the Precise Events to focus on the instructions that cause high L1 and L2 miss counts; also check which instructions are causing branch mispredictions.

Looking at your asm code, it seems you have compiled the application without any optimization flags (-On). Any reason for doing so?

Could you try compiling your application with -O3 or -O2, and let the code use SSE registers rather than the x87 stack?

~BR
Mukkaysh Srivastav
