I am a bit confused about what the "mispredicted branch" event count means exactly, since profiling my code with VTune gives the following numbers:
Branches retired: 7497
Mispredicted branches retired: 12386
Instructions retired: 2570
From that I would assume that "Branches retired" is not the total number of branches retired since I have more mispredicted branches than "total". The event help mentions some extra bits (MMxx) that can be set but I do not seem to be able to set them in VTune explicitly and VTune's default is not documented.
Instructions retired somehow must exclude branches, or I have a lot of branches that are "bogus" instructions, due to mispredicted branches.
A lot of the branch event counts are in non-branch code, e.g. 6000 mispredicted branches and 4000 branches are counted in the lines "inc" and "pad #2" below.
The code looks like this:
        lea   esp, DWORD PTR [esp]        ; padding
blah:
        movsx edx, WORD PTR [esi]
        mov   eax, DWORD PTR [esp+014h]
        add   DWORD PTR [eax+edx*4], 1
        add   DWORD PTR [esp+0ch], 1      ; inc
        lea   ecx, DWORD PTR [ecx]        ; pad #2
        lea   eax, DWORD PTR [edi+edi]
        add   ecx, edi
        add   ebx, edi
        add   esi, eax
        cmp   ecx, ebp
        jnge  blah
Incidentally, I get a lot of MOB replay events (about 24000, of which 7000 are in "pad #2" and 6000 in "inc"), and the store forward performance impact is high (about 200).
The machine is a Dual CPU Nocona (Fam F, Mod 4, Step 1, Rev E0).
Thanks for any clues.
Message Edited by ipanema89 on 11-11-2005 03:44 AM
There must be something basic I do not understand here. The numbers in my previous post were from the assembly view. If I go to the hotspot view, the numbers make more sense. These are from similar code, but a different run:
Branches retired: 7,284 x 129,962 = 946,643,208
Mispredicted branches retired: 12,742 x 2,034 = 25,917,228
Instructions retired: 2,391 x 3,600,000 = 8,607,600,000
Now the assembly view shows the numbers before the "x", but they seem to be meaningless without the multiplier. What are they?
Yes, two things are happening here.
First, with calibration enabled, the sample-after-value is dynamically determined so that the analyzer collects approximately 1000 samples per second per processor. Thus, the number of samples (what you are seeing in the source view) does not mean the same thing across events.
Second, you can change what is displayed in the source view by right-clicking in an event column and selecting from the View Events As... menu. For example, you might want to select Total Events to get the numbers with the "x", as you put it: the number of samples multiplied by the sample-after-value. You can also select from a couple of percentages.
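As a rough illustration of that arithmetic, here is a minimal Python sketch that recomputes the totals from the sample counts and sample-after-values quoted earlier in this thread. The names `samples`, `total_events` and `mispredict_ratio` are made up for the example and are not part of VTune.

```python
# Sample counts and sample-after-values (SAV) quoted in the hotspot view above.
# Each entry is (number of samples, sample-after-value).
samples = {
    "Branches retired": (7_284, 129_962),
    "Mispredicted branches retired": (12_742, 2_034),
    "Instructions retired": (2_391, 3_600_000),
}


def total_events(sample_count, sample_after_value):
    """Estimate the event total: one sample is taken every SAV events."""
    return sample_count * sample_after_value


totals = {name: total_events(n, sav) for name, (n, sav) in samples.items()}

# Raw sample counts are not comparable across events, but totals are.
mispredict_ratio = (
    totals["Mispredicted branches retired"] / totals["Branches retired"]
)

for name, total in totals.items():
    print(f"{name}: {total:,}")
print(f"Mispredicted / retired branches: {mispredict_ratio:.2%}")
```

Once scaled this way, the mispredicted-branch total is a small fraction of the branch total, even though its raw sample count is the larger of the two.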
Hope this helps,
Thanks, that helps.
Any clues as to why most of the branch and mispredict counts appear on statements that have no branches anywhere near them?
Here is another extreme example of this:
(event columns: branches, mispred, loads)
        ...
        and   ecx, eax                    264,374  2,238
        cmp   ecx, esi                    2,238
        jle   over                        2,238
        mov   ecx, esi                    3,357
over:
        add   eax, 5                      264,374  14,547
        cdq                               132,187  17,904
        mov   esi, 5
        idiv  esi
        mov   eax, DWORD PTR [esp+48]     4,890,919  718,398  5,774,373
        push  eax
        mov   eax, DWORD PTR [esp+58]
        push  ebx
        ...
Message Edited by ipanema89 on 11-16-2005 05:53 AM
This is what we call "event skid". You can read about it in the online help but, basically, it occurs because the processor can't stop fast enough to capture the actual instruction that caused the event overflow, so the sample lands on an instruction a little further down. Usually, however, you can infer which specific instructions caused the events/samples. For example, in your case, the jle instruction is generating some of the events associated with the add and cdq instructions that follow it.
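One way to do that inference by hand can be sketched in Python. This is only a heuristic under stated assumptions, not something VTune does: it moves samples that landed on a non-branch instruction back to the nearest branch a few instructions earlier. The listing, its counts, the `window` parameter and all function names are hypothetical and chosen for illustration.

```python
# Hypothetical listing: (instruction, mispredict samples shown by the profiler).
# The counts are invented for illustration; they are not the thread's data.
listing = [
    ("cmp ecx, esi", 0),
    ("jle over", 1),
    ("mov ecx, esi", 0),
    ("add eax, 5", 7),
    ("cdq", 9),
    ("mov esi, 5", 0),
]

# Small, incomplete set of x86 branch mnemonics, enough for this example.
BRANCH_MNEMONICS = {"jle", "jnge", "jmp", "je", "jne", "call", "ret"}


def is_branch(instruction):
    """Return True if the instruction's mnemonic is a known branch."""
    return instruction.split()[0] in BRANCH_MNEMONICS


def reattribute(listing, window=3):
    """Move samples on non-branch instructions back to the nearest branch
    that is at most `window` instructions earlier (assumed skid distance)."""
    adjusted = [count for _, count in listing]
    for i, (text, count) in enumerate(listing):
        if count == 0 or is_branch(text):
            continue
        # Walk backwards from the previous instruction, up to `window` steps.
        for j in range(i - 1, max(i - window, 0) - 1, -1):
            if is_branch(listing[j][0]):
                adjusted[j] += count
                adjusted[i] = 0
                break
    return adjusted


for (text, _), count in zip(listing, reattribute(listing)):
    print(f"{text:<16}{count}")
```

Samples with no branch inside the window stay where they are, so the total count is preserved. The right window depends on the processor and the event, so treat the result as a hint rather than an exact attribution.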
Message Edited by ipanema89 on 11-18-2005 05:35 AM