Community support for Analyzers (Intel VTune™ Profiler, Intel Advisor, Intel Inspector)

mispredicted branches retired


I am a bit confused about what the "mispredicted branch" event count means exactly, since profiling my code with VTune gives the following numbers:

Branches retired: 7497
Mispredicted branches retired: 12386
Instructions retired: 2570

From that I would assume that "Branches retired" is not the total number of branches retired, since I have more mispredicted branches than the "total". The event help mentions some extra bits (MMxx) that can be set, but I do not seem to be able to set them explicitly in VTune, and VTune's default is not documented.

Instructions retired must somehow exclude branches, or else many of my branches are "bogus" instructions caused by mispredicted branches.

A lot of the branch event counts land on non-branch code, e.g. 6000 mispredicted branches and 4000 branches are counted on the lines marked "inc" and "pad #2" below.

The code looks like this:

    lea   esp, DWORD PTR [esp]         ; padding
    movsx edx, WORD PTR [esi]
    mov   eax, DWORD PTR [esp+014h]
    add   DWORD PTR [eax+edx*4], 1
    add   DWORD PTR [esp+0ch], 1       ; inc
    lea   ecx, DWORD PTR [ecx]         ; pad #2
    lea   eax, DWORD PTR [edi+edi]
    add   ecx, edi
    add   ebx, edi
    add   esi, eax
    cmp   ecx, ebp
    jnge  blah

Incidentally, I get a lot of MOB replay events (about 24000, of which 7000 are in "pad #2" and 6000 in "inc"), and the store forward performance impact is high (about 200).

The machine is a Dual CPU Nocona (Fam F, Mod 4, Step 1, Rev E0).

Thanks for any clues.

Message Edited by ipanema89 on 11-11-2005 03:44 AM

5 Replies

There must be something basic I do not understand here. The numbers in my previous post were from the assembly view. If I go to the hotspot view, the numbers make more sense. These are from similar code, but a different run:

Branches retired: 7,284 x 129,962 = 946,643,208
Mispredicted branches retired: 12,742 x 2,034 = 25,917,228
Instructions retired: 2,391 x 3,600,000 = 8,607,600,000

Now the assembly view shows the numbers before the "x", but they seem to be meaningless without the multiplier. What are they?


Yes, two things are happening here.

First, with calibration enabled, the sample-after-value is dynamically determined so that the analyzer collects approximately 1000 samples per second per processor. Thus, the number of samples (what you are seeing in the source view) does not mean the same thing across events.

Second, you can change what is displayed in the source view by right-clicking in an event column and selecting from the View Events As... menu. For example, you might want to select Total Events to get the numbers with the "x", as you put it: the sample count times the sample-after-value. You can also select from a couple of percentage views.
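
To make the arithmetic concrete, here is a minimal Python sketch that rebuilds the total event counts from the sample counts and sample-after-values quoted in the hotspot view earlier in this thread. The function and variable names are made up for illustration and are not part of any VTune interface.

```python
# (samples, sample-after-value) pairs quoted from the hotspot view in this thread.
SAMPLES = {
    "branches_retired": (7_284, 129_962),
    "mispredicted_branches_retired": (12_742, 2_034),
    "instructions_retired": (2_391, 3_600_000),
}


def total_events(samples: int, sample_after_value: int) -> int:
    """Estimate the event count: each sample stands for one counter overflow,
    and the counter overflows once every sample_after_value events."""
    return samples * sample_after_value


totals = {name: total_events(n, sav) for name, (n, sav) in SAMPLES.items()}

# Ratios only make sense on totals, because each event has its own multiplier.
mispredict_rate = totals["mispredicted_branches_retired"] / totals["branches_retired"]
branch_fraction = totals["branches_retired"] / totals["instructions_retired"]

for name, value in totals.items():
    print(f"{name}: {value:,}")
print(f"mispredict rate: {mispredict_rate:.2%}")
print(f"branches per instruction: {branch_fraction:.3f}")
```

On these numbers, mispredicted branches come out at roughly 2.7% of retired branches, and branches at roughly one in nine retired instructions, so the raw sample counts only looked contradictory because each event used a different multiplier.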

Hope this helps,


Thanks, that helps.

Any clues as to why most of the branch and mispredict counts appear on statements without any branches anywhere near them?

Here is another extreme example of this:

    ...                             branches   mispred      loads
    and    ecx, eax                  264,374     2,238
    cmp    ecx, esi                              2,238
    jle    over                                  2,238
    mov    ecx, esi                              3,357
    add    eax, 5                    264,374    14,547
    cdq                              132,187    17,904
    mov    esi, 5
    idiv   esi
    mov    eax, DWORD PTR [esp+48] 4,890,919   718,398  5,774,373
    push   eax
    mov    eax, DWORD PTR [esp+58]
    push   ebx

Message Edited by ipanema89 on 11-16-2005 05:53 AM


Yes. ;-)

This is what we call "event skid". You can read about it in the online help, but basically it occurs because the processor cannot stop fast enough to capture the actual instruction that caused the event counter to overflow. Usually, however, you can infer which specific instructions caused the events/samples. For example, in your case, the jle instruction is generating some of the events attributed to the add and cdq instructions that follow it.
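
As a rough illustration of that kind of inference, the Python sketch below credits each branch with the mispredict counts that landed on the instructions immediately after it. This is a hand-rolled heuristic, not something VTune does: the `max_skid` window, the crude mnemonic match, and all function names are assumptions made for this example, and real skid distance depends on the processor. The counts are taken from the "mispred" column of the listing earlier in this thread, with blank cells treated as 0.

```python
# (instruction, mispredicted-branch count) from the listing in this thread.
LISTING = [
    ("and ecx, eax", 2_238),
    ("cmp ecx, esi", 2_238),
    ("jle over", 2_238),
    ("mov ecx, esi", 3_357),
    ("add eax, 5", 14_547),
    ("cdq", 17_904),
    ("mov esi, 5", 0),
    ("idiv esi", 0),
    ("mov eax, DWORD PTR [esp+48]", 718_398),
]

# Assumption: a crude prefix match is enough to spot branches in this listing.
BRANCH_PREFIXES = ("j", "call", "ret", "loop")


def is_branch(instruction: str) -> bool:
    return instruction.split()[0].startswith(BRANCH_PREFIXES)


def fold_skid(listing, max_skid):
    """Credit each branch with its own count plus the counts on up to
    max_skid following instructions, stopping early at the next branch."""
    credited = {}
    for i, (instr, count) in enumerate(listing):
        if not is_branch(instr):
            continue
        total = count
        for later_instr, later_count in listing[i + 1 : i + 1 + max_skid]:
            if is_branch(later_instr):
                break
            total += later_count
        credited[instr] = total
    return credited


print(fold_skid(LISTING, max_skid=3))
print(fold_skid(LISTING, max_skid=6))
```

With a window of 3 instructions the jle is credited with the counts on mov, add and cdq; widening the window to 6 also pulls in the large count on the later load, which shows how much the attribution depends on how far the skid is assumed to reach.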

Thanks. I knew about event skid; however, the 5 million branch events are registered 7 (seven) instructions away from the jump instruction, and at the same time there are 6 million load events that apparently have not skidded. Is it likely that the event skid goes that far, or is there another reason for these?

Oh, and there is no jump to the other side of the code to explain the mispredicted branch events either, since the only thing there is an unconditional call which is more than 12 instructions away. Also there are a few mispredicted branch events further down (about 1000 each) which would be about 15 instructions away from the jle.

Does this have something to do with code that is executed on the assumption that a jump would occur, but whose results are thrown away because the branch was mispredicted?

Message Edited by ipanema89 on 11-18-2005 05:35 AM
