Dynamic branch prediction info

Altera_Forum · 12-22-2009

Is there any information about how the dynamic branch prediction of the Nios II/f works?

We are looking to reduce the worst-case paths through some relatively short functions (less than 1k of code in total, compiled as a single function) and would rather have the static branch prediction of the II/s - but need the II/f to avoid all the pipeline stalls during memory transfers.

I need to work out how to stop the branch predictor mispredicting branches.

Consider, for example, a C 'if' statement with a lot of || cases. The worst case is the fall-through, so you need to 'predict not taken' on all the branches and take the 'hit' of one mispredicted branch when a condition is true.

The dynamic predictor might remember that several of the branches were taken the previous time they were executed - so getting into the conditional code may take several mispredicted branches.

Even the idle loop (repeatedly reading a hardware register) has a branch that needs to be optimised for the unusual case!

Altera_Forum · 12-22-2009

I assume you've gotten all you can out of this:

http://www.altera.com/literature/hb/nios2/n2cpu_nii5v1.pdf

What behavior are you seeing? You would hope that the branch predictor gets the first case wrong and the taken case wrong, but all the cases in between correct.

Are you striving for minimum execution time or predictability?

Can you post the code for the statement that's causing you grief?

Jake

Altera_Forum · 12-23-2009

The only text about the branch predictor is on page 5-10 (just below table 5-8): "Dynamic branch prediction is implemented using a 2-bit branch history table". This doesn't give very much information!

Code-wise, we need to minimise the worst-case code paths, not the most common code paths.

Consider the following test (edited a little) which is in the longest code path.

    /* et(): tell gcc the condition is expected to be true */
    #define et(x) (__builtin_expect((x), 1))
    if (et(x->p == 17)
            && et(x->val - HI < LIM)
            && et(((x->hi << 16) | x->lo) == cfg->v)
            && et((x->ver & 0xff) == VER)
            && et((x->flg & 0x3fff) == 0)) {
        /* code to do something */
    }

Each time through, it is likely to take a different exit.

With static branch prediction there is at most one mispredicted branch, and it is never on the worst-case path: all the other branches are 'fall through'.

However, the dynamic prediction could easily get all 5 wrong (and there are 4 more conditionals before this one)!
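
To make the "all 5 wrong" case concrete, here is a small host-side model. It is only a sketch: it assumes one textbook 2-bit saturating counter per branch (the handbook says no more than "2-bit branch history table", so the real II/f scheme may differ), and the names run_dynamic, run_static and worst_case_dynamic are made up for illustration.

```c
#include <assert.h>

#define NBRANCH 5   /* five && terms, one exit branch each */

/* ASSUMED model: one 2-bit saturating counter (0..3) per branch,
   predicting "taken" when the counter is >= 2. Not confirmed for Nios II/f. */
struct predictor { unsigned char ctr[NBRANCH]; };

/* Simulate one pass through the 'if'. Terms before fail_at are true, so their
   exit branches fall through; term fail_at is false, so its exit branch is
   taken. fail_at == NBRANCH means every term is true (the worst-case path).
   Returns the number of mispredicted branches on this pass. */
int run_dynamic(struct predictor *bp, int fail_at)
{
    int miss = 0;
    assert(fail_at >= 0 && fail_at <= NBRANCH);
    for (int i = 0; i < NBRANCH && i <= fail_at; i++) {
        int taken = (i == fail_at);
        int predicted = (bp->ctr[i] >= 2);
        miss += (predicted != taken);
        /* update the saturating counter with the real outcome */
        if (taken) { if (bp->ctr[i] < 3) bp->ctr[i]++; }
        else       { if (bp->ctr[i] > 0) bp->ctr[i]--; }
    }
    return miss;
}

/* Static "predict not taken": only the one taken exit branch is mispredicted. */
int run_static(int fail_at)
{
    assert(fail_at >= 0 && fail_at <= NBRANCH);
    return fail_at < NBRANCH ? 1 : 0;
}

/* Adversarial history: exit at the last term twice, then at the one before it
   twice, and so on. Later branches are not executed on earlier exits, so their
   counters keep the "taken" state. Then run the all-true path. */
int worst_case_dynamic(void)
{
    struct predictor bp = { {0} };
    for (int f = NBRANCH - 1; f >= 0; f--) {
        run_dynamic(&bp, f);
        run_dynamic(&bp, f);
    }
    return run_dynamic(&bp, NBRANCH);
}
```

Under that assumed model, worst_case_dynamic() counts five mispredictions on the all-true path, while run_static(NBRANCH) counts none. Aliasing between branches in a shared history table is not modelled and could make things better or worse.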

This all adds a significant cost to the code paths.

Ideally I'd like to build the II/f cpu without the branch predictor.
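
Short of removing the predictor, one source-level option (not something from the handbook, just a sketch) is to fold the tests into a single branch by combining the comparison results with bitwise & instead of &&. This is only safe when every term can be evaluated unconditionally, and the compiler is free to emit branches anyway, so the generated code needs checking. The struct layouts and the values of HI, LIM and VER below are hypothetical stand-ins; the thread does not show the real ones.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical definitions, for illustration only. */
#define HI  100u
#define LIM 50u
#define VER 2u

struct pkt    { uint32_t p, val; uint16_t hi, lo; uint32_t ver, flg; };
struct config { uint32_t v; };

/* Original shape: short-circuit &&, typically one exit branch per term. */
int accept_chain(const struct pkt *x, const struct config *cfg)
{
    return x->p == 17
        && x->val - HI < LIM
        && (((uint32_t)x->hi << 16) | x->lo) == cfg->v
        && (x->ver & 0xff) == VER
        && (x->flg & 0x3fff) == 0;
}

/* Flattened shape: each comparison yields 0 or 1 and the results are
   combined with bitwise &, leaving a single test for the caller's 'if'. */
int accept_flat(const struct pkt *x, const struct config *cfg)
{
    unsigned ok = (x->p == 17)
        & (x->val - HI < LIM)
        & ((((uint32_t)x->hi << 16) | x->lo) == cfg->v)
        & ((x->ver & 0xff) == VER)
        & ((x->flg & 0x3fff) == 0);
    return (int)ok;
}

/* Sanity check that both shapes agree on an accepted and a rejected input. */
void check_same_result(void)
{
    struct config cfg  = { 0x00010002u };
    struct pkt    good = { 17, 120, 1, 2, 0x102, 0 };
    struct pkt    bad  = good;
    bad.flg = 1;
    assert(accept_chain(&good, &cfg) == accept_flat(&good, &cfg));
    assert(accept_chain(&bad, &cfg) == accept_flat(&bad, &cfg));
}
```

The flattened form always evaluates all five terms, which is what the worst-case path does anyway; the early exits get slower in exchange for fewer predicted branches.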

Altera_Forum · 04-22-2010

Ask Altera support for instructions on how to access the 'hidden' cpu configuration screen.