Community
cancel
Showing results for 
Search instead for 
Did you mean: 
Marián__VooDooMan__M
New Contributor II
179 Views

implement compile-time if() branch "prediction" / order of if(true)/else branches

Greetings,

It would be really nice to have in MSVC + ICC GCC-like style of forced if/else branch order in code, like:

[cpp]// x.h

#if defined(__GNUC__)
# define my_likely(x) __builtin_expect(!!(x),1)
# define my_unlikely(x) __builtin_expect(!!(x),0)
#else
# define my_likely(x) (x)
# define my_unlikely(x) (x)
#endif

// x.cpp if(my_likely(x)) { // x == true ... do something ... } else { // x == false, unlikely branch of code (most of the time) ... do something ... } if(my_unlikely(x)) { // x == true, though unlikely branch of code ... do something ... } else { // x == false, prefer this branch as first one in the final assembler code, since the "x" likely to be evaluated as "false" ... do something ... }[/cpp]
by this, under MSVC + ICC the programmer have a choice to prefer other branch of code as "likely" or "unlikely" which could add a little speed-up of final code. I guess instruction cache lines are not too large and also I guess hardware branch prediction is not always successful, in case the programmer is really sure the branch is really un/likely most of the time (whether by case of logic issues or decision of performance profiling analysis).

consider code from wxWidgets, and their own implementation of "assert":

[bash]#define wxASSERT_MSG(cond, msg) if ( !wxTheAssertHandler || (cond) ) {} else wxOnAssert(__FILE__, __LINE__, __WXFUNCTION__, #cond, msg)[/bash]
I am using my own "assert" implementation wrapper, and I use "if((x)) {} else { ...do fatal error... }" like wxWidgets does. Here, evaluation of "x" is most likely to be "true". ICC might have its own software-based branch prediction, I guess, and combined with HW-based one is sexy, but programmer sometimes is really (almost) sure about branch order.
0 Kudos
10 Replies
SergeyKostrov
Valued Contributor II
179 Views

Hi,

It's a good idea and I look forward to hearing responses from Intel Software / Hardware engineers. In the
most time criticalcases Itry toavoid 'if-else' statements but it is simply impossible to avoid them
completely.

I also would like to make a note regarding asserts. A classic ASSERT is a macro and has"empty
implementation" in a Release configuration.It doesn't do anything and it meansthaterrors are "hidden"!

I use adifferent approach and this is how it looks like:

#ifdef _RTDEBUG

#define _RTASSERT( _expression, _source ) \
{ \
CrtAssert( ( _expression ) ); \
} \

#else

#define _RTASSERT( _expression, _source ) \
{ \
if( !( _expression ) ) \
{ \
CrtPrintf( RTU("Release - Hidden Assert Detected! - %s"),\
_source ); \
} \
} \

#endif

Marián__VooDooMan__M
New Contributor II
179 Views

Hello,

I am very sorry, but I feel, Sergey Kostrov, you have missed my point.

I have mentioned an "assert" example case as an exmaple for many other cases.

In my code I use extensively mentioned cases:

[cpp]if(my_likely(x)) { ... } else {
...
}[/cpp] and
[bash]if(my_unlikely(x)) { ... } else {
...
}[/bash]
but ONLY when I am sure that in most cases which branch will take a place to execute, since I really know in particular cases which if-else branch is likely to be executed (like it is used in Linux Kernel, while compiled with GCC). I am missing this feature in MSVC + ICC version. As I am looking at generated *.cod files, I could by these specifiers get slightly better performance, as I could give the compiler a hint.
OTorg
New Contributor II
179 Views

Agree + one more wish ofhints: to be possible to mark variable as "random value" (predictions will be right in a 50% of cases).For such variables complier should try get rid of brunches at all (where it would be rational). Example:
[cpp]if ( cond & 0x1000 ) x = x << 1 | x >> 3 & 1; [/cpp] It is a conditional rotate left of4-bit number, unbruch it:
[cpp]DWORD cond2 = ( cond & 0x1000 ) >> 12; x = x << cond2 | x >> 3 & cond2; [/cpp]
Second code will be noticeably faster if "cond"is a random data (eg: input from sensor or media stream from satellite).
styc
Beginner
179 Views

Seemingly ICC already supports GCC's __builtin_expect. PGO can make up for the gap, too.
styc
Beginner
179 Views

ICC does even better than turning cond into a shift count---it uses a conditional move. But ICC does not always perform the optimization. Compare the two cases of the #if directive:[cpp]#include #include int main() { int x, cond; #if 1 x = rand(); cond = rand(); #else scanf("%d%d", &x, &cond); #endif if (cond & 0x1000) x = x << 1 | x >> 3 & 1; printf("%dn", x); }[/cpp]Depending on whether x gets allocated register or memory storage, ICC converts the if-test in the top case but not in the bottom case. The latter seems to be a missed optimization.
Marián__VooDooMan__M
New Contributor II
179 Views

Quoting styc
Seemingly ICC already supports GCC's __builtin_expect. PGO can make up for the gap, too.

Oh, mea culpa, you are right.

my code:

[bash]#if defined(__GNUC__)
# define my_likely(x) __builtin_expect(!!(x),1)
# define my_unlikely(x) __builtin_expect(!!(x),0)[/bash]
was from times when ICC did not support __builtin_expect under MSVC yet. I just changed it to:

[bash]#if defined(__GNUC__) || defined(__INTEL_COMPILER)
# define my_likely(x) __builtin_expect(!!(x),1)
# define my_unlikely(x) __builtin_expect(!!(x),0)[/bash]
and it is working well.

Thank you!
OTorg
New Contributor II
179 Views

Quoting styc
[cpp]if ( cond & 0x1000 ) x = x << 1 | x >> 3 & 1; DWORD cond2 = ( cond & 0x1000 ) >> 12; x = x << cond2 | x >> 3 & cond2; [/cpp]
ICC does even better than turning cond into a shift count---it uses a conditional move. But ICC does not always perform the optimization.

I agree that I have given a very simple example, the compiler can do unbrunching even without forced guidance.
But there are cases with a heavier code when it is necessary to add a dozen or so instructions for unbrunching. And here we need a hint to the compiler: "This is the random data, compare, please unbrunching overhead and cost of 50%-incorrect predictions, and make optimal code."

Marián__VooDooMan__M
New Contributor II
179 Views

Quoting dj_alek
Quoting styc
  1. if(cond&0x1000)
  2. x=x<<1|x>>3&1;
  3. DWORDcond2=(cond&0x1000)>>12;
  4. x=x<>3&cond2;
[cpp]if ( cond & 0x1000 ) x = x << 1 | x >> 3 & 1; DWORD cond2 = ( cond & 0x1000 ) >> 12; x = x << cond2 | x >> 3 & cond2; [/cpp]
ICC does even better than turning cond into a shift count---it uses a conditional move. But ICC does not always perform the optimization.

I agree that I have given a very simple example, the compiler can do unbrunching even without forced guidance.
But there are cases with a heavier code when it is necessary to add a dozen or so instructions for unbrunching. And here we need a hint to the compiler: "This is the random data, compare, please unbrunching overhead and cost of 50%-incorrect predictions, and make optimal code."

I guess you mean that compiler should (by explicit hint from programmer) choose random branch order, and distribute this randomness like every even chose first branch and every odd choose the second one? So if I understand you correctly, it is really nice idea. I am working on signal processing SW, where audio signal on input can be really random (like from the microphone), and based on this signal, the decision to further calculations is made.
OTorg
New Contributor II
179 Views

Quoting VooDooMan
Quoting dj_alek
I guess you mean that compiler should (by explicit hint from programmer) choose random branch order, and distribute this randomness like every even chose first branch and every odd choose the second one? So if I understand you correctly, it is really nice idea. I am working on signal processing SW, where audio signal on input can be really random (like from the microphone), and based on this signal, the decision to further calculations is made.

No. Compiler should avoid brunches at all (make unbrunching in other words).Look into the sample code:

[cpp]DWORD cond2 = ( cond & 0x1000 ) >> 12; x = x << cond2 | x >> 3 & cond2;[/cpp]There is not any "if"word - brunch is absent.This will lead to the execution of excessive instructions compared to brunched code, butthe average performance will be higher. In details:[cpp]if ( q() ) a(); else b(); [/cpp] Assume a() takes 10 cycles, b() takes 15 cycles, q() takes 5 cycles. When q() is well-predictable, then averageexecution time is 18 cycles.But when q() is a random value, then weget ~50% of misspredictions.Suppose that wrongprediction costs 20 cycles. So, we have average execution time of 28 cycles.

In many cases we can replace "if q then a else b" with code that doesn't contain "if". If such replacement costs less than 28 cycles, then wehave performance benefit.

OTorg
New Contributor II
179 Views

Quoting VooDooMan
I am working on signal processing SW, where audio signal on input can be really random (like from the microphone), and based on this signal, the decision to further calculations is made.

I recommend you these books:
1. The Software Optimization Cookbook,www.intel.com/intelpress/sum_soc.htm
Reply