max-divulskiy
Beginner
90 Views

Paradox in cpu_specific

Good afternoon.
I have a question about the use of cpu_specific functions. Why does the compiler not inline the function in this case:
[bash]#pragma region CountOnBits UINT8
__declspec(cpu_specific(generic))
UINT __forceinline CountOnBits( CUINT8 Data )
{
    return OnBitsArray[Data];
}

__declspec(cpu_specific(core_i7_sse4_2))
UINT __forceinline CountOnBits( CUINT8 Data )
{
    return _mm_popcnt_u32( Data );
}

__declspec(cpu_dispatch(generic, core_i7_sse4_2))
UINT __forceinline CountOnBits( CUINT8 Data )
{
    // Empty function body informs the compiler to generate the
    // CPU-dispatch function listed in the cpu_dispatch clause.
}[/bash]
but does inline it in this case:
[bash]UINT __forceinline CountOnBits_Simple(UINT8 Data)
{
    return OnBitsArray[Data];
}[/bash]

Is it possible to achieve the same result (inlining) with the cpu_specific versions?
2 Replies
Thomas_W_Intel
Employee
90 Views

The compiler does not inline ("embed") the function because both variants are compiled, and the program decides at start-up which one to use, depending on the platform. Such a mechanism is impossible if the function is inlined.
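As a rough mental model only (this is portable C++, not what the Intel compiler actually emits for cpu_dispatch), the dispatched function behaves like a call through a pointer that is resolved once at start-up. The names count_generic, count_swar, cpu_has_fast_path and count_on_bits are hypothetical, and count_swar merely stands in for the SSE4.2 variant. Because the call target is only known at run time, there is nothing the compiler could paste into the caller.

```cpp
#include <cstdint>

using CountFn = unsigned (*)(std::uint8_t);

// Stand-in for the cpu_specific(generic) variant: counts bits one at a time.
static unsigned count_generic(std::uint8_t data) {
    unsigned n = 0;
    for (unsigned v = data; v != 0; v >>= 1) {
        n += v & 1u;
    }
    return n;
}

// Stand-in for the CPU-specific variant (assumption: the real one would
// use _mm_popcnt_u32; this one adds neighbouring bit groups in parallel).
static unsigned count_swar(std::uint8_t data) {
    unsigned v = data;
    v = (v & 0x55u) + ((v >> 1) & 0x55u);
    v = (v & 0x33u) + ((v >> 2) & 0x33u);
    v = (v & 0x0Fu) + ((v >> 4) & 0x0Fu);
    return v;
}

// Hypothetical placeholder for a real CPU feature check.
static bool cpu_has_fast_path() {
    return true;
}

// The variant is chosen once, at run time.
static CountFn resolve_count_on_bits() {
    return cpu_has_fast_path() ? count_swar : count_generic;
}

// Every call goes through this pointer, so the body cannot be
// substituted into the call site at compile time.
static const CountFn count_on_bits = resolve_count_on_bits();
```

A call such as count_on_bits(0xFF) always costs an indirect call, no matter how small the selected variant is.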

For a function as short as CountOnBits_Simple, it is understandable that you don't want to pay the penalty of an extra function call. You therefore have two alternatives:
1. Generate different binaries for different platforms, i.e. use #ifdefs to compile the 2 different versions.
2. Use the dispatching mechanism on a higher level in your call stack. By this I mean that you use cpu_dispatch for the function that calls CountOnBits_Simple. Assuming that this function is normally not inlined, you won't introduce an additional call overhead.
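The second alternative can be sketched as follows. This is a minimal illustration that swaps in GCC/Clang facilities (the target attribute and __builtin_cpu_supports) for Intel's cpu_dispatch, so that it builds outside the Intel compiler. The names count_bits_table, count_buffer_generic, count_buffer_popcnt and count_buffer are hypothetical. The point is that the per-byte work sits inside each loop variant, where it can be inlined, and the dispatch cost is paid once per buffer rather than once per byte.

```cpp
#include <cstddef>
#include <cstdint>

// Per-byte helper, small enough to be inlined into the generic loop.
static inline unsigned count_bits_table(std::uint8_t v) {
    static const unsigned char nibble[16] = {0, 1, 1, 2, 1, 2, 2, 3,
                                             1, 2, 2, 3, 2, 3, 3, 4};
    return nibble[v & 0x0F] + nibble[v >> 4];
}

// Generic loop variant: runs on any CPU.
static std::size_t count_buffer_generic(const std::uint8_t* p, std::size_t n) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += count_bits_table(p[i]);
    }
    return total;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_POPCNT_VARIANT 1
// POPCNT loop variant: the builtin is expanded inside the loop body.
__attribute__((target("popcnt")))
static std::size_t count_buffer_popcnt(const std::uint8_t* p, std::size_t n) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += static_cast<std::size_t>(__builtin_popcount(p[i]));
    }
    return total;
}
#endif

using BufferFn = std::size_t (*)(const std::uint8_t*, std::size_t);

// Select the loop variant; falls back to the generic one when the
// compiler or the CPU does not offer POPCNT.
static BufferFn pick_count_buffer() {
#ifdef HAVE_POPCNT_VARIANT
    if (__builtin_cpu_supports("popcnt")) {
        return count_buffer_popcnt;
    }
#endif
    return count_buffer_generic;
}

// Dispatch happens here, one level above the per-byte function.
std::size_t count_buffer(const std::uint8_t* p, std::size_t n) {
    static const BufferFn impl = pick_count_buffer();
    return impl(p, n);
}
```

With the Intel compiler, the same structure would presumably put the cpu_specific and cpu_dispatch declarations on the buffer-level function and leave the per-byte function as a plain __forceinline helper.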

Kind regards
Thomas

max-divulskiy
Beginner
90 Views

Thank you, Thomas.
I will use the second variant.