Software Archive

icpc compiler flag -inline-forceinline destroys correctness of Intrinsics kernel

Patrick_S_
New Contributor I
607 Views

Hey all,

I have been experimenting with compiler flags over the last few days and discovered that compiling code containing inline functions with -inline-forceinline can change the result of a program dramatically on KNC.

Let's say I have two quite small OpenMP-parallelized intrinsics kernels, A( float* in, float* out ) and B( float* in, float* out ), which I call four times in total with different in/out arrays:

[cpp]

A( x_in, x_out );
B( x_in, x_out );
A( y_in, y_out );
B( y_in, y_out );

[/cpp]

Then the results x_out/y_out computed by A and B are wrong when the code is compiled with -inline-forceinline. This remains true even when running with only one thread. The icpc compiler 14.0.2 20140120 reports no error.

I don't actually need that flag, but I would really like to know why this happens. Is it a compiler bug, or do I have to use this flag with caution? Apart from floating-point precision flags, I have never seen a compiler flag change results like this.

Thanks

Patrick

5 Replies
Kevin_D_Intel
Employee

Pardon the delayed reply. I am trying to locate some information about your observations.

Kevin_D_Intel
Employee

This may be a compiler defect. Our developers asked whether you could provide a reproducing test case that they could investigate further.

Patrick_S_
New Contributor I

Yes, I will try to reduce my program to a simple example that reproduces the bug.

Patrick_S_
New Contributor I

Kevin,

At the moment I can't provide a simple example, but I have noticed something more general.

So far I have tried to track down where my computation goes wrong when the code is compiled with -inline-forceinline. To do that, I wanted to follow the "register data stream" with std::cout. Functions A and B each contain a large for loop, and when I try to print the result of each function for iteration 0 with

[cpp]

if ( itr == 0 ) {
    std::cout << result[itr] << std::endl;
}

[/cpp]

I should see four values in my terminal, one per kernel call (which is what happens if the code is compiled without -inline-forceinline). If the code is compiled with -inline-forceinline, the terminal is flooded with output instead. The code runs with 240 threads in my tests.

Do you have an idea how I can find the reason for that? 

Something is definitely broken in the compiler. The behavior of a program should not depend on whether it is compiled with -inline-forceinline.

Patrick_S_
New Contributor I

OK, I think it is a problem with OpenMP. When the code is compiled with -inline-forceinline, each thread executes all loop iterations instead of only its own chunk. The result is printed only once if the program runs with one thread.
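
One way to check this suspicion without reading interleaved std::cout output is to count loop-body executions instead of printing. The following is only a minimal sketch and not the original kernels: the function names are hypothetical, and whether it triggers the same behavior as the real code is an assumption. With correct work-sharing the total is n for any thread count; if every thread ran the whole loop, the total would be n times the number of threads.

[cpp]
```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper (not part of the original kernels): counts how many
// times the body of an OpenMP work-shared loop runs in total.
// Correct work-sharing splits the n iterations among the threads, so the
// reduced sum is exactly n. If every thread executed the whole loop, the
// sum would be n multiplied by the number of threads.
long count_loop_body_executions(int n) {
    long executed = 0;
    #pragma omp parallel for reduction(+ : executed)
    for (int itr = 0; itr < n; ++itr) {
        executed += 1;
    }
    return executed;
}

// Hypothetical helper: counts how often the "itr == 0" branch is entered,
// mirroring the debug print from the earlier post. The expected count is 1.
int count_iteration_zero_hits(int n) {
    int hits = 0;
    #pragma omp parallel for reduction(+ : hits)
    for (int itr = 0; itr < n; ++itr) {
        if (itr == 0) {
            hits += 1;
        }
    }
    return hits;
}

// Prints both counters and returns true when work-sharing behaved as expected.
bool worksharing_looks_correct(int n) {
    assert(n > 0);  // with n == 0 there is no iteration 0 to count
    const long executed = count_loop_body_executions(n);
    const int hits = count_iteration_zero_hits(n);
    std::printf("body executions: %ld (expected %d), itr==0 hits: %d (expected 1)\n",
                executed, n, hits);
    return executed == n && hits == 1;
}
```
[/cpp]

Calling worksharing_looks_correct from main, with OpenMP enabled, once with and once without -inline-forceinline should show whether the two builds distribute the iterations differently. If they do, this could also serve as a starting point for the reproducer the developers asked for.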
