Hey all,
Over the last few days I have been experimenting with compiler flags, and I discovered that compiling code that contains inline functions with -inline-forceinline can change the result of a program drastically on KNC.
Let's say I have two quite small OpenMP-parallelized intrinsics kernels, A( float* in, float* out ) and B( float* in, float* out ), which I call with two different sets of in/out arrays, four calls in total:
[cpp]
A( x_in, x_out );
B( x_in, x_out );
A( y_in, y_out );
B( y_in, y_out );
[/cpp]
then the results x_out/y_out computed by A and B are wrong when the code is compiled with -inline-forceinline. This remains true even when running with only one thread. The compiler, icpc 14.0.2 20140120, reports no error.
I don't actually need that flag, but I would really like to know why this happens. Is it a compiler bug, or is this a flag I have to use with caution? Apart from floating-point precision flags, I have never seen a flag change results like this.
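The call pattern above could be turned into a small self-checking program along the following lines. This is only a sketch: the kernel bodies are hypothetical scalar stand-ins (the real kernels use intrinsics and are not shown in this thread), the length parameter `n` and the helper `count_mismatches` are assumptions added for illustration, and nothing here is known to trigger the problem.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for kernel A: out[i] = in[i]^2 (scalar, no intrinsics).
// The length parameter n is an assumption; the original signatures omit it.
inline void A(const float* in, float* out, long n) {
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        out[i] = in[i] * in[i];
    }
}

// Hypothetical stand-in for kernel B: out[i] += 0.5 * in[i].
inline void B(const float* in, float* out, long n) {
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        out[i] += 0.5f * in[i];
    }
}

// Runs the A/B/A/B call pattern and counts elements that differ from a
// serially computed reference. Inputs are chosen so every result is exactly
// representable in float, which makes an exact comparison valid.
inline long count_mismatches(long n) {
    assert(n > 0 && n <= 1000);  // keeps all intermediate values exact
    std::vector<float> x_in(n), y_in(n), x_out(n, 0.0f), y_out(n, 0.0f);
    for (long i = 0; i < n; ++i) {
        x_in[i] = 0.25f * static_cast<float>(i);
        y_in[i] = static_cast<float>(i + 1);
    }

    A(x_in.data(), x_out.data(), n);
    B(x_in.data(), x_out.data(), n);
    A(y_in.data(), y_out.data(), n);
    B(y_in.data(), y_out.data(), n);

    long mismatches = 0;
    for (long i = 0; i < n; ++i) {
        const float x_ref = x_in[i] * x_in[i] + 0.5f * x_in[i];
        const float y_ref = y_in[i] * y_in[i] + 0.5f * y_in[i];
        if (x_out[i] != x_ref) ++mismatches;
        if (y_out[i] != y_ref) ++mismatches;
    }
    return mismatches;
}
```

Building this once with and once without -inline-forceinline and comparing the value returned by `count_mismatches` gives a yes/no answer that does not depend on reading terminal output.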
Thanks
Patrick
Pardon the delayed reply. I am trying to locate some information about your observations.
This may be a compiler defect, and our developers asked whether you could provide a reproducing test case that they could investigate further.
Yes, I will try to reduce my program to a simple example that shows the bug.
Kevin,
At the moment I can't provide a simple example. But I have noticed something more general.
So far I have tried to track down where my computation goes wrong when the code is compiled with -inline-forceinline. To do that, I wanted to follow the "register data stream" with std::cout. Functions A and B each contain a large for loop, and when I print the result of each function for iteration 0 with
[cpp]
if ( itr == 0 ) {
    std::cout << result[itr] << std::endl;
}
[/cpp]
I should see 4 values in my terminal, and I do if the code is compiled without -inline-forceinline. If the code is compiled with -inline-forceinline, the terminal is flooded with output instead. The code runs with 240 threads in my tests.
Do you have an idea how I can find the reason for that?
Something is definitely broken in the compiler. The behavior of a program should not depend on whether it is compiled with -inline-forceinline.
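One way to make the "4 values expected" check less noisy than std::cout from 240 threads is to count how often the iteration-0 branch runs. The following is a minimal sketch under assumptions: `kernel_counting_itr0` is a hypothetical scalar stand-in, not one of the real intrinsics kernels, and the loop body is invented for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in kernel (scalar, no intrinsics): out[itr] = 2 * in[itr].
// Returns how often the branch for iteration 0 was executed during this call.
inline int kernel_counting_itr0(const std::vector<float>& in,
                                std::vector<float>& out) {
    assert(in.size() == out.size());
    const long n = static_cast<long>(in.size());
    int hits = 0;
    // Each thread counts privately; the reduction sums the counts at the end.
    #pragma omp parallel for reduction(+:hits)
    for (long itr = 0; itr < n; ++itr) {
        out[itr] = 2.0f * in[itr];
        if (itr == 0) {
            ++hits;  // takes the place of the std::cout line
        }
    }
    return hits;
}
```

With a correctly work-shared loop each call should return 1, so four calls sum to 4. A larger number would mean that iteration 0 was executed more than once within a single call.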
Ok, I think it is a problem with OpenMP. If the code is compiled with -inline-forceinline, each thread executes all loop iterations instead of only its own chunk. With 1 thread the result is printed only once.
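The difference between a work-shared loop and a loop that every thread runs in full can be made visible by counting executions per iteration. The sketch below is illustrative only: both function names are hypothetical, and the second function replicates the loop on purpose with a plain `parallel` region to imitate the symptom; it says nothing about what the compiler actually generates with -inline-forceinline.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Work-shared loop: the iterations are divided among the threads,
// so every iteration should be counted exactly once.
inline std::vector<int> hits_workshared(long n) {
    assert(n >= 0);
    std::vector<int> hits(n, 0);
    int* h = hits.data();
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        #pragma omp atomic
        ++h[i];
    }
    return hits;
}

// Deliberately replicated loop: inside a plain parallel region without a
// worksharing construct, every thread runs all n iterations.
// team_size receives the number of threads that executed the region.
inline std::vector<int> hits_replicated(long n, int& team_size) {
    assert(n >= 0);
    std::vector<int> hits(n, 0);
    int* h = hits.data();
    int team = 1;  // stays 1 when compiled without OpenMP
    #pragma omp parallel
    {
#ifdef _OPENMP
        #pragma omp single
        team = omp_get_num_threads();
#endif
        for (long i = 0; i < n; ++i) {
            #pragma omp atomic
            ++h[i];
        }
    }
    team_size = team;
    return hits;
}
```

If a loop that is meant to be work-shared shows per-iteration counts equal to the thread count, as `hits_replicated` does, each thread is running the whole loop rather than a chunk.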