
OpenMP collapse clause still not working

Hello,
We have found an issue with ICC version 12.x involving OpenMP parallel for with the collapse clause. When we compile with -O2 or -O3, the results are not what we expect. However, the code works fine with GCC at -O3 and with ICC at -O0. In addition, with ICC -O3, hard-coding the limits of the outer loop (instead of taking them from variables) causes a segmentation fault. We don't know where to report this compiler bug. We have already found another thread reporting a similar problem with the Fortran compiler (http://software.intel.com/en-us/forums/showthread.php?t=83135&wapkw=unexpected+behavior+for+openmp+collapse+clause).

Regards
TimP
Black Belt

For a segmentation fault, you should check your shell's stack limit (ulimit -s) and, for a large case, your thread stack limit (default 2MB for 32-bit, 4MB for 64-bit, adjustable with KMP_STACKSIZE).
The best way to report this is to file a problem report on premier.intel.com, with a small reproducer if possible. Registering your license automatically creates your support account; if you didn't register it, you can do so at https://registrationcenter.intel.com.
I have a stale bug report filed about a collapse failure. Recently, one of my customers had some success with collapse, even in a case where the outer loop count isn't fixed at compile time and there is a vectorizable dot product inside the 2 outer collapsed loops.
Sukruth_H_Intel
Employee

Hi,
Yes, it would help if you could provide a test case that reproduces the issue. Also, could you please check whether using "-no-vec" avoids the problem when compiling with -O2 or -O3 with icc?

Example: icc -O2 simple.c -no-vec

Thanks & Regards,
Sukruth H.V
TimP
Black Belt

There may be an implicit assumption here that your case auto-vectorizes with icc but that you didn't attempt vectorization with gcc.
In the case of 2 nested loops where the inner one is vectorizable, collapse is rarely useful, and the compiler may run into difficulty.
Needless to say, it's important for collapse to work in the case of 2 outer collapsible loops and a 3rd, inner vectorized loop. With a total of 3 nested loops, difficulty could arise if the compiler attempts unroll-and-jam on the 2 inner loops when you request collapse on the 2 outer loops. I don't expect icc to attempt that, but I don't know how to control it.