Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

OpenMP collapse clause still not working

felix-rubio-dalmau
Hello,
we have found an issue with ICC version 12.x involving an OpenMP parallel for with a collapse clause. When compiling with -O2 or -O3, the results are not what we expect. However, the same code works correctly with GCC at -O3 and with ICC at -O0. In addition, hard-coding the limits of the outer loop (instead of holding them in variables) with ICC -O3 leads to a segmentation fault. We don't know where to report this compiler bug. We have found another thread reporting a similar problem with the Fortran compiler (http://software.intel.com/en-us/forums/showthread.php?t=83135&wapkw=unexpected+behavior+for+openmp+collapse+clause).
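A minimal sketch of the kind of check that could serve as a reproducer for this report. It is not the original poster's code: the function names, the fill formula, and the array layout are all hypothetical. It fills one array serially and one with a collapse(2) loop nest whose limits are run-time variables, then counts the elements that differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Reference result: plain serial loop nest. */
void fill_serial(long *out, int n, int m) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j)
            out[(size_t)i * m + j] = (long)i * 1000 + j;
}

/* Same loop nest, with the two loops collapsed into one iteration space.
   n and m are run-time variables, as in the failing case described above. */
void fill_collapsed(long *out, int n, int m) {
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j)
            out[(size_t)i * m + j] = (long)i * 1000 + j;
}

/* Returns the number of elements where the two versions disagree,
   or -1 if memory could not be allocated. */
int count_mismatches(int n, int m) {
    assert(n > 0 && m > 0);
    size_t total = (size_t)n * (size_t)m;
    long *ref = malloc(total * sizeof *ref);
    long *par = malloc(total * sizeof *par);
    if (ref == NULL || par == NULL) {
        free(ref);
        free(par);
        return -1;
    }
    fill_serial(ref, n, m);
    fill_collapsed(par, n, m);
    int bad = 0;
    for (size_t x = 0; x < total; ++x)
        if (ref[x] != par[x])
            ++bad;
    free(ref);
    free(par);
    return bad;
}
```

Calling count_mismatches from a small driver, with n and m taken from the command line so they are not compile-time constants, and building once with gcc -O3 -fopenmp and once with icc -O3 -openmp (adding -std=c99 if the compiler needs it) should show whether the two compilers disagree. A correct build returns 0.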

regards
3 Replies
TimP
Honored Contributor III
For a segmentation fault, you should check your user stack limit setting and, for a large case, your thread stack limit (default 2MB for 32-bit, 4MB for 64-bit, adjustable via the KMP_STACKSIZE environment variable).
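As a rough illustration of that check, the sketch below reads the process stack limit with the POSIX getrlimit call and prints whatever KMP_STACKSIZE is set to in the environment. The function names are hypothetical, and it assumes a POSIX system such as Linux; it only reports the settings and does not change them.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Soft stack limit of the current process in bytes;
   -1 if unlimited, -2 if the query failed. */
long long stack_soft_limit_bytes(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -2;
    if (rl.rlim_cur == RLIM_INFINITY)
        return -1;
    return (long long)rl.rlim_cur;
}

/* Print the user stack limit and the KMP_STACKSIZE setting, if any. */
void report_stack_settings(void) {
    long long lim = stack_soft_limit_bytes();
    const char *kmp = getenv("KMP_STACKSIZE");

    if (lim == -2)
        printf("user stack limit: query failed\n");
    else if (lim == -1)
        printf("user stack limit: unlimited\n");
    else
        printf("user stack limit: %lld bytes\n", lim);

    /* When the variable is unset, the runtime default applies. */
    printf("KMP_STACKSIZE: %s\n", kmp != NULL ? kmp : "(not set)");
}
```

Calling report_stack_settings at program start, before the first parallel region, shows the limits the failing run actually had.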
The best way to report this is by filing a problem report with (if possible) a small reproducer on premier.intel.com. Registering your license automatically creates your support account. If you didn't register it, you can do so at https://registrationcenter.intel.com.
I have a stale bug report filed about a failure of collapse. Recently, one of my customers had some success with collapse, even in a case where the outer loop count isn't fixed at compile time and there is a vectorizable dot product inside the 2 outer collapsed loops.
Sukruth_H_Intel
Employee
Hi,
Yes, it would be better if you could provide us with a test case that reproduces the issue. Also, could you please check whether adding "-no-vec" resolves the problem at -O2 or -O3 with icc?

Example: icc -O2 simple.c -no-vec

Thanks & Regards,
Sukruth H.V
TimP
Honored Contributor III
There may be an implicit assumption here that your case auto-vectorizes with icc but that you didn't attempt vectorization with gcc.
In the case of 2 nested loops with the inner one vectorizable, it would be unusual for collapse to be useful, and the compiler might encounter difficulty.
Needless to say, it's important for collapse to work in the case of 2 outer collapsible loops and a 3rd inner vectorized loop. There is a possibility of difficulty with a total of 3 nested loops, if the compiler attempts unroll-and-jam on the 2 inner loops when you request collapse on the 2 outer loops. I don't expect icc to attempt that, but I don't know how to control it.
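To make the three-loop case concrete, here is a minimal sketch, assuming a row-major layout and hypothetical function names: the 2 outer loops are collapsed, and the innermost loop is a unit-stride dot product that a compiler may vectorize. The test data are small integer values stored as doubles, so the sums are exact and the comparison does not depend on summation order.

```c
#include <assert.h>
#include <stdlib.h>

/* Reference: c[i*m + j] = dot(row i of a, row j of bt), all serial. */
void dot_nest_serial(const double *a, const double *bt, double *c,
                     int n, int m, int len) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j) {
            double sum = 0.0;
            for (int k = 0; k < len; ++k)
                sum += a[(size_t)i * len + k] * bt[(size_t)j * len + k];
            c[(size_t)i * m + j] = sum;
        }
}

/* Same nest: collapse applies to the i and j loops only; the k loop
   stays inside each collapsed iteration as a candidate for vectorization. */
void dot_nest_collapsed(const double *a, const double *bt, double *c,
                        int n, int m, int len) {
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j) {
            double sum = 0.0;
            for (int k = 0; k < len; ++k)
                sum += a[(size_t)i * len + k] * bt[(size_t)j * len + k];
            c[(size_t)i * m + j] = sum;
        }
}

/* Returns 1 if both versions agree, 0 if not, -1 on allocation failure. */
int dot_nest_matches(int n, int m, int len) {
    assert(n > 0 && m > 0 && len > 0);
    size_t na = (size_t)n * len, nb = (size_t)m * len, nc = (size_t)n * m;
    double *a = malloc(na * sizeof *a);
    double *bt = malloc(nb * sizeof *bt);
    double *c1 = malloc(nc * sizeof *c1);
    double *c2 = malloc(nc * sizeof *c2);
    int ok = -1;
    if (a != NULL && bt != NULL && c1 != NULL && c2 != NULL) {
        for (size_t x = 0; x < na; ++x) a[x] = (double)(x % 7);
        for (size_t x = 0; x < nb; ++x) bt[x] = (double)(x % 5);
        dot_nest_serial(a, bt, c1, n, m, len);
        dot_nest_collapsed(a, bt, c2, n, m, len);
        ok = 1;
        for (size_t x = 0; x < nc; ++x)
            if (c1[x] != c2[x])
                ok = 0;
    }
    free(a); free(bt); free(c1); free(c2);
    return ok;
}
```

Building this with and without -no-vec at -O2 or -O3 and comparing the return value of dot_nest_matches would help separate a collapse problem from a vectorization problem.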