There were some discussions two years ago about performance degradation when using "-qopenmp" together with "-heap-arrays".
I am wondering whether that has finally been fixed?
I recently noticed a performance degradation when using OpenMP-accelerated code with "-heap-arrays", to the point that the threaded code ends up slower than the unthreaded version. This was with compiler 17.07. When "-heap-arrays" was removed from the compiler options, the threaded version was considerably faster than the non-threaded version. I saw this on a machine with an Intel(R) Xeon(R) CPU E5-2697 v4, working on arrays that require more than 70 GB of RAM.
Can anybody, possibly from Intel, comment on this? If required, I can post a test program that exhibits this behavior.
Heap allocation from within a parallel region typically involves passing through a critical section, which serializes the allocations. (Note: newer versions of Intel OpenMP have been integrating the TBB parallel allocator. When it is used, small-ish allocations do not pass through a critical section, but large/huge allocations may still encounter one.)
What may help when your parallel region calls nested levels of subroutines: if you can identify the specific subroutines that make small-ish allocations, compile those sources so that their local arrays are automatic (stack) arrays, and compile the sources with large/huge allocations with heap arrays.
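To make the allocation pattern concrete, here is a minimal Fortran sketch. It is not taken from the original poster's code, and the names work_mod, smooth_column and heap_arrays_demo are made up for illustration. It assumes the usual ifort behavior: automatic arrays live on the stack by default and are moved to the heap by "-heap-arrays".

```fortran
! Illustrative sketch only; all names here are hypothetical.
module work_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  subroutine smooth_column(n, col)
    integer,  intent(in)    :: n
    real(dp), intent(inout) :: col(n)
    ! 'scratch' is an automatic array: its size depends on the argument n.
    ! Assumption: the stack is used by default, and with -heap-arrays it is
    ! allocated at entry and freed at exit, on every single call.
    real(dp) :: scratch(n)
    integer  :: i

    scratch = col
    do i = 2, n - 1
      col(i) = (scratch(i-1) + scratch(i) + scratch(i+1)) / 3.0_dp
    end do
  end subroutine smooth_column
end module work_mod

program heap_arrays_demo
  use work_mod
  implicit none
  integer, parameter :: n = 1000, ncol = 2000
  real(dp), allocatable :: a(:,:)
  integer :: j

  allocate(a(n, ncol))
  a = 1.0_dp

  ! One call per iteration, so with heap arrays each iteration performs
  ! one allocation and one deallocation from inside the parallel region.
  !$omp parallel do
  do j = 1, ncol
    call smooth_column(n, a(:, j))
  end do
  !$omp end parallel do

  print *, 'sum = ', sum(a)
end program heap_arrays_demo
```

If smooth_column sits in its own source file, that file can be compiled without "-heap-arrays" while the rest of the project keeps it. The thread stacks then have to be large enough for scratch, which can be adjusted with the OMP_STACKSIZE environment variable.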
Thank you very much for this explanation.
I went back to the code and found that the compiler was creating a temporary array in the parallel region over and over again. Fixing that brought the "-heap-arrays" version back to the expected speed.
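For readers who run into the same thing: the post does not say what caused the temporary, so the following is only a sketch of one common cause, with made-up names (temp_array_demo, total_explicit, total_assumed). Passing a strided array section to an explicit-shape dummy argument usually makes the compiler copy it into a contiguous temporary on every call, and with "-heap-arrays" that temporary is expected to land on the heap.

```fortran
! Illustrative sketch only; all names here are hypothetical.
program temp_array_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 500, m = 400
  real(dp), allocatable :: a(:,:), row_sums(:)
  integer :: i

  allocate(a(n, m), row_sums(n))
  a = 1.0_dp

  ! Variant 1: a(i, :) is a non-contiguous row section. The explicit-shape
  ! dummy x(len) expects contiguous storage, so the compiler typically
  ! builds a temporary copy on each iteration.
  !$omp parallel do
  do i = 1, n
    row_sums(i) = total_explicit(m, a(i, :))
  end do
  !$omp end parallel do
  print *, 'explicit-shape: ', sum(row_sums)

  ! Variant 2: the assumed-shape dummy x(:) receives an array descriptor
  ! with the stride, so no copy is normally needed.
  !$omp parallel do
  do i = 1, n
    row_sums(i) = total_assumed(a(i, :))
  end do
  !$omp end parallel do
  print *, 'assumed-shape:  ', sum(row_sums)

contains

  function total_explicit(len, x) result(s)
    integer,  intent(in) :: len
    real(dp), intent(in) :: x(len)
    real(dp) :: s
    s = sum(x)
  end function total_explicit

  function total_assumed(x) result(s)
    real(dp), intent(in) :: x(:)
    real(dp) :: s
    s = sum(x)
  end function total_assumed

end program temp_array_demo
```

Both variants compute the same sums; only the argument passing differs. With ifort, building a debug version with "-check arg_temp_created" should report at run time where such argument temporaries are created, which can help locate them.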