I'm trying to port some code to OpenMP and have encountered a problem. I've distilled a minimal failing example below:
      program test_collapse
      parameter(N1=8,N2=8,N3=4)
      real*8 R(N1,N2,N3)
c     Set up an array
      do k=1,N3
        do j=1,N2
          do i=1,N1
            R(i,j,k)=rand(0)
          enddo
        enddo
      enddo
c     Now do some work on it
!$omp parallel do collapse(2) schedule(dynamic) private(i)
      do k=N3-1,N3
        do j=1,N2
          do i=1,N1
            R(i,j,k)=R(i,j,k) + 1
          enddo
c         Do some more work with this inner section of array in another loop here
c         so collapse(3) won't work
        enddo
      enddo
c     I would then do something else here to deal with the remainder of the array
!$omp end parallel do
c
      stop
      end
I'm compiling with
ifort -O0 -g -qopenmp -Wall -o ifort_test ifort_test.F
I see no reason this shouldn't work, and it works in gfortran, but with ifort (even with OMP_NUM_THREADS=1), it segfaults in the second loop. Changing the k loop to go from 1 to N3 allows the program to exit normally.
I encounter this on both a KNL machine and a Kaby Lake machine, using ifort 2017 and 2018.
Has anyone else encountered this or know how it can be worked around? Is Intel aware of this/planning to fix it in a future version?
The collapse(2) clause should make the first two loop control variables (k, j) implicitly private, but should not affect the third loop control variable, i, which could conceivably be shared by default (though compiler optimizations that keep i in a register may effectively make it private).
In other words, add a private(i) clause.
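One way to take guesswork out of the data-sharing question is to spell every attribute out. The following is a minimal sketch in free-form Fortran, assuming the same array shape as the original post. The default(none) and shared(R) clauses and the program name are additions for illustration and were not in the posted code. It only illustrates the clauses and is not claimed to avoid the crash.

```fortran
program explicit_sharing
  implicit none
  integer, parameter :: N1 = 8, N2 = 8, N3 = 4
  double precision :: R(N1, N2, N3)
  integer :: i, j, k

  R = 0.0d0

  ! default(none) is an assumption added here: it forces every variable
  ! referenced in the region to have an explicit or predetermined attribute.
  ! k and j are privatized by collapse(2); i is listed explicitly.
  !$omp parallel do collapse(2) schedule(dynamic) &
  !$omp& default(none) shared(R) private(i)
  do k = N3 - 1, N3
    do j = 1, N2
      do i = 1, N1
        R(i, j, k) = R(i, j, k) + 1.0d0
      end do
    end do
  end do
  !$omp end parallel do

  ! Only the last two k-slabs were incremented.
  print *, sum(R)
end program explicit_sharing
```

With default(none), the compiler should reject the directive if any variable used inside the region has been left without a data-sharing attribute, which makes a forgotten private clause a compile-time error rather than a race.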
Thanks for your reply.
I've edited my initial post. Previously it said private(ithird), which was a remnant of the old variable names that I changed for clarity in the forum version. Now it says private(i), which is equivalent to what was in my original code. That is, even with the correct private clause, it still fails.
Conspicuously, if the collapse(2) directive is removed, or expanded to collapse(3), then the code also exits cleanly; it's only in the partially collapsed case that there's a problem. Similarly, if the outer loop starts at 1 rather than 3, then all works fine.
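Since the failure appears only with the partial collapse, one possible workaround is to collapse the k and j loops by hand and drop the collapse clause. The following is a minimal sketch in free-form Fortran, not tested against the affected ifort versions. The names KLO, NK, and jk are hypothetical helpers introduced here, and the array is zero-initialized (rather than filled with rand) so the result can be checked.

```fortran
program manual_collapse
  implicit none
  integer, parameter :: N1 = 8, N2 = 8, N3 = 4
  integer, parameter :: KLO = N3 - 1        ! first k index (hypothetical name)
  integer, parameter :: NK = N3 - KLO + 1   ! number of k iterations
  double precision :: R(N1, N2, N3)
  integer :: i, j, k, jk

  R = 0.0d0

  ! One flat loop over all (j,k) pairs replaces collapse(2).
  !$omp parallel do schedule(dynamic) private(i, j, k)
  do jk = 0, NK*N2 - 1
    k = KLO + jk / N2       ! slow index, recovered by integer division
    j = 1 + mod(jk, N2)     ! fast index, recovered by the remainder
    do i = 1, N1
      R(i, j, k) = R(i, j, k) + 1.0d0
    end do
    ! further per-(j,k) work on this inner section would go here
  end do
  !$omp end parallel do

  ! Self-check: only the slabs k = KLO..N3 should have been incremented.
  if (any(R(:, :, KLO:N3) /= 1.0d0) .or. any(R(:, :, 1:KLO-1) /= 0.0d0)) then
    error stop 'unexpected result'
  end if
  print *, 'ok'
end program manual_collapse
```

This gives the scheduler the same N2 times NK independent iterations that collapse(2) would, at the cost of one integer division and one remainder per iteration.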
I've stepped through the assembly in GDB; the failing instruction is
if that helps at all.
Hopefully your post, including the edit to private(i), is sufficient for Intel to produce a failing reproducer.
The above instruction is a scalar move of a double precision variable. It is unusual for it to fail unless rax was not computed properly.