I have a very simple program:
I first compiled this code with ifort -O3 -xCORE-AVX2 -mcmodel=medium and the result was just what I expected: three columns of 100s, 200s and 300s.
I then compiled exactly the same code with ifort -O3 -xCORE-AVX2 -mcmodel=medium -qopenmp (OpenMP turned on). I ran the code and the result was totally wrong.
Why does turning on the OpenMP flag alter the result even though there are no OpenMP directives in the code? I'm using Intel Parallel Studio XE Cluster Edition for Linux, version ifort 2018.5.274. Any ideas?
I think your program allocates really huge arrays and is potentially violating some stack size limits. Compiling with -qopenmp links in the OpenMP library and triggers a segmentation fault.
Thank you for your reply. I also think it is related to a stack overflow. But isn't the -mcmodel=medium option supposed to circumvent this kind of problem?
I guess you read this thread https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/268374 ?
I cannot see what happens when you also use the -qopenmp flag, but if you look at the binary that is created, you can see that the libiomp5.so library is additionally linked into it. As you said, the -mcmodel=medium flag puts no memory restriction on the data, so I don't see what is going on. nagfor can compile the code without any flags, with and without the OpenMP flag, and the executable produces the expected output.
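For reference, the linked libraries can be checked quickly from the shell (a sketch; /bin/ls is only a stand-in for your compiled executable, and libiomp5 is what to look for in the -qopenmp build):

```shell
# ldd lists the shared libraries an executable is linked against.
# /bin/ls is used here only as a stand-in for your compiled program.
ldd /bin/ls
# For the -qopenmp build you would look for the OpenMP runtime:
#   ldd ./a.out | grep libiomp5
```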
I found that turning off optimization (-O0) made the code work normally, even with the -qopenmp flag on. It also worked with -O1, but -O2 and above produced wrong results (with -qopenmp on). I suspect something complicated is going on with vectorization coupled with OpenMP, but I lack the knowledge to analyze assembly/binary code and cannot proceed any further.
What is likely happening is that you exposed a bug in OpenMP optimization conflicting with your array organization of using index order (MBODY,4). The problem (my assumption) is that your index order generates code using strided stores (as opposed to contiguous stores), and this induces an unintended race condition. You can experiment with swapping the index order to that traditionally used by Fortran:
PROGRAM MAIN
INTEGER*8 MBODY
PARAMETER (MBODY=50000000)
INTEGER*8, ALLOCATABLE :: A(:,:),B(:,:),C(:,:) ! 4,MBODY
INTEGER*8 :: I
ALLOCATE(A(4,MBODY),B(4,MBODY),C(4,MBODY)) ! (4properties,Mbodies)
A=100
B=200
C=300
OPEN(10,FILE='test.dat')
DO I=1,500
  WRITE(10,'(3I10)') A(4,I),B(4,I),C(4,I) ! (propertyIndex, BodyIndex)
ENDDO
CLOSE(10)
STOP
END PROGRAM MAIN
Note, use of allocatable arrays and heap allocation avoids stack capacity issues.
I appreciate your reply, jimdempseyatthecove. The code now works after changing the index order to (4,MBODY)!! Would you mind briefly explaining what the "traditionally used" index order in Fortran is, please? I know that Fortran stores arrays in column-major order, but how does ordering the array as (MBODY,4) trigger this problem?
The cause may be the stack limitation, and the swapping of the indices may have had nothing to do with the fix; in other words, the allocatables fixed the issue. Try keeping the allocatables and swapping the indices back to the original order to see if this corrects the problem of incorrect data.
If this corrects the problem, then the issue is likely a stack overflow that presents itself as data corruption rather than a program crash.
If the problem (incorrect data) still occurs, then it is a code generation issue. Examination of the assembly code would show what is happening. I can only offer speculation based on the symptoms observed. Regardless of speculation, the behavior is a bug. The trigger is likely the -qopenmp flag (in a release build) merging the implied loops in an incorrect manner (IOW a bug).
By the way, in your original post you stated the "results were totally wrong".
Did you mean the values were junk data, or misplaced 100s, 200s and 300s?
Dynamically allocating the arrays solved the problem regardless of the order of the indices: (MBODY,4) and (4,MBODY) both worked well with ALLOCATABLE. But statically allocating the arrays in (4,MBODY) order also corrected the problem (at least with the current MBODY value), which I find very interesting.
What I meant by the "wrong result" was that when the written file is opened, only 0s appear for arrays A and B. Array C is written fine.
The problem may be that the static data is larger than 2GB: MBODY = 50 million, and 50,000,000 elements × 4 × 8 bytes × 3 arrays = 4.8GB.
You state you are using -mcmodel=medium; the documentation states this should be correct as long as the code is .lt. 2GB. Try -mcmodel=large with the static data.
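The arithmetic above can be checked quickly in the shell (nothing Fortran-specific, just the sizes quoted):

```shell
# 50,000,000 bodies x 4 properties x 8 bytes (INTEGER*8), for 3 arrays
per_array=$(( 50000000 * 4 * 8 ))
total=$(( per_array * 3 ))
echo "$per_array bytes per array"   # 1600000000, i.e. 1.6 GB
echo "$total bytes total"           # 4800000000, i.e. 4.8 GB
```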
Note, IIF (If and only If) -mcmodel=medium generates linker ELF code using 32-bit headers, then an individual segment (e.g. .data or .bss) would be restricted to 32-bits in size.
The compiler should produce an error message under the circumstance of a linker segment exceeding the segment size restrictions.
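If you want to verify this on the binary itself, readelf shows the segment layout (a sketch; /bin/ls stands in for your executable):

```shell
# Program headers show each LOAD segment and its memory size (MemSiz).
# Run this on your own executable; /bin/ls is only a stand-in here.
readelf -l /bin/ls | grep -A1 LOAD
# Section sizes for the static data of the real program:
#   readelf -S ./a.out | grep -E '\.bss|\.data'
```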
I haven't done this, but this may work to keep -mcmodel=medium .AND. static data:
INTEGER*8 MBODY
PARAMETER (MBODY=50000000)
INTEGER*8 A(MBODY,4),B(MBODY,4),C(MBODY,4) ! each 1.6GB
COMMON /A_COM/ A
!DIR$ PSECT /A_COM/ ALIGN=OCTA,WRT
COMMON /B_COM/ B
!DIR$ PSECT /B_COM/ ALIGN=OCTA,WRT
COMMON /C_COM/ C
!DIR$ PSECT /C_COM/ ALIGN=OCTA,WRT
I realized that I have been running the program with increased stack size and virtual memory (ulimit -s unlimited; ulimit -v unlimited). If I run the code with the default Linux stack size (8192 kbytes), a segmentation fault kicks in when the OpenMP flag is activated. Setting the stack size to unlimited (ulimit -s unlimited) seems to make the code run, but it produces the incorrect results that I have been talking about in this thread.
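For anyone reproducing this, the limits in effect can be inspected before running the program (the raise commands are the ones quoted above):

```shell
# Show the current soft limits for this shell session.
ulimit -s    # stack size in kbytes; the Linux default here was 8192
ulimit -v    # virtual memory in kbytes
# To lift them for the session, as described above:
#   ulimit -s unlimited
#   ulimit -v unlimited
```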
-mcmodel=large doesn't make any difference. But the provided code does work well without any errors and the result is correct.