I have a very simple program:
I first compiled this code with ifort -O3 -xCORE-AVX2 -mcmodel=medium and the result was just what I expected: three columns of 100s, 200s and 300s.
I then compiled exactly the same code with ifort -O3 -xCORE-AVX2 -mcmodel=medium -qopenmp (OpenMP turned on). I ran the code and the result was totally wrong.
Why does turning on the OpenMP flag alter the result even though there are no OpenMP directives in the code? I'm using Intel Parallel Studio XE Cluster Edition for Linux, version ifort 2018.5.274. Any ideas?
I think your program allocates really huge arrays and is potentially violating some stack size limits. Compiling with -qopenmp links in the OpenMP library and triggers a segmentation fault.
Thank you for your reply. I also think it is related to a stack overflow. But isn't the -mcmodel=medium option supposed to circumvent this kind of problem?
I guess you read this thread https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/268374 ?
I cannot see what happens when you also use the -qopenmp flag, but if you look at the binary that is created, you can see that the libiomp5.so library is additionally linked into it. As you said, the -mcmodel=medium flag puts no memory restriction on the data, so I don't see what is going on. nagfor can compile the code without any flags, with and without the OpenMP flag, and the executable produces the expected output.
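For reference, the linked libraries can be checked quickly from the shell (a sketch; /bin/ls is only a stand-in for your compiled executable, and libiomp5 is what to look for in the -qopenmp build):

```shell
# ldd lists the shared libraries an executable is linked against.
# /bin/ls is used here only as a stand-in for your compiled program.
ldd /bin/ls
# For the -qopenmp build you would look for the OpenMP runtime:
#   ldd ./a.out | grep libiomp5
```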
I found that turning off optimization (-O0) made the code work normally, even with the -qopenmp flag on. It also worked with -O1, but -O2 and above produced wrong results (with -qopenmp on). I suspect something complicated is going on with vectorization coupled with OpenMP, but I lack the knowledge to analyze assembly/binary code and cannot proceed any further.
What is likely happening is that you exposed a bug in OpenMP optimization conflicting with your array organization of using index order (MBODY,4). The problem (my assumption) is that your index order generates code using strided stores (as opposed to contiguous stores), and this induces an unintended race condition. You can experiment with swapping the index order to that traditionally used by Fortran:
PROGRAM MAIN
INTEGER*8 MBODY
PARAMETER (MBODY=50000000)
INTEGER*8, ALLOCATABLE :: A(:,:),B(:,:),C(:,:) ! 4,MBODY
INTEGER*8 :: I
ALLOCATE(A(4,MBODY),B(4,MBODY),C(4,MBODY)) ! (4properties,Mbodies)
A=100
B=200
C=300
OPEN(10,FILE='test.dat')
DO I=1,500
  WRITE(10,'(3I10)') A(4,I),B(4,I),C(4,I) ! (propertyIndex, BodyIndex)
ENDDO
CLOSE(10)
STOP
END PROGRAM MAIN
Note, use of allocatable arrays and heap allocation avoids stack capacity issues.
I appreciate your reply, jimdempseyatthecove. The code now works after changing the index order to (4,MBODY)!! Would you mind briefly explaining what the "traditionally used" index order in Fortran is, please? I know that Fortran stores arrays in column-major order, but how does ordering the array as (MBODY,4) trigger this problem?
The cause may be the stack limitation, and the swapping of the indices may have had nothing to do with the fix; in other words, the allocatables fixed the issue. Try keeping the allocatables and swapping the indices back to the original order to see if this corrects the problem of incorrect data.
If this corrects the problem, then the issue is likely a stack overflow that presents itself as data corruption rather than a program crash.
If the problem (incorrect data) still occurs, then it is a code generation issue. Examination of the assembly code would show what is happening. I can only offer speculation based on the symptoms observed. Regardless of speculation, the behavior is a bug. The trigger is likely the -qopenmp flag (in a release build) merging the implied loops in an incorrect manner (IOW a bug).
By the way, in your original post you stated the "results were totally wrong".
Did you mean the values were junk data, or misplaced 100s, 200s and 300s?
Dynamically allocating the arrays solved the problem regardless of the order of the indices: (MBODY,4) and (4,MBODY) both worked well with ALLOCATABLE. But statically allocating the arrays in (4,MBODY) order also corrected the problem (at least with the current MBODY value), which I find very interesting.
What I meant by the "wrong result" was that when the written file is opened, only 0s appear for arrays A and B. Array C is written fine.
The problem may be that the static data is larger than 2GB: MBODY = 50 million, and 50,000,000 elements × 4 × 8 bytes × 3 arrays = 4.8GB.
You state you are using -mcmodel=medium; the documentation states this should be correct as long as the code is .lt. 2GB. Try -mcmodel=large with the static data.
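The arithmetic above can be checked quickly in the shell (nothing Fortran-specific, just the sizes quoted):

```shell
# 50,000,000 bodies x 4 properties x 8 bytes (INTEGER*8), for 3 arrays
per_array=$(( 50000000 * 4 * 8 ))
total=$(( per_array * 3 ))
echo "$per_array bytes per array"   # 1600000000, i.e. 1.6 GB
echo "$total bytes total"           # 4800000000, i.e. 4.8 GB
```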
Note, IIF (If and only If) -mcmodel=medium generates linker ELF code using 32-bit headers, then an individual segment (e.g. .data or .bss) would be restricted to 32-bits in size.
The compiler should produce an error message under the circumstance of a linker segment exceeding the segment size restrictions.
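If you want to verify this on the binary itself, readelf shows the segment layout (a sketch; /bin/ls stands in for your executable):

```shell
# Program headers show each LOAD segment and its memory size (MemSiz).
# Run this on your own executable; /bin/ls is only a stand-in here.
readelf -l /bin/ls | grep -A1 LOAD
# Section sizes for the static data of the real program:
#   readelf -S ./a.out | grep -E '\.bss|\.data'
```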
I haven't done this, but this may work to keep -mcmodel=medium .AND. static data:
INTEGER*8 MBODY
PARAMETER (MBODY=50000000)
INTEGER*8 A(MBODY,4),B(MBODY,4),C(MBODY,4) ! each 1.6GB
COMMON /A_COM/ A
!DIR$ PSECT /A_COM/ ALIGN=OCTA,WRT
COMMON /B_COM/ B
!DIR$ PSECT /B_COM/ ALIGN=OCTA,WRT
COMMON /C_COM/ C
!DIR$ PSECT /C_COM/ ALIGN=OCTA,WRT
I realized that I have been running the program with increased stack size and virtual memory (ulimit -s unlimited; ulimit -v unlimited). If I run the code with the default Linux stack size (8192 kbytes), a segmentation fault kicks in when the OpenMP flag is activated. Setting the stack size to unlimited (ulimit -s unlimited) seems to make the code run, but it produces the incorrect results that I have been talking about in this thread.
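For anyone reproducing this, the limits in effect can be inspected before running the program (the raise commands are the ones quoted above):

```shell
# Show the current soft limits for this shell session.
ulimit -s    # stack size in kbytes; the Linux default here was 8192
ulimit -v    # virtual memory in kbytes
# To lift them for the session, as described above:
#   ulimit -s unlimited
#   ulimit -v unlimited
```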
-mcmodel=large doesn't make any difference. But the provided code does work well without any errors and the result is correct.