All,
I'm currently parallelizing a program with OpenMP 3.0 directives, and I ran into a problem with the following subroutine call:
[cpp]
IF (MIXTURE_FRACTION) THEN
   !$OMP PARALLEL DO COLLAPSE(3) PRIVATE(K,J,I,ITMP,Z_VECTOR,CP_SUM,CP_MF,N)
   DO K=1,KBAR
      DO J=1,JBAR
         DO I=1,IBAR
            !IF (SOLID(CELL_INDEX(I,J,K))) CYCLE
            OpenMP_DIVG_005: IF (SOLID(CELL_INDEX(I,J,K))) THEN
               CALL DO_NOTHING('DIVG_PART1_MIXTURE_FRACTION_RTRM')
            ELSE !OpenMP
               ITMP = 0.1_EB*TMP(I,J,K)
               Z_VECTOR = YYP(I,J,K,I_Z_MIN:I_Z_MAX)
               CALL GET_CP(Z_VECTOR,Y_SUM(I,J,K),CP_MF,ITMP) !!! -> This subroutine is called, then inside it the program stops with OpenMP; it runs without OpenMP
               IF (N_SPECIES > (I_Z_MAX-I_Z_MIN+1)) THEN
                  CP_SUM = 0._EB
                  DO N=1,N_SPECIES
                     IF (SPECIES(N)%MODE/=MIXTURE_FRACTION_SPECIES) &
                        CP_SUM = CP_SUM + YYP(I,J,K,N)*SPECIES(N)%CP(ITMP)
                  END DO
                  CP_MF = CP_SUM + (1._EB-Y_SUM(I,J,K))*CP_MF
               ENDIF
               RTRM(I,J,K) = R_PBAR(K,PRESSURE_ZONE(I,J,K))*RSUM(I,J,K)/CP_MF
               DP(I,J,K) = RTRM(I,J,K)*DP(I,J,K)
            ENDIF OpenMP_DIVG_005
         ENDDO
      ENDDO
   ENDDO
   !!$OMP END PARALLEL DO
[/cpp]
The GET_CP Subroutine is called and then the code stops at the following part of the code:
[cpp]
SUBROUTINE GET_CP(Z_IN,YY_SUM,CP_MF,ITMP)
   INTEGER, INTENT(IN) :: ITMP
   REAL(EB), INTENT(IN) :: Z_IN(1:I_Z_MAX - I_Z_MIN + 1),YY_SUM
   REAL(EB) :: Z(1:I_Z_MAX - I_Z_MIN + 1),CP_MF,OMYYSUM
   IF (YY_SUM >=1._EB) THEN
      CP_MF = SPECIES(0)%CP(ITMP)
      RETURN
   ELSE
      OMYYSUM = 1._EB-MAX(0._EB,YY_SUM)
      Z = MAX(0._EB,MIN(1._EB,Z_IN))/OMYYSUM !----> THIS is the line where the code stops
   ENDIF
   CP_MF = (Z2CP_C(ITMP) + DOT_PRODUCT(Z2CP(ITMP,:),Z))/(Y_MF_SUM_C + DOT_PRODUCT(Y_MF_SUM_Z,Z))
END SUBROUTINE GET_CP
[/cpp]
All variables are correctly initialized and there is no division by zero (OMYYSUM = 1), so everything looks correct. The program compiles, but it stops when it reaches the line marked above. With another compiler (the Sun Fortran compiler) no problem occurs and the code works fine. Only the Intel Fortran V11 compiler produces an executable that fails on this line.
The error message I get when running the Intel-compiled code is:
[cpp]
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC        Routine   Line      Source
.                  4001C410  Unknown   Unknown   Unknown
libiomp5.so        400BBE12  Unknown   Unknown   Unknown
libpthread.so.0    400D64FB  Unknown   Unknown   Unknown
libc.so.6          401C3E5E  Unknown   Unknown   Unknown
[/cpp]
I forgot one piece of information: if I compile the code without the -openmp flag, it runs perfectly, so it seems to be a problem with the OpenMP environment.
Try adding AUTOMATIC to the declaration of the local variables, i.e. change
REAL(EB)::Z(1:I_Z_MAX-I_Z_MIN+1),CP_MF,OMYYSUM
to
REAL(EB), AUTOMATIC::Z(1:I_Z_MAX-I_Z_MIN+1),CP_MF,OMYYSUM
Jim Dempsey
I have tried this, but it has no effect; the problem occurs at the same line in the code. Furthermore, this should not be necessary: if the variables are declared correctly in the serial version of the code, the OpenMP version should not require any changes. If I had to check every variable, parallelization would be very difficult, because the code has more than 50,000 lines and more than 500 variables.
If it is helpful, I could submit the complete code.
The -openmp option should make local arrays dynamic, even if they were not already automatic (in the standard Fortran sense) by definition, so the lack of change is to be expected. The problem usually cited with Fortran automatic arrays is the lack of error diagnosis. Supposing the allocation fails, which might happen simply because the stack size limit is reached at this point, you don't have a satisfactory way to catch it. Hence the usual recommendation for ALLOCATABLE arrays with STAT= checking. This would apply equally to the serial version, but the OpenMP version will consume more stack.
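As a rough illustration of the ALLOCATABLE-with-STAT= recommendation above, here is a minimal sketch. The program name, the kind parameter `eb`, and the size `n` are placeholders chosen for this example (with `n` standing in for something like I_Z_MAX-I_Z_MIN+1); they are not taken from the original code.

```fortran
program alloc_check
  implicit none
  ! Hypothetical kind parameter standing in for EB in the original code
  integer, parameter :: eb = selected_real_kind(12)
  real(eb), allocatable :: z(:)
  integer :: n, ierr

  n = 2                        ! assumed size for this sketch
  allocate(z(n), stat=ierr)    ! ierr is nonzero if the allocation fails
  if (ierr /= 0) then
    write(*,*) 'allocation of z failed, stat = ', ierr
    stop 1
  end if

  z = 0.0_eb
  write(*,*) 'allocated ', size(z), ' elements'
  deallocate(z)
end program alloc_check
```

Unlike an automatic array, a failed allocation here can be detected and reported instead of ending in an undiagnosed segmentation fault.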
I have now debugged this specific problem line by line and found where it fails. I don't think it is a stack-size or programming problem; I think it's a compiler problem.
In the first code fragment I posted, the error starts here:
[cpp]
IF (MIXTURE_FRACTION) THEN
   !$OMP PARALLEL DO COLLAPSE(3) PRIVATE(K,J,I,ITMP,Z_VECTOR,CP_SUM,CP_MF,N)
   DO K=1,KBAR
      DO J=1,JBAR
         DO I=1,IBAR
            !IF (SOLID(CELL_INDEX(I,J,K))) CYCLE
            OpenMP_DIVG_005: IF (SOLID(CELL_INDEX(I,J,K))) THEN
               CALL DO_NOTHING('DIVG_PART1_MIXTURE_FRACTION_RTRM')
            ELSE !OpenMP
               ITMP = 0.1_EB*TMP(I,J,K)
               Z_VECTOR = YYP(I,J,K,I_Z_MIN:I_Z_MAX)
[/cpp]
The error occurs at the last line. Z_VECTOR is allocated in another module, not in the module that contains the code fragment above; in my test case it has a size of 2, so Z_VECTOR(1:2) is allocated. YYP is also correctly allocated, and with I_Z_MIN = 1 and I_Z_MAX = 2 the section YYP(I,J,K,I_Z_MIN:I_Z_MAX) has the same size as Z_VECTOR, so the shapes must match.
The problem is that the assignment Z_VECTOR = YYP(...) is not carried out, so Z_VECTOR has no values. I printed the values of YYP with WRITE(*,*) YYP(I,J,K,I_Z_MIN:I_Z_MAX), and that works perfectly: two values appear on the screen. After the line Z_VECTOR = YYP(...) I added WRITE(*,*) Z_VECTOR, and there the program stops with the error from the first post.
Stack size is not the problem: I have 2 GB of stack and at most 100 MB are used. To make sure that the sizes of Z_VECTOR and YYP(...) match, I also tried Z_VECTOR(I_Z_MIN:I_Z_MAX) = YYP(...,I_Z_MIN:I_Z_MAX) instead of Z_VECTOR = YYP(...), but without success. If I compile without the -openmp flag, either form runs fine and no problem occurs. The values of I_Z_MIN and I_Z_MAX are calculated at the start of the program and never changed afterwards, so they cannot be the problem either.
I tried the code with two other compilers and their OpenMP flags, the XLF 12.1.0.0 compiler for AIX and the current Sun Express compiler for Linux, and both produce perfectly running code. With the Intel Fortran 11 compiler the code fails even though I use no optimization (compiling with -O0), so I think this is a compiler problem rather than a programming problem. My OS is Ubuntu 8.04, if this helps for debugging. If the code would be helpful, I can email it.
Thanks for your help!
Tim and FDSUser,
I have an application here that (at least in the earlier versions of IVF) consistently exhibited problems with OpenMP when a subroutine contains local arrays such as
subroutine foo(x)
real(8) :: x
real(8) :: TOSVX1(3), TOSVX2(3), TOSVX3(3)
...
These local arrays are created as if SAVE were on the declaration. That is, the compiler generates a static copy of the TOSVXn(3) arrays. Adding AUTOMATIC
real(8), automatic:: TOSVX1(3), TOSVX2(3), TOSVX3(3)
forces the arrays to be allocated on the stack.
Without AUTOMATIC, the multiple threads overwrite each other's data in the temporary arrays.
Whether it is a compiler bug or an options conflict, I could not ascertain. I do know that by including AUTOMATIC there is no ambiguity in my intention that the arrays must be local.
Jim Dempsey
>>The Z_VECTOR variable is correctly allocated in another module,
Then Z_VECTOR cannot be private (unless it is also automatic).
What's happening is that your Z_VECTOR is being allocated in another module, i.e. the array descriptor for Z_VECTOR resides in the other module, and the allocation of the Z_VECTOR data initializes the array descriptor of foo::Z_VECTOR. By making Z_VECTOR an OpenMP PRIVATE variable you are instantiating an uninitialized array descriptor named Z_VECTOR in the scope of the thread and within the context of the OpenMP parallel region. When compiling without OpenMP (and without the PRIVATE clause) you will be using the module's copy of Z_VECTOR.
If each thread needs a separate copy of Z_VECTOR, and if you cannot stack allocate or do not want the overhead of allocation/deallocation as you enter and leave the parallel region, then either place Z_VECTOR into thread-private data or generate the appropriate Z_VECTOR reference for passing to the subroutine, e.g. use a pointer to an array of Z_VECTORs indexed by a thread-unique sequence number. Note that when you use OpenMP nested parallelism you cannot use omp_get_thread_num() to obtain a unique thread number (generate your own and place it into thread-private data).
Jim Dempsey
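One way to give each thread its own work array, along the lines described above, is to leave the allocatable array unallocated outside the parallel region and allocate the private copy inside it. The following is only a sketch under assumptions: the array sizes and the simplified two-dimensional `yyp` are invented for illustration and do not reproduce the original data layout.

```fortran
program private_work_array
  implicit none
  ! Hypothetical kind and sizes, chosen only for this sketch
  integer, parameter :: eb = selected_real_kind(12)
  integer, parameter :: nz = 2, n = 8
  real(eb) :: yyp(n, nz), total(n)
  real(eb), allocatable :: z_vector(:)   ! deliberately NOT allocated here
  integer :: i

  do i = 1, n
    yyp(i, :) = real(i, eb)
  end do

  !$omp parallel private(z_vector)
  ! Each thread allocates its own private copy before using it
  allocate(z_vector(nz))
  !$omp do
  do i = 1, n
    z_vector = yyp(i, 1:nz)
    total(i) = sum(z_vector)
  end do
  !$omp end do
  ! Each thread releases its copy before leaving the region
  deallocate(z_vector)
  !$omp end parallel

  write(*,*) total
end program private_work_array
```

The allocation cost is paid once per thread per parallel region rather than once per iteration. The same source also compiles and runs serially when OpenMP is disabled, since the directives are then plain comments.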
Jim,
After some changes today, that is what I found out as well. I passed the values of YYP directly to the subroutine GET_CP without using Z_VECTOR, and it works. Allocating Z_VECTOR in the same module that contains the DO loops also helps. But when I read the OpenMP 3.0 specification, I find no description of this behavior. I found only example A.30.2.f in the specification, but I think it differs from my construct, because my Z_VECTOR is allocated only once and not twice as in that example. I will ask this question in the OpenMP forum; maybe someone there can explain how this is supposed to work and be implemented according to the specification.
Thanks for your help!
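The workaround of passing the array section directly, without an intermediate private array, might look roughly like the sketch below. The module `cp_demo` and the routine `get_mean` are hypothetical stand-ins for the original module and GET_CP, and the data values are made up; only the pattern of passing `yyp(i, 1:nz)` as the actual argument reflects the approach described above.

```fortran
module cp_demo
  implicit none
  integer, parameter :: eb = selected_real_kind(12)
contains
  ! Hypothetical stand-in for GET_CP: clips the input to [0,1] and averages it
  subroutine get_mean(z_in, z_mean)
    real(eb), intent(in)  :: z_in(:)
    real(eb), intent(out) :: z_mean
    z_mean = sum(max(0._eb, min(1._eb, z_in))) / real(size(z_in), eb)
  end subroutine get_mean
end module cp_demo

program pass_section
  use cp_demo
  implicit none
  integer, parameter :: n = 4, nz = 2    ! assumed sizes for this sketch
  real(eb) :: yyp(n, nz), z_mean(n)
  integer :: i

  yyp(:, 1) = 0.25_eb
  yyp(:, 2) = 0.75_eb

  !$omp parallel do
  do i = 1, n
    ! The section is passed directly, so no PRIVATE work array is needed
    call get_mean(yyp(i, 1:nz), z_mean(i))
  end do
  !$omp end parallel do

  write(*,*) z_mean
end program pass_section
```

With an assumed-shape dummy argument as used here, the section can be passed without a copy. With an explicit-shape dummy such as Z_IN in the original GET_CP, the compiler may create a temporary for a non-contiguous section, which is still local to the calling thread.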