(I submitted this to premier support already, but just in case somebody else has run into this I thought I'd add a thread)
Briefly, I have a subroutine with hand-coded OpenMP directives and internal routines in a CONTAINS section. Something very strange happens to at least two passed-in array declarations when the OpenMP directives (even just one of them) are activated with -openmp.
SUBROUTINE ADVECT(s,s0,fs,u,v,w, &
gxt,gyt,gzt,rrp,rrm,dt, &
nx,ny,nz, &
svar,atype,izero0)
USE GRID_MODULE
USE CPUTIME_MODULE
USE PARAM_MODULE
implicit none
integer, INTENT(IN) :: nx, ny, nz, atype, izero0
real, INTENT(INOUT) :: s (-ng+1:nx+ng,-ng+1:ny+ng,-ng+1:nz+ng)
real, INTENT(INOUT) :: s0(-ng+1:nx+ng,-ng+1:ny+ng,-ng+1:nz+ng)
real, INTENT(INOUT) :: fs(-ng+1:nx+ng,-ng+1:ny+ng,-ng+1:nz+ng)
real, INTENT(IN) :: u (-ng+1:nx+ng,-ng+1:ny+ng,-ng+1:nz+ng)
real, INTENT(IN) :: v (-ng+1:nx+ng,-ng+1:ny+ng,-ng+1:nz+ng)
real, INTENT(IN) :: w (-ng+1:nx+ng,-ng+1:ny+ng,-ng+1:nz+ng)
real, INTENT(IN) :: rrp(-ng+1:nz+ng), rrm(-ng+1:nz+ng)
real, INTENT(IN) :: dt
(ng is declared in a module)
At the beginning of the code I added this:
write(0,*) 'ADVECTRK: nx,ny,nz,ng = ',nx,ny,nz,ng
write(0,*) 'size of s: ',size(s,1),size(s,2),size(s,3)
write(0,*) 'size of s0: ',size(s0,1),size(s0,2),size(s0,3)
write(0,*) 'size of u: ',size(u,1),size(u,2),size(u,3)
write(0,*) 'size of rrp,gxt: ',size(rrp,1),size(gxt,1),size(gxt,2)
And without -openmp, the sizes are correct:
nx=41,ny=41,nz=41,ng=3
size(s,1)=47, size(s,2)=47, size(s,3)=47.
But with -openmp, somehow I get:
ADVECTRK: nx,ny,nz,ng = 41 41 41 3
size of s(1,2,3): 10 7 10
size of s0(1,2,3): 10 7 10
size of u(1,2,3): 10 7 10
size of rrp,gxt(1,2): 10 10 4
nx=41,ny=41,nz=41,ng=3 (all correct), but
size(s,1)=10, size(s,2)=7, size(s,3)=10 (all incorrect).
So even though nx, ny, nz, and ng are correct, somehow the size of s is wrong, and I get BAD_ACCESS errors as a result. (This appears to be happening to all of the 3D arrays that I have checked.) Also, this is running with just one thread. The problem is the same for both ia32 and 64-bit builds.
If I declare the u array arbitrarily, I can determine that constant values are being used (nx=4, ny=1, nz=4) in the declarations, regardless of the actual values of nx, ny, and nz (which are always printed out correctly). With ng=3, those constants give exactly the wrong extents shown above: 4+6=10, 1+6=7, 4+6=10.
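One way to catch this kind of mismatch at run time is to compare each extent against the value implied by the declaration, n + 2*ng. The following is only a minimal sketch: the module halo_mod, the subroutine check_extents, and the hard-coded ng = 3 are stand-ins for illustration, not names from the actual code base.

```fortran
module halo_mod
  implicit none
  integer, parameter :: ng = 3   ! assumed halo width (matches the ng = 3 printed above)
contains
  subroutine check_extents(s, nx, ny, nz)
    integer, intent(in) :: nx, ny, nz
    real, intent(in) :: s(-ng+1:nx+ng, -ng+1:ny+ng, -ng+1:nz+ng)
    integer :: expected(3)

    ! Bounds -ng+1 : n+ng hold n + 2*ng elements in each dimension.
    expected = [nx + 2*ng, ny + 2*ng, nz + 2*ng]

    if (any(shape(s) /= expected)) then
      write(*,*) 'extent mismatch: got ', shape(s), ' expected ', expected
      stop 1
    end if
  end subroutine check_extents
end module halo_mod

program test_extents
  use halo_mod
  implicit none
  integer, parameter :: nx = 41, ny = 41, nz = 41
  real, allocatable :: s(:,:,:)

  allocate(s(-ng+1:nx+ng, -ng+1:ny+ng, -ng+1:nz+ng))
  s = 0.0
  call check_extents(s, nx, ny, nz)   ! stops with code 1 if any extent is wrong
  write(*,*) 'extents ok: ', shape(s)
end program test_extents
```

Calling a check like this at the top of the routine turns a later BAD_ACCESS into an immediate, readable failure that names the offending extents.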
This code runs fine under ifort 10.1.014 (I haven't run the latest version 10). There seems to be something "special" about this subroutine because none of the other subroutines in our code base exhibits this problem. I suspect it has to do with containing a number of internal subprograms. Fun, eh?
The compile options are
ifort -openmp -O0 -zero -g -CB -align all -ftz -I../src/include -I/opt/local/netcdf4m32/include -I/opt/local/hdf5m64/include -c ../src/advectrk.F90
(I'm not using the Xcode environment -- everything is done in the terminal.)
Burgel
Something for you to try. On your array declarations try adding automatic
real, automatic,INTENT(IN) :: rrp(-ng+1:nz+ng), rrm(-ng+1:nz+ng)
Do this to all the declarations.
The purpose is to ensure that the array descriptors are located on the stack (as opposed to static storage).
Alternatively, you can use /Qauto (-auto on Linux and OS X).
I experienced a similar problem where the array descriptors used to be stack-local by default, but a version change (or my ineptitude) caused them to be placed in static storage. Although this won't matter for a single-threaded application, it does matter for multi-threaded applications.
Jim Dempsey
-openmp implies -auto.
The reply on premier support also suggested testing with -automatic instead of -openmp. I tried that, and the code still runs correctly with -automatic.
I have a much smaller test program now that exhibits the problem, and it seems to be a problem with having subroutines within a CONTAINS. It doesn't matter whether the OpenMP loop is in the contained subroutine or in the main subroutine. So there is some kind of bad interaction going on when the loop is parallelized.
I uploaded the code example to premier support, but I've also tried to post a similar one here if it helps (advectrktest.F90) (I don't see it showing up anywhere, though ... It is in a folder called 'ted')
compiled with "ifort -openmp -O0 -g -o x.test advectrktest.F90" on OS X (10.5.6) and ifort 11.0.056
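For anyone who cannot see the attachment, a test of this general shape might look like the sketch below. It is not the uploaded advectrktest.F90: the names grid_sketch, advect_sketch, and flux are hypothetical, and whether it triggers the problem on a given compiler version is untested. It only combines the three ingredients described above: explicit-shape dummy arrays whose bounds depend on a module variable, an internal procedure after CONTAINS, and one OpenMP loop.

```fortran
module grid_sketch
  implicit none
  integer, parameter :: ng = 3   ! hypothetical halo width
end module grid_sketch

subroutine advect_sketch(s, u, nx, ny, nz)
  use grid_sketch
  implicit none
  integer, intent(in) :: nx, ny, nz
  real, intent(inout) :: s(-ng+1:nx+ng, -ng+1:ny+ng, -ng+1:nz+ng)
  real, intent(in)    :: u(-ng+1:nx+ng, -ng+1:ny+ng, -ng+1:nz+ng)
  integer :: i, j, k

  ! Same diagnostic as in the original report: print the extents first.
  write(*,*) 'nx,ny,nz,ng = ', nx, ny, nz, ng
  write(*,*) 'size of s: ', size(s,1), size(s,2), size(s,3)
  write(*,*) 'size of u: ', size(u,1), size(u,2), size(u,3)

  ! A single parallel loop over the interior points.
  !$omp parallel do private(i, j, k)
  do k = 1, nz
    do j = 1, ny
      do i = 1, nx
        s(i,j,k) = s(i,j,k) + flux(i, j, k)
      end do
    end do
  end do
  !$omp end parallel do

contains

  ! Internal procedure: reaches u through host association.
  real function flux(i, j, k)
    integer, intent(in) :: i, j, k
    flux = 0.5 * (u(i+1,j,k) - u(i-1,j,k))
  end function flux

end subroutine advect_sketch

program advect_sketch_test
  use grid_sketch
  implicit none
  integer, parameter :: nx = 41, ny = 41, nz = 41
  real, allocatable :: s(:,:,:), u(:,:,:)

  allocate(s(-ng+1:nx+ng, -ng+1:ny+ng, -ng+1:nz+ng))
  allocate(u(-ng+1:nx+ng, -ng+1:ny+ng, -ng+1:nz+ng))
  s = 0.0
  u = 1.0
  call advect_sketch(s, u, nx, ny, nz)
end program advect_sketch_test
```

A correct build should report 47 for every extent (41 + 2*3), with or without -openmp.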
-- Ted
