- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi, guys. I'm a new hand of openmp. I encounter some problems related to deallocating allocatable array in openmp loop.
The main reason is the array is too large. An example code is following:
program omptest
use omp_lib
implicit none
INTEGER :: I,k,J
integer,pointer :: b(:)
!$OMP PARALLEL DO PRIVATE(i,b,k)
DO I=1,2490000
allocate(b(67214259))
k=OMP_get_thread_num()
deallocate(b)
END DO
!$OMP END PARALLEL DO
END PROGRAM
When the maximum bound of i is large, like 2490000, the program will break at deallocate(b)
I have also tried this version:(to make b as a threadprivate variable) but it is still useless
module da
implicit none
integer,allocatable,save :: b(:)
!$omp threadprivate(a,b)
contains
subroutine ji(i)
integer :: i
allocate(b(672142590))
end subroutine
end module
program omptest
use omp_lib
use da
implicit none
INTEGER :: I,k,J
!$OMP PARALLEL DO PRIVATE(i,k)
DO I=1,2490000
call ji(i)
k=OMP_get_thread_num()
deallocate(b)
END DO
!$OMP END PARALLEL DO
I guess it might because of the size of heap. I have tried to divide I into 10 parts, and makes parallel computing separately( serially) but it is still no use.
Could any body help me to find out the solution? Thank you very much!
My development environment is VS2012 update 4+Intel cluster studio xe 2013 sp1
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
My original code is below:
The program will crash when the DEALLOCATE(ITSC,IPTSC,NTSC) in subroutine TRANSIENT_SUBCELL_DEL is run.
But what is mysterious is that the crash happens in some conditions.
Otherwise, the program run normally in serial mode.
OpenMP loop
SUBROUTINE COLLISION
!
!--calculate the collisions
!
USE MOLECS
USE CELLINFO
USE GAS
USE TRANSC
USE SAMPLES
USE CALC
IMPLICIT NONE
INTEGER :: I
INTEGER :: J,K,NSEL,M1,M2,MPOS,MT
REAL*8 :: DTC,ASEL,SEPSMIN,SEPS,RANF
INTEGER :: ISM
!--I,J,K,L,M,N---loop number
!--II,JJ,KK------loop control
!---ISM----------Select method---0 for transient sub cell 1 for virtual sub cell
!---M1-----------selected molecule 1
!---M2-----------selected molecule 2
!---MT-----------temporal selected molecule
!---MPOS---------selected molecule position in ICREF
!---SEPS---------square of current collision pair separation
!---SEPSMIN------square of current minimum molecule separation
!IPCP=0 this is done in molecules move
ALLOCATE(IPCP(NM))
IPCP=0
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP DO SCHEDULE(STATIC) PRIVATE(I,J,K,DTC,ISM,ASEL,NSEL,MPOS,M1,M2,SEPS,SEPSMIN,MT,RANF) REDUCTION(+:TOTCOL,TCOL,COLLS,CLSEP)
DO I=1,NCCELLS
IF (FTIME-CCELL(5,I) > CCELL(3,I)) THEN
DTC=2.0D00*CCELL(3,I)
IF(ICCELL(2,I) .GT. 1) CCELL(5,I)=CCELL(5,I)+DTC
IF( ICCELL(2,I) .GT. 30)THEN
ISM=0!transient sub cell
ELSE
ISM=1!virtual sub cell
END IF
!NTC method
ASEL=0.5D00*ICCELL(2,I)*(ICCELL(2,I)-1)*&
&FNUM*CCELL(4,I)*DTC/CCELL(1,I)+CCELL(2,I)
IF(ASEL .LT. 0) ASEL=0.0D00
NSEL=ASEL
CCELL(2,I)=ASEL-NSEL
IF( (NSEL .GT. 0) .AND. (ICCELL(2,I) .GT. 1))THEN
IF(ISM .EQ. 0)THEN
CALL TRANSIENT_SUBCELL_GEN(I)
DO J=1,NSEL
!select first molecule
CALL RANDOM_NUMBER(RANF)
MPOS=INT(RANF*DFLOAT(ICCELL(2,I))-0.0001)+ICCELL(1,I)+1
M1=ICREF(MPOS)
!Select second molecule
CALL TRANSIENT_SUBCELL_SELECT(M1,M2)
!Collision
CALL MOLECULES_COL(M1,M2)
END DO
CALL TRANSIENT_SUBCELL_DEL
ELSE
DO J=1,NSEL
!select first molecule
CALL RANDOM_NUMBER(RANF)
MPOS=INT(RANF*DFLOAT(ICCELL(2,I))-0.0001)+ICCELL(1,I)+1
M1=ICREF(MPOS)
!select second molecule
SEPSMIN=(MAXCELL(2)*0.5D00)**2.0D00
DO K=1,ICCELL(2,I)
MPOS=ICCELL(1,I)+K
MT=ICREF(MPOS)
IF((MT .NE. M1) .AND. (MT .NE. IPCP(M1)))THEN
SEPS=(PX(MT)-PX(M1))**2.0D00+(PY(MT)-PY(M1))**2.0D00
IF( SEPS .LT. SEPSMIN)THEN
M2=MT
SEPSMIN=SEPS
END IF
END IF
END DO
!collision
CALL MOLECULES_COL(M1,M2)
END DO
END IF
END IF !end nsel possible
END IF
END DO
!$OMP END DO
!$OMP END PARALLEL
DEALLOCATE(IPCP)
END SUBROUTINE
THE BREAK POINT
MODULE TRANSC
IMPLICIT NONE
!only used in subroutine collision and TRANSIENT_SUBCELL for transient cell
INTEGER,SAVE :: NITSC,NDIVX,NDIVY
INTEGER,DIMENSION(:,:,:),ALLOCATABLE,SAVE:: ITSC
INTEGER,DIMENSION(:,:),ALLOCATABLE,SAVE :: IPTSC,NTSC
!$OMP THREADPRIVATE(NITSC,ITSC,IPTSC,NTSC,NDIVX,NDIVY)
!---NTSC(Y,X)----------number of molecule in (X,Y) transient sub cell
!---ITSC(I,Y,X)---- transient sub cell index
!--------X----the index in x direction
!--------Y----the index in y direction
!--------I----the code of molecule in transient sub cell
!-------------I=1,NITSC
!---IPTSC(I,M)-----molecule to transient cell
!-------I=1----x index of molecule M
!-------I=2----y index of molecule M
!-------I=3----index of molecule in transient cell
!---NDIVX--------number of transient sub cell in x direction
!---NDIVY--------number of transient sub cell in y direction
CONTAINS
SUBROUTINE TRANSIENT_SUBCELL_GEN(I)
USE NODEINFO
USE CELLINFO
USE MOLECS
IMPLICIT NONE
INTEGER :: I
INTEGER :: II,K,L,J,XC,YC
INTEGER,DIMENSION(:,:,:),ALLOCATABLE :: ITSC_BACKUP
REAL*8 :: x1,x2,x3,y1,y2,y3
REAL*8 :: x10,x20,x30,y10,y20,y30
REAL*8 :: XI,YI,X,Y,DX,DY
!---I---------------code of collision cell
!---II--------------code of sampling cell
!---NDIVX-----------number of transient sub cell in x direction
!---NDIVY-----------number of transient sub cell in y direction
!---ITSC_BACKUP-----BACKUP OF ITSC,in case the I of ITSC is not enough
!---XI,YI-----------the minimum boundary of transient sub cell
!---X ,Y -----------the maximum boundary of transient sub cell
II=ICCELL(3,I)
!---x1,y1,x2,y2,x3,y3-----the coordinate of the cell,clockwise
x10=NODES(1,ICELL(1,II))
x20=NODES(1,ICELL(2,II))
x30=NODES(1,ICELL(3,II))
y10=NODES(2,ICELL(1,II))
y20=NODES(2,ICELL(2,II))
y30=NODES(2,ICELL(3,II))
!---x1,y1,x2,y2,x3,y3-----the coordinate of the middle collision cell
x1=(x20+x30)*0.5D00
x2=(x10+x30)*0.5D00
x3=(x10+x20)*0.5D00
y1=(y20+y30)*0.5D00
y2=(y10+y30)*0.5D00
y3=(y10+y20)*0.5D00
SELECT CASE(I-ICELL(8,II))
CASE(1)
x1=x10; y1=y10
CASE(2)
x2=x20; y2=y20
CASE(3)
x3=x30; y3=y30
END SELECT
XI=DMIN1(x1,x2,x3); YI=DMIN1(y1,y2,y3)
X=DMAX1(x1,x2,x3) ; Y=DMAX1(y1,y2,y3)
! for triangular cell, num_molecule<0.5*ndivx*ndivy
! this is used to ensure there is nearly a molecule in every transient sub cell
! the following aims to make DX=DY
DX=DSQRT((X-XI)*(Y-YI)/(2.0D00*DFLOAT(ICCELL(2,I))))
DY=DX
NDIVX=(X-XI)/DX
DX=(X-XI)/DFLOAT(NDIVX)
NDIVY=(Y-YI)/DY
DY=(Y-YI)/DFLOAT(NDIVY)
NITSC=2
ALLOCATE(ITSC(NITSC,NDIVY,NDIVX),NTSC(NDIVY,NDIVX),IPTSC(3,NM))
ITSC=0; NTSC=0
L=ICCELL(1,I)
DO J=1+L,ICCELL(2,I)+L
K=ICREF(J) !K: code of molecule
XC=INT((PX(K)-XI-DX*1D-3)/DX)+1 !-DX*1D-3 aims to obviate PX(K)=X
YC=INT((PY(K)-YI-DY*1D-3)/DY)+1
IPTSC(1,K)=XC
IPTSC(2,K)=YC
NTSC(YC,XC)=NTSC(YC,XC)+1
IF(NTSC(YC,XC) .GT. NITSC)THEN
!increase the index code of molecule in the transient sub cell
ALLOCATE(ITSC_BACKUP(NITSC,NDIVY,NDIVX))
ITSC_BACKUP=ITSC
DEALLOCATE(ITSC)
NITSC=NITSC+1
ALLOCATE(ITSC(NITSC,NDIVY,NDIVX))
ITSC(NITSC:NITSC,:,:)=0
ITSC(1:NITSC-1,:,:)=ITSC_BACKUP
DEALLOCATE(ITSC_BACKUP)
END IF
ITSC(NTSC(YC,XC),YC,XC)=K
IPTSC(3,K)=NTSC(YC,XC)
END DO
RETURN
END SUBROUTINE
SUBROUTINE TRANSIENT_SUBCELL_SELECT(M1,M2)
USE MOLECS
USE CELLINFO
IMPLICIT NONE
INTEGER :: M1,M2
!----M1-----already selected molecule for collsion
!----M2-----the molecule to be selected
!----DEL1---the first deleted molecule
!----DEL2---the second deleted molecule
INTEGER :: I,J,K,L,JJ,JEND,X,Y,TSCLINK_LEN
INTEGER :: M1TX,M1TY,MPTX,MPTY
INTEGER,ALLOCATABLE :: TSCLINK(:)
REAL*8 :: RANF
INTEGER :: MP,M1SC,MPSC
!----I,J,K------------loop number
!----JJ,JEND----loop control
!----M1TX,M1TY--------molecule 1 transient sub cell X,Y coordinate
!----M2T--------------temporal molecule 2
!----X,Y--------------current
!----TSCLINK----------the link of transient sub cells which has molecule
!----TSCLINK_LEN------link length
!-------------------------------------------------------------------------
!----NTSC_BACKUP,ITSC_BACKUP is used to back up NTSC,ITSC
!----NUM_DEL---------if M1 and IPCP(M1) in the same transient sub cell =1 else =2
!----Delete M1 and IPCP(M1) from transient sub cell grid
M1TX=IPTSC(1,M1)
M1TY=IPTSC(2,M1)
M1SC=IPTSC(3,M1)
MP=IPCP(M1)
MPTX=0;MPTY=0;MPSC=0
IF(MP .NE. 0)THEN
MPTX=IPTSC(1,MP)
MPTY=IPTSC(2,MP)
MPSC=IPTSC(3,MP)
END IF
!-----start to select molecule 2
M2=0
JJ=0 !layer of transient sub cell molecules from which molecule 2 is chosen
DO WHILE ( M2 .EQ. 0)
JEND=8*JJ
IF(JJ .EQ. 0) JEND=1
ALLOCATE(TSCLINK(JEND))
TSCLINK=0
TSCLINK_LEN=0
!----loop between the transient sub cells in the same layer in order to create TSCLINK
IF( JJ .EQ. 0)THEN
IF(ITSC(1,M1TY,M1TX) .NE. 0)THEN
TSCLINK_LEN=1
TSCLINK(1)=1
END IF
ELSE
DO J=1,JEND
!------find X,Y index for transient sub cell
IF( J .LE. JJ*2+1)THEN
X=M1TX-JJ+J-1
Y=M1TY-JJ
ELSE IF( J .LE. JJ*4+1)THEN
X=M1TX+JJ
Y=M1TY-3*JJ+J-1
ELSE IF( J .LE. JJ*6+1) THEN
X=M1TX+5*JJ-J+1
Y=M1TY+JJ
ELSE
X=M1TX-JJ
Y=M1TY+7*JJ-J+1
END IF
!-------------------------
IF( (Y>= 1) .AND. (X>=1) .AND. (Y<=NDIVY).AND. (X<=NDIVX))THEN
IF(ITSC(1,Y,X) .NE. 0)THEN
TSCLINK_LEN=TSCLINK_LEN+1
TSCLINK(TSCLINK_LEN)=J
END IF
END IF
END DO
END IF
!--------select second molecule
IF( TSCLINK_LEN .EQ. 0)THEN
JJ=JJ+1 !no molecule in this layer, go to next layer
DEALLOCATE(TSCLINK)
ELSE
CALL RANDOM_NUMBER(RANF)
L=INT(TSCLINK_LEN*RANF-0.0001)+1 !code in link
800 J=TSCLINK(L) !code of transient sub cell
IF( J .LE. JJ*2+1)THEN
X=M1TX-JJ+J-1
Y=M1TY-JJ
ELSE IF( J .LE. JJ*4+1)THEN
X=M1TX+JJ
Y=M1TY-3*JJ+J-1
ELSE IF( J .LE. JJ*6+1) THEN
X=M1TX+5*JJ-J+1
Y=M1TY+JJ
ELSE
X=M1TX-JJ
Y=M1TY+7*JJ-J+1
END IF
IF( JJ .EQ. 0)THEN
IF( NTSC(Y,X) .EQ. 2)THEN
IF(MPTX .NE. M1TX)THEN
M2=ITSC(3-M1SC,M1TY,M1TX)
ELSE
JJ=JJ+1 !no molecule in this layer, go to next layer
DEALLOCATE(TSCLINK)
END IF
ELSE IF( NTSC(Y,X) .GT. 2)THEN
DO WHILE(M2 .EQ. 0)
CALL RANDOM_NUMBER(RANF)
K=INT(NTSC(Y,X)*RANF-0.0001)+1 !code in transient sub cell
M2=ITSC(K,Y,X) !code of molecule
IF((M2 .EQ. M1) .OR. (M2 .EQ. MP))THEN
M2=0
END IF
END DO
ELSE
JJ=JJ+1 !no molecule in this layer, go to next layer
DEALLOCATE(TSCLINK)
END IF
ELSE
K=INT(NTSC(Y,X)*RANF-0.0001)+1 !code in transient sub cell
M2=ITSC(K,Y,X) !code of molecule
IF( M2 .EQ. MP)THEN
M2=0
IF(TSCLINK_LEN .NE. 1)THEN
L=L+1
IF(L .GT. TSCLINK_LEN)THEN
L=L-2
END IF
GOTO 800
END IF
JJ=JJ+1 !no molecule in this layer, go to next layer
DEALLOCATE(TSCLINK)
END IF
END IF
END IF
END DO
RETURN
END SUBROUTINE
SUBROUTINE TRANSIENT_SUBCELL_DEL
IMPLICIT NONE
DEALLOCATE(ITSC,IPTSC,NTSC)
NITSC=0
RETURN
END SUBROUTINE
END MODULE
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
What happens with
SUBROUTINE TRANSIENT_SUBCELL_DEL IMPLICIT NONE if(allocated(ITSC)) then dealocate(ITSC) else print *,"ITSC not allocated" endif if(allocated(IPTSC)) then dealocate(IPTSC) else print *,"IPTSC not allocated" endif if(allocated(NTSC)) then dealocate(NTSC) else print *,"NTSC not allocated" endif NITSC=0 RETURN END SUBROUTINE
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
In response to your private message.
Try to reduce the number of allocate/deallocate
!$omp parallel ... ... if(allocated(YourThreadPrivateArray)) then if(size(YourThreadPrivateArray) .lt. SizeYouWant) then deallocate(YourThreadPrivateArray) allocate(YourThreadPrivateArray(SizeYouWant), STAT=ISTAT) if(ISTAT != 0) stop ! handle error here endif else allocate(YourThreadPrivateArray(SizeYouWant), STAT=ISTAT) if(ISTAT != 0) stop ! handle error here endif ... !$omp end parallel ...
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I removed my earlier reply about the stack overflow - that's not what's happening...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If the problem is stack overflow (good possibility) then the error should not have occurred on the DEALLOCATE, rather it would be expected on a statement that creates a temporary array (when not needed) such as on
ITSC(1:NITSC-1,:,:)=ITSC_BACKUP
Since the DEALLOCATE immediately follows the above statement, it could vary well be a mis-generated line number in the error report.
Note, for a long time IVF has been noted for creating array temporaries when not necessary.
*** in the case above case, the temporary is likely created because the first index was extended, and your copy operation is performed in chunks. In the above case (at least until array temporary is eliminated) try:
! ITSC(1:NITSC-1,:,:)=ITSC_BACKUP DO I=1,SIZE(ITSC, DIM=2) DO J=1,SIZE(ITSC, DIM=3) ITSC(1:NITSC-1,I,J)=ITSC_BACKUP(1:NITSC-1,I,J) END DO END DO
(Change I and J accordingly)
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The error really is happening in the DEALLOCATE. It's an access violation in the library routine. I'm asking the developers to check it out.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
What's really maddening is that the error comes and goes randomly. A common problem with parallel applications...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
jimdempseyatthecove wrote:
What happens with
SUBROUTINE TRANSIENT_SUBCELL_DEL IMPLICIT NONE if(allocated(ITSC)) then dealocate(ITSC) else print *,"ITSC not allocated" endif if(allocated(IPTSC)) then dealocate(IPTSC) else print *,"IPTSC not allocated" endif if(allocated(NTSC)) then dealocate(NTSC) else print *,"NTSC not allocated" endif NITSC=0 RETURN END SUBROUTINEJim Dempsey
Dear Jim, I have tried this earlier, but actually, the program crash down without any warning or information. I have even add some codes to give the array a value, there is no error happening.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Steve Lionel (Intel) wrote:
What's really maddening is that the error comes and goes randomly. A common problem with parallel applications...
Dear Steve, I'm happy for your help since I find you are familiar with OpenMP and Fortran. When the program is compiled in release mode with out debug information. It just crashed with out any information. When compiled with debug information and use VS2012 to debug, the program will break at deallocate.
By the way, this problem does not always happen. When the maximum number of index is small, it runs pretty well. I have also tried to increase the reserved heap size in the project property, it seems it could run longer ( I guess it but don't really counts)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I find that the error happens about half the time when I run the program. This complicates trying to find a workaround. I have asked the developers to investigate - issue ID is DPD200257093.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Would it be worth trying to remove the allocate out of the !$OMP region? An approach could be something like below, where the array for each thread is allocated outside the loop and you use the thread number to identify the array section you use:
[fortran]
program omptest
use omp_lib
implicit none
INTEGER :: I,k,J, n
integer,pointer :: b(:,:)
n = OMP_GET_MAX_THREADS() - 1
allocate (b (67214259,0:n) )
!$OMP PARALLEL DO PRIVATE (i,k), SHARED (b)
DO I=1,2490000
!
k=OMP_get_thread_num()
call use_thread_b ( b(:,k) )
!
END DO
!$OMP END PARALLEL DO
END PROGRAM[/fortran]
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Our investigation so far has uncovered a missing synchronization when deallocating large allocations that end up using VirtualAlloc. But fixing that revealed more issues, so investigation continues.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
We believe that we've fixed the issues this test program uncovered, both on Windows and Linux, though they were different. Thanks for alerting us to the problem. I expect that the fix will appear in the 15.0 release later this year. On Windows the problem is seen if you do a lot of large allocations and deallocations in a threaded environment.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page