- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi, guys. I'm a new hand of openmp. I encounter some problems related to deallocating allocatable array in openmp loop.
The main reason is the array is too large. An example code is following:
program omptest use omp_lib implicit none INTEGER :: I,k,J integer,pointer :: b(:) !$OMP PARALLEL DO PRIVATE(i,b,k) DO I=1,2490000 allocate(b(67214259)) k=OMP_get_thread_num() deallocate(b) END DO !$OMP END PARALLEL DO END PROGRAM
When the maximum bound of i is large, like 2490000, the program will break at deallocate(b)
I have also tried this version:(to make b as a threadprivate variable) but it is still useless
module da implicit none integer,allocatable,save :: b(:) !$omp threadprivate(a,b) contains subroutine ji(i) integer :: i allocate(b(672142590)) end subroutine end module program omptest use omp_lib use da implicit none INTEGER :: I,k,J !$OMP PARALLEL DO PRIVATE(i,k) DO I=1,2490000 call ji(i) k=OMP_get_thread_num() deallocate(b) END DO !$OMP END PARALLEL DO
I guess it might because of the size of heap. I have tried to divide I into 10 parts, and makes parallel computing separately( serially) but it is still no use.
Could any body help me to find out the solution? Thank you very much!
My development environment is VS2012 update 4+Intel cluster studio xe 2013 sp1
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
My original code is below:
The program will crash when the DEALLOCATE(ITSC,IPTSC,NTSC) in subroutine TRANSIENT_SUBCELL_DEL is run.
But what is mysterious is that the crash happens in some conditions.
Otherwise, the program run normally in serial mode.
OpenMP loop
SUBROUTINE COLLISION ! !--calculate the collisions ! USE MOLECS USE CELLINFO USE GAS USE TRANSC USE SAMPLES USE CALC IMPLICIT NONE INTEGER :: I INTEGER :: J,K,NSEL,M1,M2,MPOS,MT REAL*8 :: DTC,ASEL,SEPSMIN,SEPS,RANF INTEGER :: ISM !--I,J,K,L,M,N---loop number !--II,JJ,KK------loop control !---ISM----------Select method---0 for transient sub cell 1 for virtual sub cell !---M1-----------selected molecule 1 !---M2-----------selected molecule 2 !---MT-----------temporal selected molecule !---MPOS---------selected molecule position in ICREF !---SEPS---------square of current collision pair separation !---SEPSMIN------square of current minimum molecule separation !IPCP=0 this is done in molecules move ALLOCATE(IPCP(NM)) IPCP=0 !$OMP PARALLEL DEFAULT(SHARED) !$OMP DO SCHEDULE(STATIC) PRIVATE(I,J,K,DTC,ISM,ASEL,NSEL,MPOS,M1,M2,SEPS,SEPSMIN,MT,RANF) REDUCTION(+:TOTCOL,TCOL,COLLS,CLSEP) DO I=1,NCCELLS IF (FTIME-CCELL(5,I) > CCELL(3,I)) THEN DTC=2.0D00*CCELL(3,I) IF(ICCELL(2,I) .GT. 1) CCELL(5,I)=CCELL(5,I)+DTC IF( ICCELL(2,I) .GT. 30)THEN ISM=0!transient sub cell ELSE ISM=1!virtual sub cell END IF !NTC method ASEL=0.5D00*ICCELL(2,I)*(ICCELL(2,I)-1)*& &FNUM*CCELL(4,I)*DTC/CCELL(1,I)+CCELL(2,I) IF(ASEL .LT. 0) ASEL=0.0D00 NSEL=ASEL CCELL(2,I)=ASEL-NSEL IF( (NSEL .GT. 0) .AND. (ICCELL(2,I) .GT. 1))THEN IF(ISM .EQ. 0)THEN CALL TRANSIENT_SUBCELL_GEN(I) DO J=1,NSEL !select first molecule CALL RANDOM_NUMBER(RANF) MPOS=INT(RANF*DFLOAT(ICCELL(2,I))-0.0001)+ICCELL(1,I)+1 M1=ICREF(MPOS) !Select second molecule CALL TRANSIENT_SUBCELL_SELECT(M1,M2) !Collision CALL MOLECULES_COL(M1,M2) END DO CALL TRANSIENT_SUBCELL_DEL ELSE DO J=1,NSEL !select first molecule CALL RANDOM_NUMBER(RANF) MPOS=INT(RANF*DFLOAT(ICCELL(2,I))-0.0001)+ICCELL(1,I)+1 M1=ICREF(MPOS) !select second molecule SEPSMIN=(MAXCELL(2)*0.5D00)**2.0D00 DO K=1,ICCELL(2,I) MPOS=ICCELL(1,I)+K MT=ICREF(MPOS) IF((MT .NE. M1) .AND. (MT .NE. IPCP(M1)))THEN SEPS=(PX(MT)-PX(M1))**2.0D00+(PY(MT)-PY(M1))**2.0D00 IF( SEPS .LT. SEPSMIN)THEN M2=MT SEPSMIN=SEPS END IF END IF END DO !collision CALL MOLECULES_COL(M1,M2) END DO END IF END IF !end nsel possible END IF END DO !$OMP END DO !$OMP END PARALLEL DEALLOCATE(IPCP) END SUBROUTINE
THE BREAK POINT
MODULE TRANSC IMPLICIT NONE !only used in subroutine collision and TRANSIENT_SUBCELL for transient cell INTEGER,SAVE :: NITSC,NDIVX,NDIVY INTEGER,DIMENSION(:,:,:),ALLOCATABLE,SAVE:: ITSC INTEGER,DIMENSION(:,:),ALLOCATABLE,SAVE :: IPTSC,NTSC !$OMP THREADPRIVATE(NITSC,ITSC,IPTSC,NTSC,NDIVX,NDIVY) !---NTSC(Y,X)----------number of molecule in (X,Y) transient sub cell !---ITSC(I,Y,X)---- transient sub cell index !--------X----the index in x direction !--------Y----the index in y direction !--------I----the code of molecule in transient sub cell !-------------I=1,NITSC !---IPTSC(I,M)-----molecule to transient cell !-------I=1----x index of molecule M !-------I=2----y index of molecule M !-------I=3----index of molecule in transient cell !---NDIVX--------number of transient sub cell in x direction !---NDIVY--------number of transient sub cell in y direction CONTAINS SUBROUTINE TRANSIENT_SUBCELL_GEN(I) USE NODEINFO USE CELLINFO USE MOLECS IMPLICIT NONE INTEGER :: I INTEGER :: II,K,L,J,XC,YC INTEGER,DIMENSION(:,:,:),ALLOCATABLE :: ITSC_BACKUP REAL*8 :: x1,x2,x3,y1,y2,y3 REAL*8 :: x10,x20,x30,y10,y20,y30 REAL*8 :: XI,YI,X,Y,DX,DY !---I---------------code of collision cell !---II--------------code of sampling cell !---NDIVX-----------number of transient sub cell in x direction !---NDIVY-----------number of transient sub cell in y direction !---ITSC_BACKUP-----BACKUP OF ITSC,in case the I of ITSC is not enough !---XI,YI-----------the minimum boundary of transient sub cell !---X ,Y -----------the maximum boundary of transient sub cell II=ICCELL(3,I) !---x1,y1,x2,y2,x3,y3-----the coordinate of the cell,clockwise x10=NODES(1,ICELL(1,II)) x20=NODES(1,ICELL(2,II)) x30=NODES(1,ICELL(3,II)) y10=NODES(2,ICELL(1,II)) y20=NODES(2,ICELL(2,II)) y30=NODES(2,ICELL(3,II)) !---x1,y1,x2,y2,x3,y3-----the coordinate of the middle collision cell x1=(x20+x30)*0.5D00 x2=(x10+x30)*0.5D00 x3=(x10+x20)*0.5D00 y1=(y20+y30)*0.5D00 y2=(y10+y30)*0.5D00 y3=(y10+y20)*0.5D00 SELECT CASE(I-ICELL(8,II)) CASE(1) x1=x10; y1=y10 CASE(2) x2=x20; y2=y20 CASE(3) x3=x30; y3=y30 END SELECT XI=DMIN1(x1,x2,x3); YI=DMIN1(y1,y2,y3) X=DMAX1(x1,x2,x3) ; Y=DMAX1(y1,y2,y3) ! for triangular cell, num_molecule<0.5*ndivx*ndivy ! this is used to ensure there is nearly a molecule in every transient sub cell ! the following aims to make DX=DY DX=DSQRT((X-XI)*(Y-YI)/(2.0D00*DFLOAT(ICCELL(2,I)))) DY=DX NDIVX=(X-XI)/DX DX=(X-XI)/DFLOAT(NDIVX) NDIVY=(Y-YI)/DY DY=(Y-YI)/DFLOAT(NDIVY) NITSC=2 ALLOCATE(ITSC(NITSC,NDIVY,NDIVX),NTSC(NDIVY,NDIVX),IPTSC(3,NM)) ITSC=0; NTSC=0 L=ICCELL(1,I) DO J=1+L,ICCELL(2,I)+L K=ICREF(J) !K: code of molecule XC=INT((PX(K)-XI-DX*1D-3)/DX)+1 !-DX*1D-3 aims to obviate PX(K)=X YC=INT((PY(K)-YI-DY*1D-3)/DY)+1 IPTSC(1,K)=XC IPTSC(2,K)=YC NTSC(YC,XC)=NTSC(YC,XC)+1 IF(NTSC(YC,XC) .GT. NITSC)THEN !increase the index code of molecule in the transient sub cell ALLOCATE(ITSC_BACKUP(NITSC,NDIVY,NDIVX)) ITSC_BACKUP=ITSC DEALLOCATE(ITSC) NITSC=NITSC+1 ALLOCATE(ITSC(NITSC,NDIVY,NDIVX)) ITSC(NITSC:NITSC,:,:)=0 ITSC(1:NITSC-1,:,:)=ITSC_BACKUP DEALLOCATE(ITSC_BACKUP) END IF ITSC(NTSC(YC,XC),YC,XC)=K IPTSC(3,K)=NTSC(YC,XC) END DO RETURN END SUBROUTINE SUBROUTINE TRANSIENT_SUBCELL_SELECT(M1,M2) USE MOLECS USE CELLINFO IMPLICIT NONE INTEGER :: M1,M2 !----M1-----already selected molecule for collsion !----M2-----the molecule to be selected !----DEL1---the first deleted molecule !----DEL2---the second deleted molecule INTEGER :: I,J,K,L,JJ,JEND,X,Y,TSCLINK_LEN INTEGER :: M1TX,M1TY,MPTX,MPTY INTEGER,ALLOCATABLE :: TSCLINK(:) REAL*8 :: RANF INTEGER :: MP,M1SC,MPSC !----I,J,K------------loop number !----JJ,JEND----loop control !----M1TX,M1TY--------molecule 1 transient sub cell X,Y coordinate !----M2T--------------temporal molecule 2 !----X,Y--------------current !----TSCLINK----------the link of transient sub cells which has molecule !----TSCLINK_LEN------link length !------------------------------------------------------------------------- !----NTSC_BACKUP,ITSC_BACKUP is used to back up NTSC,ITSC !----NUM_DEL---------if M1 and IPCP(M1) in the same transient sub cell =1 else =2 !----Delete M1 and IPCP(M1) from transient sub cell grid M1TX=IPTSC(1,M1) M1TY=IPTSC(2,M1) M1SC=IPTSC(3,M1) MP=IPCP(M1) MPTX=0;MPTY=0;MPSC=0 IF(MP .NE. 0)THEN MPTX=IPTSC(1,MP) MPTY=IPTSC(2,MP) MPSC=IPTSC(3,MP) END IF !-----start to select molecule 2 M2=0 JJ=0 !layer of transient sub cell molecules from which molecule 2 is chosen DO WHILE ( M2 .EQ. 0) JEND=8*JJ IF(JJ .EQ. 0) JEND=1 ALLOCATE(TSCLINK(JEND)) TSCLINK=0 TSCLINK_LEN=0 !----loop between the transient sub cells in the same layer in order to create TSCLINK IF( JJ .EQ. 0)THEN IF(ITSC(1,M1TY,M1TX) .NE. 0)THEN TSCLINK_LEN=1 TSCLINK(1)=1 END IF ELSE DO J=1,JEND !------find X,Y index for transient sub cell IF( J .LE. JJ*2+1)THEN X=M1TX-JJ+J-1 Y=M1TY-JJ ELSE IF( J .LE. JJ*4+1)THEN X=M1TX+JJ Y=M1TY-3*JJ+J-1 ELSE IF( J .LE. JJ*6+1) THEN X=M1TX+5*JJ-J+1 Y=M1TY+JJ ELSE X=M1TX-JJ Y=M1TY+7*JJ-J+1 END IF !------------------------- IF( (Y>= 1) .AND. (X>=1) .AND. (Y<=NDIVY).AND. (X<=NDIVX))THEN IF(ITSC(1,Y,X) .NE. 0)THEN TSCLINK_LEN=TSCLINK_LEN+1 TSCLINK(TSCLINK_LEN)=J END IF END IF END DO END IF !--------select second molecule IF( TSCLINK_LEN .EQ. 0)THEN JJ=JJ+1 !no molecule in this layer, go to next layer DEALLOCATE(TSCLINK) ELSE CALL RANDOM_NUMBER(RANF) L=INT(TSCLINK_LEN*RANF-0.0001)+1 !code in link 800 J=TSCLINK(L) !code of transient sub cell IF( J .LE. JJ*2+1)THEN X=M1TX-JJ+J-1 Y=M1TY-JJ ELSE IF( J .LE. JJ*4+1)THEN X=M1TX+JJ Y=M1TY-3*JJ+J-1 ELSE IF( J .LE. JJ*6+1) THEN X=M1TX+5*JJ-J+1 Y=M1TY+JJ ELSE X=M1TX-JJ Y=M1TY+7*JJ-J+1 END IF IF( JJ .EQ. 0)THEN IF( NTSC(Y,X) .EQ. 2)THEN IF(MPTX .NE. M1TX)THEN M2=ITSC(3-M1SC,M1TY,M1TX) ELSE JJ=JJ+1 !no molecule in this layer, go to next layer DEALLOCATE(TSCLINK) END IF ELSE IF( NTSC(Y,X) .GT. 2)THEN DO WHILE(M2 .EQ. 0) CALL RANDOM_NUMBER(RANF) K=INT(NTSC(Y,X)*RANF-0.0001)+1 !code in transient sub cell M2=ITSC(K,Y,X) !code of molecule IF((M2 .EQ. M1) .OR. (M2 .EQ. MP))THEN M2=0 END IF END DO ELSE JJ=JJ+1 !no molecule in this layer, go to next layer DEALLOCATE(TSCLINK) END IF ELSE K=INT(NTSC(Y,X)*RANF-0.0001)+1 !code in transient sub cell M2=ITSC(K,Y,X) !code of molecule IF( M2 .EQ. MP)THEN M2=0 IF(TSCLINK_LEN .NE. 1)THEN L=L+1 IF(L .GT. TSCLINK_LEN)THEN L=L-2 END IF GOTO 800 END IF JJ=JJ+1 !no molecule in this layer, go to next layer DEALLOCATE(TSCLINK) END IF END IF END IF END DO RETURN END SUBROUTINE SUBROUTINE TRANSIENT_SUBCELL_DEL IMPLICIT NONE DEALLOCATE(ITSC,IPTSC,NTSC) NITSC=0 RETURN END SUBROUTINE END MODULE
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
What happens with
SUBROUTINE TRANSIENT_SUBCELL_DEL IMPLICIT NONE if(allocated(ITSC)) then dealocate(ITSC) else print *,"ITSC not allocated" endif if(allocated(IPTSC)) then dealocate(IPTSC) else print *,"IPTSC not allocated" endif if(allocated(NTSC)) then dealocate(NTSC) else print *,"NTSC not allocated" endif NITSC=0 RETURN END SUBROUTINE
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
In response to your private message.
Try to reduce the number of allocate/deallocate
!$omp parallel ... ... if(allocated(YourThreadPrivateArray)) then if(size(YourThreadPrivateArray) .lt. SizeYouWant) then deallocate(YourThreadPrivateArray) allocate(YourThreadPrivateArray(SizeYouWant), STAT=ISTAT) if(ISTAT != 0) stop ! handle error here endif else allocate(YourThreadPrivateArray(SizeYouWant), STAT=ISTAT) if(ISTAT != 0) stop ! handle error here endif ... !$omp end parallel ...
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I removed my earlier reply about the stack overflow - that's not what's happening...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If the problem is stack overflow (good possibility) then the error should not have occurred on the DEALLOCATE, rather it would be expected on a statement that creates a temporary array (when not needed) such as on
ITSC(1:NITSC-1,:,:)=ITSC_BACKUP
Since the DEALLOCATE immediately follows the above statement, it could vary well be a mis-generated line number in the error report.
Note, for a long time IVF has been noted for creating array temporaries when not necessary.
*** in the case above case, the temporary is likely created because the first index was extended, and your copy operation is performed in chunks. In the above case (at least until array temporary is eliminated) try:
! ITSC(1:NITSC-1,:,:)=ITSC_BACKUP DO I=1,SIZE(ITSC, DIM=2) DO J=1,SIZE(ITSC, DIM=3) ITSC(1:NITSC-1,I,J)=ITSC_BACKUP(1:NITSC-1,I,J) END DO END DO
(Change I and J accordingly)
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The error really is happening in the DEALLOCATE. It's an access violation in the library routine. I'm asking the developers to check it out.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
What's really maddening is that the error comes and goes randomly. A common problem with parallel applications...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
jimdempseyatthecove wrote:
What happens with
SUBROUTINE TRANSIENT_SUBCELL_DEL IMPLICIT NONE if(allocated(ITSC)) then dealocate(ITSC) else print *,"ITSC not allocated" endif if(allocated(IPTSC)) then dealocate(IPTSC) else print *,"IPTSC not allocated" endif if(allocated(NTSC)) then dealocate(NTSC) else print *,"NTSC not allocated" endif NITSC=0 RETURN END SUBROUTINEJim Dempsey
Dear Jim, I have tried this earlier, but actually, the program crash down without any warning or information. I have even add some codes to give the array a value, there is no error happening.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Steve Lionel (Intel) wrote:
What's really maddening is that the error comes and goes randomly. A common problem with parallel applications...
Dear Steve, I'm happy for your help since I find you are familiar with OpenMP and Fortran. When the program is compiled in release mode with out debug information. It just crashed with out any information. When compiled with debug information and use VS2012 to debug, the program will break at deallocate.
By the way, this problem does not always happen. When the maximum number of index is small, it runs pretty well. I have also tried to increase the reserved heap size in the project property, it seems it could run longer ( I guess it but don't really counts)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I find that the error happens about half the time when I run the program. This complicates trying to find a workaround. I have asked the developers to investigate - issue ID is DPD200257093.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Would it be worth trying to remove the allocate out of the !$OMP region? An approach could be something like below, where the array for each thread is allocated outside the loop and you use the thread number to identify the array section you use:
[fortran]
program omptest
use omp_lib
implicit none
INTEGER :: I,k,J, n
integer,pointer :: b(:,:)
n = OMP_GET_MAX_THREADS() - 1
allocate (b (67214259,0:n) )
!$OMP PARALLEL DO PRIVATE (i,k), SHARED (b)
DO I=1,2490000
!
k=OMP_get_thread_num()
call use_thread_b ( b(:,k) )
!
END DO
!$OMP END PARALLEL DO
END PROGRAM[/fortran]
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Our investigation so far has uncovered a missing synchronization when deallocating large allocations that end up using VirtualAlloc. But fixing that revealed more issues, so investigation continues.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
We believe that we've fixed the issues this test program uncovered, both on Windows and Linux, though they were different. Thanks for alerting us to the problem. I expect that the fix will appear in the 15.0 release later this year. On Windows the problem is seen if you do a lot of large allocations and deallocations in a threaded environment.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page