Allocation of pointers in subroutines when using OpenMP failed.

Xichen_Li · ‎12-23-2011

[fortran]    module MyModule
        type :: MyType
            integer, dimension(:), pointer :: pIntList => null ()
        end type 
    end module MyModule

    program TestOpenMP_1_AllocationWithinSubroutines

        use MyModule
        use omp_lib
        implicit none

        type(MyType) :: instance
        integer :: threadId 
        integer :: I

!$omp parallel private (instance, threadId, I) 
        threadId = OMP_GET_THREAD_NUM()
        allocate (instance%pIntList(200 + threadId))
        CALL IterateList(instance) 
!$omp end parallel

        read (*,*)

    contains

        subroutine IterateList(aInstance)
            type(MyType) :: aInstance
            integer, dimension(:), pointer :: pTempIntList => null()
            integer :: threadId
            threadId = OMP_GET_THREAD_NUM()
            allocate (pTempIntList(size(aInstance%pIntList)))
            write (*,*) 'Thread ',threadId, ' - ', 'aInstance%pIntList: ', size(aInstance%pIntList)
            write (*,*) 'Thread ',threadId, ' - ', 'pTempIntListe: ', size(pTempIntList)         
        end subroutine 

    end program TestOpenMP_1_AllocationWithinSubroutines
[/fortran]

As the above code sample shows, the main program tries to utilize OpenMP to call a subroutine. In that subroutine, a local pointer variable is created (allocated). Its size depends on the actual argument. The output shows the local pointer variable does NOT have the same size as the actual argument, which however should not be the case based on the code.

Could you help to comment why the program gives such behavior? More importantly, could you help to comment the correct ways (best practices) concerning how to do allocation of pointers in subroutines when using OpenMP?

The version of Intel Fortran Compiler is:

[root@localhost new]# ifort --version
ifort (IFORT) 12.1.0 20111011
Copyright (C) 1985-2011 Intel Corporation.  All rights reserved.


The output using ifort is shown below:
[root@localhost new]# ifort omp.f90 -warn -check -g -trace -openmp -static
[root@localhost new]# ./a.out 
 Thread            5  - aInstance%pIntList:          205
 Thread            5  - pTempIntListe:          207
 Thread            1  - aInstance%pIntList:          201
 Thread            1  - pTempIntListe:          207
 Thread            3  - aInstance%pIntList:          203
 Thread            3  - pTempIntListe:          207
 Thread            0  - aInstance%pIntList:          200
 Thread            0  - pTempIntListe:          207
 Thread            2  - aInstance%pIntList:          202
 Thread            2  - pTempIntListe:          207
 Thread            4  - aInstance%pIntList:          204
 Thread            4  - pTempIntListe:          207
 Thread            7  - aInstance%pIntList:          207
 Thread            7  - pTempIntListe:          207
 Thread            6  - aInstance%pIntList:          206
 Thread            6  - pTempIntListe:          207

Ron_Green · ‎01-05-2012

we will investigate this and report back soon.

ron

jimdempseyatthecove · ‎01-10-2012

Try using:

recursive subroutineIterateList(aInstance)

Compiling with openmp enabled should have effectively done this for you.
Reason being, you want the array descriptor for pTempIntListe to be on stack as opposed to SAVE.

If this fixes the problem, then the bug is -openmp not making local arrays stack based (i.e. leaving as SAVE)

Jim Dempsey

Xichen_Li · ‎01-11-2012

Thank you for your comment! Sorry but adding recursive doesn't help. After all, as you already mention, -openmp should make subroutines recursive automatically.

jimdempseyatthecove · ‎01-11-2012

Try adding

subroutine IterateList(aInstance)

type(MyType) :: aInstance

integer, dimension(:), pointer :: pTempIntList => null()

integer :: threadId

threadId = OMP_GET_THREAD_NUM()
!$OMP CRITICAL

write(*,*) 'loc(aInstance)', loc(aInstance)
write(*,*) 'associated(aInstance%pIntList)', associated(aInstance%pIntList)
allocate (pTempIntList(size(aInstance%pIntList)))

write (*,*) 'Thread ',threadId, ' - ', 'aInstance%pIntList: ', size(aInstance%pIntList)

write (*,*) 'Thread ',threadId, ' - ', 'pTempIntListe: ', size(pTempIntList)

!$OMP END CRITICAL
end subroutine

--------------
If that does not disclose anything, then try

volatile integer :: myTurn

myTurn = 0
!$omp parallel private (instance, threadId, I)

threadId = OMP_GET_THREAD_NUM()

do while(myTurn .lt. threadId)
sleepqq(1)
end do
allocate (instance%pIntList(200 + threadId))

CALL IterateList(instance)

myTurn = myTurn + 1
!$omp end parallel

The second test will show if the array size is stored in a shared location (if the size printed out follows size expected).

This information may aid in finding the bug (be it in your program or in IVF)

Jim Dempsey

jimdempseyatthecove · ‎01-11-2012

Also, does this issue show up in Debug build (with no optimizations)?

Jim Dempsey

pbkenned1 · ‎01-13-2012

It's not an optimisation bug, nor a bug with -openmp not putting the array descriptor for pTempIntListon the stack. I verified that each thread gets a unique allocation for pTempIntList in the subroutine.

I believe this is a known bug with the Fortran front-end not putting arguments to size() on the private OpenMP list, in this case, aInstance%pIntList and pTempIntList.

Workaround is to put a critical section in the subroutine around the thread-specific code:

!$omp critical

threadId = OMP_GET_THREAD_NUM()

allocate (pTempIntList(size(aInstance%pIntList)))

write (*,*) 'Thread ',threadId, ' - ', 'aInstance%pIntList: ', size(aInstance%pIntList)

write (*,*) 'Thread ',threadId, ' - ', 'pTempIntListe: ', size(pTempIntList)

!$omp end critical

>ifort -Qopenmp U101975-critical.f90

Intel Visual Fortran Intel 64 Compiler XE for applications running on Intel 64, Version 12.1.2.278 Build 20111128

>U101975-critical.exe

Thread 2 - aInstance%pIntList: 202

Thread 2 - pTempIntListe: 202

Thread 0 - aInstance%pIntList: 200

Thread 0 - pTempIntListe: 200

Thread 1 - aInstance%pIntList: 201

Thread 1 - pTempIntListe: 201

Thread 3 - aInstance%pIntList: 203

Thread 3 - pTempIntListe: 203

>

Incidently, gfortran has a similar issue (with your original code); but the reason could be entirely different:

> gfortran --version

GNU Fortran (SUSE Linux) 4.5.0 20100604 [gcc-4_5-branch revision 160292]

> gfortran -fopenmp U101975_gfortran.f90 && ./a.out

Thread 0 - aInstance%pIntList: 200

Thread 0 - pTempIntListe: 201

Thread 2 - aInstance%pIntList: 202

Thread 2 - pTempIntListe: 201

Thread 1 - aInstance%pIntList: 201

Thread 1 - pTempIntListe: 201

Thread 3 - aInstance%pIntList: 203

Thread 3 - pTempIntListe: 201

>

Once the developers confirm it's a known bug, I'll update this thread with the tracking number.

Patrick Kennedy
Intel Developer Support

jimdempseyatthecove · ‎01-13-2012

If size(foo) is a Fortran compileroptimization issue relating to private variables then adding a critical section should not affect the code.

However, if the critical section corrects the size(foo) issue, then this is indicative that the ALLOCATE(foo) code is inserting the allocation size into the out of private scope array descriptor allocation size location.

IOW you have two problems:

size(foo)uses allocation sizeof wrong descriptor
ALLOCATE(foo) is inserting allocation size into wrong descriptor

When looking at one problem, be aware of, and look for, other problem.

Jim Dempsey

pbkenned1 · ‎01-13-2012

Indeed, therecould be multiple defects in play here. The Fortran developers were unable to confirm if this a known issue, so this has been reported as a new defect, tracking number DPD200177907. I'll keep this thread updated with news from the developers.

Patrick Kennedy

pbkenned1 · ‎11-12-2014

After a long investigation, it turns out this is a coding error. Jim was on the right track when he said "you want the array descriptor for pTempIntListe to be on stack as opposed to SAVE.". However, it's not the case that " the bug is -openmp not making local arrays stack based (i.e. leaving as SAVE)".

The explicit initialization of pTempIntList inside routine IterateList to null() forces it to be static, by Fortran language rules:

See 12-007 F2008 section 5.2.3 p3 n[89:12-13]

    Explicit initialization of a variable that is not in a common block

    implies the SAVE attribute, which may be confirmed by explicit

    specification.

But when we're compiling this routine, there is no way that the FE can determine that the routine might be called from inside a parallel region in some other piece of code. As a result, we can't detect the error (i.e. using a static variable inside a parallel region - which is not thread-safe) and give a nice message.

The fix is simply to remove the initialization:

        subroutine IterateList(aInstance)
            type(MyType) :: aInstance
            integer, dimension(:), pointer :: pTempIntList

With that change, all is good:

[U101975]$ ifort -openmp U101975.f90 -o U101975.x
[U101975]$ export OMP_NUM_THREADS=4
[U101975]$ ./U101975.x
Thread            0 - aInstance%pIntList:          200
Thread            0 - pTempIntListe:          200
Thread            1 - aInstance%pIntList:          201
Thread            1 - pTempIntListe:          201
Thread            2 - aInstance%pIntList:          202
Thread            2 - pTempIntListe:          202
Thread            3 - aInstance%pIntList:          203
Thread            3 - pTempIntListe:          203
[U101975]$

Patrick

jimdempseyatthecove · ‎11-12-2014

ooh, yea, I should have seen that. I will remember this for next time. FWIW this should be a reminder to NOT initialize the variables in the declaration section when (if) the subroutine or function will ever be used in openmp or recursively.

Thanks Patrick for explaining this in further detail.

This would be a good example for one of those "what's wrong with this program" questions.

Jim Dempsey

Steven_L_Intel1 · ‎11-12-2014

The compiler could give a "usage informational" if it sees local variable initialization when options imply that the routine should be recursive (and in F2015, all routines are recursive by default.)

jimdempseyatthecove · ‎11-13-2014

That would be great. Especially for those wanting to newer semantics. This diagnostic would be valuable in trying a newer level semantic (2008, 2015, ...) and being notified of statements to be examined for potential problems.

>>(and in F2015, all routines are recursive by default.)

Will this change the rule:

Explicit initialization of a variable that is not in a common block implies the SAVE attribute, which may be confirmed by explicit specification.

Jim Dempsey

Steven_L_Intel1 · ‎11-13-2014

No, it doesn't change that rule.

I'll comment that we don't have options to specify "level" of standard semantics. The compiler has a particular level it considers current. What the change will do is cause more incorrect programs with uninitialized variables to show errors.