Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
29262 Discussions

Allocation of pointers in subroutines when using OpenMP failed.

Xichen_Li
Beginner
2,138 Views

[fortran]    module MyModule
        type :: MyType
            integer, dimension(:), pointer :: pIntList => null ()
        end type 
    end module MyModule

    program TestOpenMP_1_AllocationWithinSubroutines

        use MyModule
        use omp_lib
        implicit none

        type(MyType) :: instance
        integer :: threadId 
        integer :: I

!$omp parallel private (instance, threadId, I) 
        threadId = OMP_GET_THREAD_NUM()
        allocate (instance%pIntList(200 + threadId))
        CALL IterateList(instance) 
!$omp end parallel

        read (*,*)

    contains

        subroutine IterateList(aInstance)
            type(MyType) :: aInstance
            integer, dimension(:), pointer :: pTempIntList => null()
            integer :: threadId
            threadId = OMP_GET_THREAD_NUM()
            allocate (pTempIntList(size(aInstance%pIntList)))
            write (*,*) 'Thread ',threadId, ' - ', 'aInstance%pIntList: ', size(aInstance%pIntList)
            write (*,*) 'Thread ',threadId, ' - ', 'pTempIntListe: ', size(pTempIntList)         
        end subroutine 

    end program TestOpenMP_1_AllocationWithinSubroutines
[/fortran]
As the above code sample shows, the main program tries to utilize OpenMP to call a subroutine. In that subroutine, a local pointer variable is created (allocated). Its size depends on the actual argument. The output shows the local pointer variable does NOT have the same size as the actual argument, which however should not be the case based on the code.

Could you help to comment why the program gives such behavior? More importantly, could you help to comment the correct ways (best practices) concerning how to do allocation of pointers in subroutines when using OpenMP?

The version of Intel Fortran Compiler is:

[root@localhost new]# ifort --version
ifort (IFORT) 12.1.0 20111011
Copyright (C) 1985-2011 Intel Corporation.  All rights reserved.


The output using ifort is shown below:

[root@localhost new]# ifort omp.f90 -warn -check -g -trace -openmp -static
[root@localhost new]# ./a.out 
 Thread            5  - aInstance%pIntList:          205
 Thread            5  - pTempIntListe:          207
 Thread            1  - aInstance%pIntList:          201
 Thread            1  - pTempIntListe:          207
 Thread            3  - aInstance%pIntList:          203
 Thread            3  - pTempIntListe:          207
 Thread            0  - aInstance%pIntList:          200
 Thread            0  - pTempIntListe:          207
 Thread            2  - aInstance%pIntList:          202
 Thread            2  - pTempIntListe:          207
 Thread            4  - aInstance%pIntList:          204
 Thread            4  - pTempIntListe:          207
 Thread            7  - aInstance%pIntList:          207
 Thread            7  - pTempIntListe:          207
 Thread            6  - aInstance%pIntList:          206
 Thread            6  - pTempIntListe:          207


0 Kudos
13 Replies
Ron_Green
Moderator
2,138 Views
we will investigate this and report back soon.

ron
0 Kudos
jimdempseyatthecove
Honored Contributor III
2,138 Views

Try using:

recursive subroutineIterateList(aInstance)

Compiling with openmp enabled should have effectively done this for you.
Reason being, you want the array descriptor for pTempIntListe to be on stack as opposed to SAVE.

If this fixes the problem, then the bug is -openmp not making local arrays stack based (i.e. leaving as SAVE)

Jim Dempsey

0 Kudos
Xichen_Li
Beginner
2,138 Views
Thank you for your comment! Sorry but adding recursive doesn't help. After all, as you already mention, -openmp should make subroutines recursive automatically.
0 Kudos
jimdempseyatthecove
Honored Contributor III
2,138 Views
Try adding

subroutine IterateList(aInstance)

type(MyType) :: aInstance

integer, dimension(:), pointer :: pTempIntList => null()

integer :: threadId

threadId = OMP_GET_THREAD_NUM()
!$OMP CRITICAL

write(*,*) 'loc(aInstance)', loc(aInstance)
write(*,*) 'associated(aInstance%pIntList)', associated(aInstance%pIntList)
allocate (pTempIntList(size(aInstance%pIntList)))

write (*,*) 'Thread ',threadId, ' - ', 'aInstance%pIntList: ', size(aInstance%pIntList)

write (*,*) 'Thread ',threadId, ' - ', 'pTempIntListe: ', size(pTempIntList)

!$OMP END CRITICAL
end subroutine


--------------
If that does not disclose anything, then try

volatile integer :: myTurn

myTurn = 0
!$omp parallel private (instance, threadId, I)

threadId = OMP_GET_THREAD_NUM()

do while(myTurn .lt. threadId)
sleepqq(1)
end do
allocate (instance%pIntList(200 + threadId))

CALL IterateList(instance)

myTurn = myTurn + 1
!$omp end parallel


The second test will show if the array size is stored in a shared location (if the size printed out follows size expected).

This information may aid in finding the bug (be it in your program or in IVF)

Jim Dempsey
0 Kudos
jimdempseyatthecove
Honored Contributor III
2,138 Views
Also, does this issue show up in Debug build (with no optimizations)?

Jim Dempsey
0 Kudos
pbkenned1
Employee
2,138 Views

It's not an optimisation bug, nor a bug with -openmp not putting the array descriptor for pTempIntListon the stack. I verified that each thread gets a unique allocation for pTempIntList in the subroutine.

I believe this is a known bug with the Fortran front-end not putting arguments to size() on the private OpenMP list, in this case, aInstance%pIntList and pTempIntList.

Workaround is to put a critical section in the subroutine around the thread-specific code:

!$omp critical

threadId = OMP_GET_THREAD_NUM()

allocate (pTempIntList(size(aInstance%pIntList)))

write (*,*) 'Thread ',threadId, ' - ', 'aInstance%pIntList: ', size(aInstance%pIntList)

write (*,*) 'Thread ',threadId, ' - ', 'pTempIntListe: ', size(pTempIntList)

!$omp end critical

>ifort -Qopenmp U101975-critical.f90

Intel Visual Fortran Intel 64 Compiler XE for applications running on Intel 64, Version 12.1.2.278 Build 20111128

>U101975-critical.exe

Thread 2 - aInstance%pIntList: 202

Thread 2 - pTempIntListe: 202

Thread 0 - aInstance%pIntList: 200

Thread 0 - pTempIntListe: 200

Thread 1 - aInstance%pIntList: 201

Thread 1 - pTempIntListe: 201

Thread 3 - aInstance%pIntList: 203

Thread 3 - pTempIntListe: 203

>

Incidently, gfortran has a similar issue (with your original code); but the reason could be entirely different:

> gfortran --version

GNU Fortran (SUSE Linux) 4.5.0 20100604 [gcc-4_5-branch revision 160292]

Copyright (C) 2010 Free Software Foundation, Inc.

> gfortran -fopenmp U101975_gfortran.f90 && ./a.out

Thread 0 - aInstance%pIntList: 200

Thread 0 - pTempIntListe: 201

Thread 2 - aInstance%pIntList: 202

Thread 2 - pTempIntListe: 201

Thread 1 - aInstance%pIntList: 201

Thread 1 - pTempIntListe: 201

Thread 3 - aInstance%pIntList: 203

Thread 3 - pTempIntListe: 201

>

Once the developers confirm it's a known bug, I'll update this thread with the tracking number.

Patrick Kennedy
Intel Developer Support

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,138 Views
If size(foo) is a Fortran compileroptimization issue relating to private variables then adding a critical section should not affect the code.

However, if the critical section corrects the size(foo) issue, then this is indicative that the ALLOCATE(foo) code is inserting the allocation size into the out of private scope array descriptor allocation size location.

IOW you have two problems:

size(foo)uses allocation sizeof wrong descriptor
ALLOCATE(foo) is inserting allocation size into wrong descriptor

When looking at one problem, be aware of, and look for, other problem.

Jim Dempsey
0 Kudos
pbkenned1
Employee
2,138 Views

Indeed, therecould be multiple defects in play here. The Fortran developers were unable to confirm if this a known issue, so this has been reported as a new defect, tracking number DPD200177907. I'll keep this thread updated with news from the developers.

Patrick Kennedy

0 Kudos
pbkenned1
Employee
2,138 Views

After a long investigation, it turns out this is a coding error.  Jim was on the right track when he said "you want the array descriptor for pTempIntListe to be on stack as opposed to SAVE.".  However, it's not the case that " the bug is -openmp not making local arrays stack based (i.e. leaving as SAVE)".

The explicit initialization of pTempIntList inside routine IterateList to null() forces it to be static, by Fortran language rules:

See 12-007 F2008 section 5.2.3 p3 n[89:12-13]
    Explicit initialization of a variable that is not in a common block
    implies the SAVE attribute, which may be confirmed by explicit
    specification.

But when we're compiling this routine, there is no way that the FE can determine that the routine might be called from inside a parallel region in some other piece of code.  As a result, we can't detect the error (i.e. using a static variable inside a parallel region - which is not thread-safe) and give a nice message.

The fix is simply to remove the initialization:

        subroutine IterateList(aInstance)
            type(MyType) :: aInstance
            integer, dimension(:), pointer :: pTempIntList
 

With that change, all is good:

[U101975]$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.1.133 Build 20141023
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

[U101975]$ ifort -openmp U101975.f90 -o U101975.x
[U101975]$ export OMP_NUM_THREADS=4
[U101975]$ ./U101975.x
 Thread            0  - aInstance%pIntList:          200
 Thread            0  - pTempIntListe:          200
 Thread            1  - aInstance%pIntList:          201
 Thread            1  - pTempIntListe:          201
 Thread            2  - aInstance%pIntList:          202
 Thread            2  - pTempIntListe:          202
 Thread            3  - aInstance%pIntList:          203
 Thread            3  - pTempIntListe:          203
[U101975]$

Patrick

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,138 Views

ooh, yea, I should have seen that. I will remember this for next time. FWIW this should be a reminder to NOT initialize the variables in the declaration section when (if) the subroutine or function will ever be used in openmp or recursively.

Thanks Patrick for explaining this in further detail.

This would be a good example for one of those "what's wrong with this program" questions.

Jim Dempsey

0 Kudos
Steven_L_Intel1
Employee
2,138 Views

The compiler could give a "usage informational" if it sees local variable initialization when options imply that the routine should be recursive (and in F2015, all routines are recursive by default.)

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,138 Views

That would be great. Especially for those wanting to newer semantics. This diagnostic would be valuable in trying a newer level semantic (2008, 2015, ...) and being notified of statements to be examined for potential problems.

>>(and in F2015, all routines are recursive by default.)

Will this change the rule:

Explicit initialization of a variable that is not in a common block implies the SAVE attribute, which may be confirmed by explicit specification.

Jim Dempsey

0 Kudos
Steven_L_Intel1
Employee
2,138 Views

No, it doesn't change that rule.

I'll comment that we don't have options to specify "level" of standard semantics. The compiler has a particular level it considers current. What the change will do is cause more incorrect programs with uninitialized variables to show errors.

0 Kudos
Reply