Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Stack overflow with Fortran OpenMP

Peter_F_5
Beginner
901 Views

I am trying to run an OpenMP Fortran code using Intel Fortran. 

At the beginning of my code I allocate 7 very large matrices:

    ! Declare the rank-9 arrays
    REAL, ALLOCATABLE :: m1(:,:,:,:,:,:,:,:,:)
    REAL, ALLOCATABLE :: m2(:,:,:,:,:,:,:,:,:)
    REAL, ALLOCATABLE :: m3(:,:,:,:,:,:,:,:,:)
    REAL, ALLOCATABLE :: m4(:,:,:,:,:,:,:,:,:)
    REAL, ALLOCATABLE :: m5(:,:,:,:,:,:,:,:,:)
    REAL, ALLOCATABLE :: m6(:,:,:,:,:,:,:,:,:)
    REAL, ALLOCATABLE :: m7(:,:,:,:,:,:,:,:,:)

    ! Allocate them (m6 and m7 have extent 1 in the first dimension)
    ALLOCATE(m1(2,161,20,2,2,21,30,2,2))
    ALLOCATE(m2(2,161,20,2,2,21,30,2,2))
    ALLOCATE(m3(2,161,20,2,2,21,30,2,2))
    ALLOCATE(m4(2,161,20,2,2,21,30,2,2))
    ALLOCATE(m5(2,161,20,2,2,21,30,2,2))
    ALLOCATE(m6(1,161,20,2,2,21,30,2,2))
    ALLOCATE(m7(1,161,20,2,2,21,30,2,2))

I then run a code with a big parallelized loop. 

    !$omp parallel do default(private) shared(m1, m2, m3, m4, m5, m7,  someothervariables)


Some of the matrices `m1` to `m7` are passed to subroutines, which might lead to the creation of temporary arrays.
In any case, the code runs fine serially. With OpenMP, however, it crashes with the following error regardless of the number of cores I use:

The machine I am using has `128GB` of RAM. I am not sure whether this is the limiting factor. If I decrease the last extent of each matrix to 1, the code runs fine on 24 cores. Given that I am only increasing the memory used by a factor of 2, it should at least run on 12 cores, no?
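
As a rough check on the RAM question, the footprint of the seven arrays can be computed from the extents in the ALLOCATE statements. The sketch below is only an estimate and assumes default 4-byte REALs (no option that widens the default real kind); the program and variable names are made up for illustration. Under that assumption each of m1 to m5 holds 64,915,200 elements (about 260 MB), and all seven arrays together come to about 1.56 GB, a small fraction of 128 GB. Arrays listed in a shared clause are also not replicated per thread, so this total should not grow with the core count.

```fortran
program array_footprint
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  ! Extents copied from the ALLOCATE statements in the post.
  integer(int64), parameter :: big(9)   = [integer(int64) :: 2,161,20,2,2,21,30,2,2]
  integer(int64), parameter :: small(9) = [integer(int64) :: 1,161,20,2,2,21,30,2,2]
  integer(int64) :: bytes_per_real, total_bytes

  ! storage_size returns the size in bits of a default REAL.
  bytes_per_real = storage_size(1.0) / 8

  ! m1..m5 use the larger shape, m6 and m7 the smaller one.
  total_bytes = (5 * product(big) + 2 * product(small)) * bytes_per_real

  print *, 'elements in m1:', product(big)
  print *, 'total bytes   :', total_bytes
  print *, 'total MiB     :', real(total_bytes) / 1024.0**2
end program array_footprint
```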

Maybe I am making some big mistake, or perhaps there is just some Fortran option that needs to be changed ...


 

6 Replies
IanH
Honored Contributor II

OpenMP programs tend to use much more stack - temporaries need to be created for each thread, and as each thread has an independent stack, that is a convenient place for the compiler to put those temporaries.

What stack size have you specified for the link of your program?  Perhaps it needs to be bigger (the default is quite small in the context of today's programs).

Are you using /heaparrays:0 ?  If not, try it - the compiler will then use the heap for temporaries, significantly reducing stack space requirements.
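
To illustrate where such temporaries can come from, here is a minimal sketch, not taken from the original code: passing a non-contiguous array section to a dummy argument declared with explicit shape typically makes the compiler copy the section into a contiguous temporary for the duration of the call. The names temp_array_demo, scale_row, and m are hypothetical. With Intel Fortran these temporaries normally live on the stack unless the heap-arrays option is in effect, and in a parallel loop every thread that makes such a call needs room for its own copy.

```fortran
program temp_array_demo
  implicit none
  real, allocatable :: m(:,:)

  allocate(m(4, 1000))
  m = 1.0

  ! m(1,:) is a strided (non-contiguous) section. Because the dummy
  ! argument below is explicit-shape, the compiler typically copies the
  ! section into a contiguous temporary before the call and copies the
  ! result back afterwards.
  call scale_row(m(1,:), size(m, 2))

  print *, m(1,1), m(2,1)

contains

  subroutine scale_row(row, n)
    integer, intent(in) :: n
    real, intent(inout) :: row(n)   ! explicit-shape dummy: expects contiguous storage
    row = 2.0 * row
  end subroutine scale_row

end program temp_array_demo
```

Declaring the dummy as assumed-shape, `row(:)`, usually avoids the copy, since the section can then be passed by descriptor.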

Peter_F_5
Beginner

ianh wrote:

OpenMP programs tend to use much more stack - temporaries need to be created for each thread, and as each thread has an independent stack, that is a convenient place for the compiler to put those temporaries.

What stack size have you specified for the link of your program?  Perhaps it needs to be bigger (the default is quite small in the context of today's programs).

Are you using /heaparrays:0 ?  If not, try it - the compiler will then use the heap for temporaries, significantly reducing stack space requirements.

 

Thanks for the reply. These are the options I am using, which I think are exactly what you mentioned:

IanH
Honored Contributor II

How about the /heap-arrays setting (note I spelt it wrong)?

 

(Attached screenshot: heaparrays.PNG)

Peter_F_5
Beginner

ianh wrote:

How about the /heap-arrays setting (note I spelt it wrong)?

 

I will try that. For now, setting the Stack Commit Size to 999999999 appears to have solved the problem. I am not sure why, and I still need to double-check, but that seems to be the case.

TimP
Honored Contributor III
Thread stack (default 4 MB per thread) counts against the stack reserve.
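
If the per-thread stack is what runs out, one possible workaround is to keep large private work arrays off the stack by declaring them ALLOCATABLE and allocating them inside the parallel region. The sketch below only illustrates that idea and is not the poster's code; the names work and n and the size are invented. Each thread allocates its own copy on the heap, so the array does not count against the thread's stack. The standard OpenMP environment variable OMP_STACKSIZE is the other usual knob for raising the per-thread stack size.

```fortran
program heap_private_work
  implicit none
  integer, parameter :: n = 2000000   ! hypothetical size: about 8 MB of 4-byte REALs
  real, allocatable :: work(:)
  real :: total

  total = 0.0

  ! work is unallocated on entry, so each thread gets its own
  ! unallocated private copy and allocates it on the heap.
  !$omp parallel default(none) private(work) reduction(+:total)
  allocate(work(n))
  work = 1.0
  total = total + sum(work) / real(n)   ! each thread contributes 1.0
  deallocate(work)
  !$omp end parallel

  print *, 'threads that completed:', nint(total)
end program heap_private_work
```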
jimdempseyatthecove
Honored Contributor III

>>Thread stack (default 4mb per thread) counts against stack reserve.

https://msdn.microsoft.com/en-us/library/windows/desktop/ms686774(v=vs.85).aspx
https://docs.microsoft.com/en-us/cpp/build/reference/stack-stack-allocations
 

The above MS links do an abysmal job of defining just what the stack reserve setting does with respect to a multi-threaded program. Maybe you can comment on this further, Tim P.

My interpretation, which is not backed by information from anyone who really knows, is that the reserve size is the amount of virtual address space set aside for stack allocations. Note that this is not the same as specifying a stack size. Also note that there is no description of what happens between program start and the point where threads are added and removed, IOW how the reserved address space is carved up as you add threads.

My assumption (completely unfounded) is that the second thread gets half of the reserve space (as its own reserve), diminishing the reserve space of the first thread in the process. The third thread gets half of the larger of the two existing reserved spaces, diminishing the reserve space of the thread whose space was taken; the fourth thread gets half of the largest of the existing reserved spaces, again diminishing the reserve space of the thread whose space was taken; and so on. Actual stack allocation from the page file (or physical RAM) occurs only when a thread touches a previously untouched page of the virtual address space.

What are your thoughts?

Jim Dempsey
