Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Stack overflow when running parallel Fortran code

Mohammadreza_S_
Beginner

Hi

I have a Fortran code that runs sequentially without problems (I compile it with /O3 for the x64 platform). I then added OpenMP directives to parallelize it, and now it fails with a "Stack overflow" message (even if I run it ). I increased the stack reserve size to about 1GB, but that did not help.

Here is the part of the code that I changed to make it parallel:

    call OMP_SET_NUM_THREADS(6)
    !$OMP PARALLEL DO DEFAULT(PRIVATE) SHARED(g_num,g_coord,nn,nels,anatyp)   &   
    !$OMP SHARED(coord_elm_center,loc_ele_cor,wix4,der4,fun4,Shear_Skeleton)  &
    !$OMP SHARED(EleMode,v,vu,Biot_Coef,c,dtim,permx,Kr,cT,gam,omg2,gcor8)    &
    !$OMP SHARED(wix8,eqn,counter1,counter2,counter3,counter4,counter5,Th_Exp)&
    !$OMP SHARED(lan,lan1,der8,fun8,gcor20,wix20,fun20,der20,gcor40,wix40)    &
    !$OMP SHARED(fun40,der40,gcor61,wix61,fun61,der61) SCHEDULE(DYNAMIC)      &
    !$OMP REDUCTION (+:Lhs,LhsSig,LhsU,A15,A25,A35,A45,A55_Heat)
    Main: do iel=1, nels
        ! ... do some calculation for element iel ...

    enddo Main
    !$OMP END PARALLEL DO 

I appreciate any help.

 

Steven_L_Intel1
Employee

You need to set the environment variable KMP_STACKSIZE to a larger value (but not 1GB!); this is the per-thread stack size. I suggest a somewhat lower stack reserve size: try 100000000 (about 100 MB) to begin with.
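Because KMP_STACKSIZE (and its standard equivalent OMP_STACKSIZE) is read from the environment, it can help to have the program report the value it actually sees, which makes it easy to confirm that a setting such as 999M really reached the executable. The Fortran module below is a minimal sketch and not part of any Intel library: stacksize_bytes and report_stacksize are hypothetical names, and the unit handling assumes the OpenMP rule for OMP_STACKSIZE (an optional B, K, M or G suffix, with a bare number meaning kilobytes). KMP_STACKSIZE may interpret a bare number differently, so always giving an explicit suffix is the safer habit.

```fortran
! Minimal sketch: report an OpenMP stack-size variable as the program sees it.
! Assumption: OMP_STACKSIZE syntax (optional B/K/M/G suffix, bare number = KiB).
module stacksize_check
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  private
  public :: stacksize_bytes, report_stacksize

contains

  ! Convert a value such as "4M" or "999m" to bytes; returns -1 if unparsable.
  function stacksize_bytes(text) result(bytes)
    character(len=*), intent(in) :: text
    integer(int64) :: bytes
    character(len=:), allocatable :: s
    integer(int64) :: number, scale
    integer :: ios, ndigits

    bytes = -1_int64
    s = trim(adjustl(text))
    if (len(s) == 0) return

    ! Look at the last character for an optional unit suffix.
    ndigits = len(s) - 1
    select case (s(len(s):len(s)))
    case ('B', 'b')
      scale = 1_int64
    case ('K', 'k')
      scale = 1024_int64
    case ('M', 'm')
      scale = 1024_int64**2
    case ('G', 'g')
      scale = 1024_int64**3
    case default
      scale = 1024_int64        ! no suffix: kilobytes (OpenMP rule, assumed)
      ndigits = len(s)          ! the whole string is the number
    end select

    if (ndigits < 1) return     ! a suffix with no number in front of it
    read (s(1:ndigits), *, iostat=ios) number
    if (ios /= 0) return
    if (number <= 0) return     ! the size must be a positive integer
    bytes = number * scale
  end function stacksize_bytes

  ! Print the raw value of an environment variable and its size in bytes
  ! (prints -1 bytes if the value is set but cannot be parsed).
  subroutine report_stacksize(name)
    character(len=*), intent(in) :: name
    character(len=64) :: value
    integer :: status

    call get_environment_variable(name, value, status=status)
    if (status /= 0) then
      print '(2a)', name, ' is not set (or its value is too long)'
    else
      print '(4a,i0,a)', name, ' = ', trim(value), ' -> ', &
            stacksize_bytes(value), ' bytes'
    end if
  end subroutine report_stacksize

end module stacksize_check
```

Calling report_stacksize('KMP_STACKSIZE') at the start of the main program would, for example, print a line ending in 67108864 bytes when the variable is set to 64M, and a "not set" message if the environment change never reached the process.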

Michael_Roberts
New Contributor I

Hi,

I had this issue when I had a large private array. On entering the parallel region, a local copy was created on each thread's stack, causing the overflow.

My solution was to allocate the array dynamically before the parallel section, with an additional dimension sized to the number of threads. This array can then be shared between all threads (i.e., no copy is created on the stack), and each thread accesses its own slice using the index OMP_GET_THREAD_NUM() + 1 (the '+1' is needed because this function is zero-based, not one-based).
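The approach described above can be sketched as follows. This is a hedged illustration, not Michael's actual code: process_elements, work, nwork and elem_out are hypothetical stand-ins for the real element loop and its former PRIVATE array, and sizing the extra dimension with omp_get_max_threads() is an assumption (that function returns an upper bound on the team size of the next parallel region). The !$ conditional-compilation lines let the same source also compile and run serially without OpenMP.

```fortran
! Sketch of the "shared scratch array with a thread dimension" pattern.
! Hypothetical names: process_elements, work, nwork, elem_out.
module thread_scratch
  use, intrinsic :: iso_fortran_env, only: dp => real64
  !$ use omp_lib
  implicit none
  private
  public :: process_elements

contains

  subroutine process_elements(nels, nwork, elem_out)
    integer, intent(in)   :: nels, nwork
    real(dp), intent(out) :: elem_out(nels)
    real(dp), allocatable :: work(:,:)  ! heap storage: one column per thread
    integer :: nthreads, tid, iel, k

    ! Size the extra dimension before entering the parallel region.
    nthreads = 1
    !$ nthreads = omp_get_max_threads()
    allocate(work(nwork, nthreads))

    !$omp parallel do default(none) shared(work, elem_out, nels, nwork) &
    !$omp private(tid, iel, k) schedule(dynamic)
    do iel = 1, nels
      tid = 1
      !$ tid = omp_get_thread_num() + 1   ! omp_get_thread_num() is zero-based
      ! Placeholder for the real per-element calculation; each thread
      ! writes only to its own column work(:, tid).
      do k = 1, nwork
        work(k, tid) = real(iel, dp) * real(k, dp)
      end do
      elem_out(iel) = sum(work(:, tid))
    end do
    !$omp end parallel do

    deallocate(work)
  end subroutine process_elements

end module thread_scratch
```

Because work lives on the heap and is SHARED, nothing large is copied onto the thread stacks, and because each thread touches only column tid, no two threads write to the same elements.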

 

 

Mohammadreza_S_
Beginner

Hi Lionel

Can you show me how to do that?

 

Chris_G_2
Beginner

I had exactly the same problem recently and solved it in a similar way to Michael Roberts.

Chris G

Mohammadreza_S_
Beginner

Thank you Michael, Chris and Steve. I have increased KMP_STACKSIZE to 999M, but it does not solve the problem. I think the best approach is the one Michael describes. I will try it and let you know.

TimP
Honored Contributor III

KMP_STACKSIZE (or, using the standard name, OMP_STACKSIZE) defaults to 4MB on Intel 64-bit targets. A typical setting, when the default isn't sufficient, is 9MB. When Steve said not to use 1GB, I doubt he meant you should use 999MB instead. I haven't heard of any application that requires more than 40MB. You ought to be able to estimate how much space your private arrays require by multiplying their data size by the number of threads.

When you set KMP_STACKSIZE=999MB, you risk adding 1GB times the number of threads to the allowance you would require in /link /stack, which would put a low limit on the number of threads. I don't know the specifics of your platform, but I wouldn't count on being able to increase the effective stack reserve to as much as 16GB (note that Steve suggested a more modest value).
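Tim's estimate (private data per thread multiplied by the number of threads) is simple arithmetic, but it is easy to mix up bits, bytes and units along the way. The module below is one hedged way to compute it in Fortran; array_bytes and total_private_bytes are hypothetical names, and the element size is assumed to come from the standard storage_size intrinsic, which returns bits rather than bytes.

```fortran
! Rough stack budget for PRIVATE data in an OpenMP region:
! (bytes of private data per thread) x (number of threads).
! Hypothetical helper names; storage_size() gives the element size in bits.
module stack_estimate
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  private
  public :: array_bytes, total_private_bytes

contains

  ! Bytes used by an array with nelem elements of elem_bits bits each,
  ! e.g. array_bytes(size(a, kind=int64), storage_size(a)).
  pure function array_bytes(nelem, elem_bits) result(bytes)
    integer(int64), intent(in) :: nelem
    integer, intent(in) :: elem_bits
    integer(int64) :: bytes
    bytes = nelem * int(elem_bits / 8, int64)
  end function array_bytes

  ! Total across the team: every thread gets its own copy of PRIVATE data.
  pure function total_private_bytes(per_thread, nthreads) result(bytes)
    integer(int64), intent(in) :: per_thread
    integer, intent(in) :: nthreads
    integer(int64) :: bytes
    bytes = per_thread * int(nthreads, int64)
  end function total_private_bytes

end module stack_estimate
```

For example, a private real(8) array of 1000 x 1000 elements needs 8000000 bytes per thread, while total_private_bytes(999_int64 * 1048576_int64, 16) gives 16760438784 bytes (about 15.6 GiB) for a hypothetical 16-thread run with KMP_STACKSIZE=999M, which is the scale of reservation Tim warns about.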
 
