Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Cannot run code with OpenMP enabled: stack overflow

Arm_N_
Beginner
1,241 Views

Hi everyone,

I am having trouble just running my code.  All I did was change the Intel project settings to enable OpenMP (Generate Parallel Code, /Qopenmp).  Even WITHOUT writing anything OpenMP-related in my code, i.e., "USE omp_lib, CALL omp_set_num_threads, !$omp parallel do", I get a stack overflow error.  The same error also occurs after I add the OpenMP directives.

I tried setting the stack reserve size, but it did not help.  Either I get the stack overflow error or, if the stack size is too large, the console window just pops up and disappears.

Any suggestions?  I am on the IA-32 platform, and I am not sure whether that is the cause.

Thanks.

7 Replies
Xiaoping_D_Intel
Employee

Can you try enlarging the private stack size of each OpenMP thread by setting the environment variable "OMP_STACKSIZE" to a larger value? A detailed description can be found at https://software.intel.com/en-us/node/680054

Thanks,

Xiaoping Duan

Intel Customer Support

TimP
Honored Contributor III

64-bit mode helps by raising the limits on OMP_STACKSIZE and stack reserve, as does avoiding an excessive number of threads. The default OMP_STACKSIZE in 32-bit mode is 2M. You may need to increase both OMP_STACKSIZE and stack reserve, and limit threads to 1 per core, along with setting OMP_PLACES=cores.  One of the frequently reported mistakes is a ridiculously large OMP_STACKSIZE setting, which would exceed stack reserve when running more than 1 thread.
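
Since OMP_STACKSIZE is an environment variable, it is set in the shell before the program starts (for example, set OMP_STACKSIZE=16M in a Windows command prompt), not in the Fortran source. A minimal sketch for checking what the program actually sees at run time is below. The program name show_omp_settings is made up for illustration, and it assumes the project is built with OpenMP enabled.

```fortran
program show_omp_settings
  use omp_lib
  implicit none
  character(len=64) :: stacksize
  integer :: length, status

  ! The OpenMP runtime reads OMP_STACKSIZE at startup, so it must already be
  ! set in the environment when the executable is launched.
  call get_environment_variable("OMP_STACKSIZE", stacksize, length, status)
  if (status == 0 .and. length > 0) then
    print '(2a)', "OMP_STACKSIZE = ", trim(stacksize)
  else
    print '(a)', "OMP_STACKSIZE is not set; the runtime default applies"
  end if

  ! Compare the thread count the runtime will use with the processor count.
  print '(a,i0)', "logical processors: ", omp_get_num_procs()
  print '(a,i0)', "max threads:        ", omp_get_max_threads()
end program show_omp_settings
```

If the first line reports that the variable is not set, the setting never reached the process, and the value should be set in the same command prompt (or in the system environment) before launching the executable.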

jimdempseyatthecove
Honored Contributor III

How many threads are you setting with omp_set_num_threads?

Note, the default is 1 thread per logical processor. Tim P's typical application runs best with 1 thread per core, whereas other applications may run better on HT enabled systems using more, but generally not more than number of logical processors. Using more threads than logical processors is called oversubscription. There are a few cases where oversubscription is beneficial.

It is a good practice to learn how to use OpenMP by using the default settings. Then, after your program is working, you can experiment with tweaking the control knobs.

It would be helpful for you to examine the memory requirements of your pre-OpenMP program to determine the code, heap and stack requirements. If your pre-OpenMP program ran close to your memory limit, then there are a few things you might be able to do.

By the way, on a 32-bit platform with, for example, 8 GiB of RAM, the memory available to any single process is 4 GiB of virtual address space. However, by default Windows partitions that 4 GiB virtual address space into two 2 GiB partitions: the lower partition for the user program, heap and stack, and the upper 2 GiB for the system.

One of your options to increase memory for a process (depending if your version of O/S permits this) is the /3GB Startup Switch. See: https://technet.microsoft.com/en-us/library/bb124810(v=exchg.65).aspx

This instructs the O/S to divvy up the process (VM) space into 3GB for user and 1GB for O/S. If available to you, this may work.

A second option, which can be used in conjunction with the first... let me explain something first:

In OpenMP, all thread stacks are set to the same size if you use OMP_STACKSIZE. A potential problem with this (don't worry, there is a workaround) is that, from prior programming experience, perhaps from your CS class, you may have adopted the practice of placing all of your data on the stack (after all, stack allocation is faster than heap allocation). The trouble is that when you parallelize your application, you typically have two classes of objects: those shared among threads and those private to each thread.

Think about this: if all thread stack sizes are the same, and the shared objects are stack based (in the master thread), then each additional thread will require room for the total size of the shared objects in its stack, even though this space will not be used.

The solution to this is relatively simple. Make the shared objects ALLOCATABLE, SAVE, or COMMON, or place them in a MODULE. If ALLOCATABLE, have the main thread allocate them before you use the data.
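
A minimal sketch of that idea, assuming a single large shared array: the module name shared_data, the array name field and the size n are made up for illustration and are not from the original program.

```fortran
module shared_data
  implicit none
  ! Shared array declared in a module and ALLOCATABLE, so its storage is on
  ! the heap and does not count against any thread's stack.
  real, allocatable :: field(:)
end module shared_data

program shared_on_heap
  use shared_data
  implicit none
  integer, parameter :: n = 1000000   ! hypothetical problem size
  integer :: i

  ! The main thread allocates before the first parallel region.
  allocate(field(n))

  ! All threads work on the one shared copy; only the loop index is private.
  !$omp parallel do shared(field) private(i)
  do i = 1, n
    field(i) = 1.0
  end do
  !$omp end parallel do

  print *, "sum of field =", sum(field)
  deallocate(field)
end program shared_on_heap
```

With the shared data kept off the stack, OMP_STACKSIZE only has to cover what each thread really keeps private.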

Jim Dempsey

Arm_N_
Beginner

Hi everyone,

Thanks for all the replies.  Sorry that I only now had time to follow up.

I have not written any OpenMP code.  I just enabled "Generate Parallel Code (/Qopenmp)" and the stack overflow appeared.

I tried to set OMP_STACKSIZE, but I did not know the syntax or command for it.

I tried   export OMP_STACKSIZE="10M"

I tried   OMP_STACKSIZE="10M"

I tried   CALL OMP_STACKSIZE(10000000)

I tried   set OMP_STACKSIZE=16M

None of these compiled.

I tried   CALL KMP_SET_STACKSIZE(2000000000)

This compiled, but the stack overflow problem still persists.

Sorry for asking a silly question, but I looked everywhere for how to set OMP_STACKSIZE and tried all of the above.

Any suggestions?

Thank you very much.

jimdempseyatthecove
Honored Contributor III

>>Even WITHOUT writing anything OpenMP-related in my code, i.e., "USE omp_lib, CALL omp_set_num_threads, !$omp parallel do", I get a stack overflow error

Then I suspect that there is something wrong with your installation.

Does the build crash .OR. does the built application (without OpenMP directives and without use omp_lib) crash?

Jim Dempsey

Arm_N_
Beginner

Hi Jim,

The build you talked about is the compilation process, right?  If so, that does not crash, but the console window crashes after I run the program.

I just found a way to get around this, but I still cannot run the program when I add OpenMP directives.

I set Project --> Properties --> Linker --> System --> Stack Reserve Size = 200000000, and the program can run with CALL OMP_set_num_threads(OMP_get_max_threads()-1),

(Previously it could not run at all unless I did this.)

but when I add !$OMP PARALLEL DO, the stack overflow comes back.

Thanks for any further suggestion.

jimdempseyatthecove
Honored Contributor III

200MB is rather excessive, especially prior to the first parallel region.

Is your program by chance structured like this?

program
  ... (non-OpenMP)
  call SomeSubroutineWithHugeStackRequirement
  ... (non-OpenMP)
end program
...
subroutine SomeSubroutineWithHugeStackRequirement
    (data declarations with huge stack)
    !$omp parallel private(aboveHugeDataDeclarations)
   ...

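If so, you have now multiplied the stack requirements: the private clause gives every thread its own copy of the huge data on its own stack.

Try compiling with /heap-arrays:0, which puts automatic and temporary arrays on the heap instead of the stack.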
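
A source-level alternative to the compiler switch might look like the sketch below. It assumes the huge data is a per-thread scratch array; the names heap_scratch_demo, work_with_big_scratch and scratch, and the size n, are made up for illustration. Declaring the array ALLOCATABLE and allocating it inside the parallel region gives each thread its own copy on the heap rather than on its stack.

```fortran
program heap_scratch_demo
  implicit none
  integer, parameter :: n = 100000   ! hypothetical problem size
  real :: total

  call work_with_big_scratch(n, total)
  print *, "total =", total

contains

  subroutine work_with_big_scratch(m, acc)
    integer, intent(in) :: m
    real, intent(out) :: acc
    ! ALLOCATABLE instead of "real :: scratch(m)": the data goes on the heap.
    real, allocatable :: scratch(:)
    integer :: i

    acc = 0.0
    !$omp parallel private(scratch, i) reduction(+:acc)
    allocate(scratch(m))        ! each thread allocates its own private copy
    !$omp do
    do i = 1, m
      scratch(i) = real(i)
      acc = acc + scratch(i)
    end do
    !$omp end do
    deallocate(scratch)         ! release the per-thread copy before leaving
    !$omp end parallel
  end subroutine work_with_big_scratch
end program heap_scratch_demo
```

Each thread still pays for its private copy, but the cost lands on the heap, so neither stack reserve nor OMP_STACKSIZE has to grow with the array size.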

Jim Dempsey