Question on setting stack size KMP_SET_STACKSIZE_S

danielsue · ‎05-27-2013

Hi All,

I would like to set the stack size at runtime for my parallel project. The stack size (e.g., 100000000) was previously set through property/linker/system/stack reserve size and stack commit size, and it works fine.

But when I set the stack size at runtime, it does not work and the program fails with a "stack overflow". The code is as follows:

stack_size = 100000000   ! estimated stack size in bytes (example value)

call KMP_SET_STACKSIZE_S(stack_size)

write(idbg, *) "set stack size to: ", stack_size

stack_size_check = KMP_GET_STACKSIZE_S()

write(idbg, *) "check stack size:", stack_size_check

The result shows that the stack size has been set to 100000000, but the program still overflows the stack.

What's wrong with the above code?

Thanks and regards,

Daniel

TimP · ‎05-27-2013

The overall stack size for your job has to accommodate the sum of the thread stack sizes set by KMP_STACKSIZE plus other stuff. So, if you set the same size for thread stack and overall stack, you won't be able to run even one thread. As the number of threads keeps increasing, it's important to find ways to avoid increasing the thread stack size; we can easily need 1GB overall even with default (4MB for 64-bit mode) thread stacks. Even an 8GB thread stack is likely to limit the number of threads.
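As a rough illustration of the arithmetic above, here is a minimal Fortran sketch that totals the thread stacks for a given thread count. The 256-thread count and all variable names are assumptions chosen for illustration; only the 4MB default and the 100000000-byte request come from this thread.

```fortran
program stack_budget
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer(int64), parameter :: mib = 1024_int64 * 1024_int64
  ! Default per-thread stack quoted above for 64-bit mode
  integer(int64), parameter :: default_stack = 4_int64 * mib
  ! Per-thread stack requested in the original question
  integer(int64), parameter :: requested_stack = 100000000_int64
  integer(int64) :: nthreads

  ! Hypothetical large thread count with default stacks
  nthreads = 256_int64
  print '(i0,a,i0,a)', nthreads, ' threads with default stacks need ', &
        nthreads * default_stack / mib, ' MiB'

  ! Four threads with the requested stack size
  nthreads = 4_int64
  print '(i0,a,i0,a)', nthreads, ' threads with requested stacks need ', &
        nthreads * requested_stack, ' bytes'
end program stack_budget
```

The point is only that the total grows linearly with the thread count, so a large per-thread stack quickly dominates the memory budget.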

danielsue · ‎05-27-2013

Hi Tim,

Thanks.

The stack size for each parallel thread depends on our problem. We usually need more than 100M for a large problem. So far, it has not been easy to reduce the stack size. As I understand it, KMP_SET_STACKSIZE_S sets the number of bytes that will be allocated for each parallel thread to use as its private stack.

I have two questions:

1) How can I set the overall stack size at runtime?

2) Is the stack reserve/commit size in the properties page the overall stack?

Thanks and regards,

Daniel

SergeyKostrov · ‎05-27-2013

You didn't specify on what platform you're doing all your tests.

>>...The stack size (e.g., 100000000) was previously set through property/linker/system/stack reserve size and
>>stack commit size, and it works fine...

1. Stack Commit / Reserve are defined for the application ( it doesn't affect a stack size of OpenMP threads )
2. KMP_STACKSIZE or OMP_STACKSIZE are set for OpenMP threads
3. KMP_STACKSIZE or OMP_STACKSIZE could be set at runtime using CRT-function putenv ( to verify getenv needs to be used )
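For the verification step, a Fortran program does not need the C getenv call. The sketch below uses the standard get_environment_variable intrinsic (Fortran 2003) to print what the process actually sees. The program and subroutine names are made up for illustration, and note that this only shows the environment of the process, not whether the OpenMP runtime has honored the value.

```fortran
program show_stacksize_env
  implicit none
  call report('OMP_STACKSIZE')
  call report('KMP_STACKSIZE')
contains
  ! Print the value of one environment variable, or say why it is unavailable
  subroutine report(name)
    character(len=*), intent(in) :: name
    character(len=256) :: val
    integer :: length, stat

    call get_environment_variable(name, val, length, stat)
    select case (stat)
    case (0)
       print '(3a)', name, ' = ', val(1:length)
    case (1)
       print '(2a)', name, ' is not set'
    case default
       print '(2a,i0)', name, ' could not be read, status = ', stat
    end select
  end subroutine report
end program show_stacksize_env
```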

danielsue · ‎05-27-2013

Thanks so much. Currently we use Windows and will move to Linux later.

Daniel

jimdempseyatthecove · ‎05-28-2013

Daniel,

You must call KMP_SET_STACKSIZE_S (or OMP_..., or set the environment variable) **before** entering your first parallel region. IOW, the stack size is enforced upon thread creation. Note, the setting of the stack size will not affect the main thread's stack size. Therefore the overflow problem you are observing may be due to the main thread not having sufficient stack space (this can be confirmed by observing that only the main thread exhibits the stack overflow). The issue with the main thread can be resolved by:

a) keeping the main thread (app) stack size small and setting the other threads' stack size large (then enter the first parallel region); avoid using the main thread for large stack-consuming tasks
b) making the main thread (app) stack size overly large and setting the stack size for the other threads smaller (then enter the first parallel region). Use all threads for large stack-consuming tasks. Note, on a 32-bit system, this may result in wasted (unutilized) space on the main thread's stack. Some of this can be reclaimed by using shell subroutines in place of ALLOCATE. IOW, when the main program starts, and before the first parallel region, determine the working stack size and specify it for use on thread creation. Then: if the main thread's stack is too large, call a subroutine that allocates some buffers on the stack and then calls the main code; if it is too small, set a flag so the main thread avoids stack-intensive tasks; if it is appropriate, call a subroutine that uses ALLOCATE to allocate some buffers from the heap.
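The ordering requirement can be sketched as follows. This assumes the Intel compiler, whose omp_lib module supplies the kmp_* extensions and the kmp_size_t_kind kind parameter; it will not build with compilers that lack those extensions. The program and variable names are illustrative only.

```fortran
program set_worker_stack
  use omp_lib   ! Intel's omp_lib is assumed; it declares the kmp_* extensions
  implicit none
  integer(kind=kmp_size_t_kind) :: requested, reported

  ! Request the worker-thread stack size BEFORE the first parallel region
  requested = 100000000_kmp_size_t_kind
  call kmp_set_stacksize_s(requested)

  ! Read the setting back; this reports the setting, not the main thread's stack
  reported = kmp_get_stacksize_s()
  print *, 'requested:', requested, ' reported:', reported

  ! First parallel region: worker threads are created here with that stack size
  !$omp parallel
  !$omp single
  print *, 'threads in team:', omp_get_num_threads()
  !$omp end single
  !$omp end parallel
end program set_worker_stack
```

The main thread still runs on the stack fixed at link time, so work done by the main thread inside the parallel region is limited by that stack, not by the value set here.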

The Commit and Reserve only affect the initial page file allocation (and wipe if so enabled). This will not affect the virtual address space claimed by each thread.

32-bit memory intensive programs may require some inventive programming. At some point you might consider using MPI or using a multi-process application (perhaps using memory mapped file for shared data). Or simply reducing the number of threads to work within your memory budget.

Jim Dempsey

danielsue · ‎05-28-2013

Hi Jim,

Thanks for your explanation.

The problem seems quite similar to the one discussed under "Stack size in fortran using OpenMP". If I compile the code without OPENMP, I do not need to increase the stack size at all. The default size (4M) is enough even for a large problem. But if I compile the code with OPENMP, I have to increase the Commit and Reserve stack to 8000000 (8M) to make the program run with 4 threads.

Almost all the large datasets are allocatable and declared in a module. Some of the datasets (dataset A) are globally shared and others are used as local variables (dataset B) shared by some subroutines. For the parallel region, I declare dataset B as private and pass dataset B to the subroutines, and the program works fine if I increase the value of "Commit and Reserve stack".

Since the stack size in sequential mode (compiled without OPENMP) does not need to be increased, does this mean that the stack size for my main thread is not large? If so, why does KMP_SET_STACKSIZE_S not solve my problem, given that this function is used to set the stack size for parallel threads? I do call this function before the parallel region.

Thanks for your time,

Regards,

Daniel

SergeyKostrov · ‎05-29-2013

>>>>KMP_STACKSIZE or OMP_STACKSIZE could be set at runtime using CRT-function putenv ( to verify getenv
>>>>needs to be used )
>>...
>>...Currently we use windows and will move to linux later...

A solution that changes the settings of these OpenMP environment variables and is based on these CRT-functions is a portable one ( should work on any platform ).

jimdempseyatthecove · ‎05-29-2013

>> I shall increase the Commit and Reserve stack to 8000000 (8M) to make the program run with 4 threads.

You are confused!

When running with 4 threads, your application has 4 stacks. Each stack needs to be sufficiently large to handle the worst case. If 1 thread can run your application with 4MB, then each thread should be able to run the application using 4MB each. However, there may be some additional space required depending upon how you use the PRIVATE clause and REDUCTION clause.

Check your code to see if you went bonkers in an attempt to use a high level of recursive nested parallelism.

Jim Dempsey

jimdempseyatthecove · ‎05-29-2013

When a little bit of parallelism is good...
This does not mean a lot of parallelism is better.

Jim Dempsey

danielsue · ‎05-29-2013

Hi All,

Thanks for all your help. I am a little confused.

Let me describe the problem again to make it clear.

1. The platform is Windows x64, Intel Parallel Studio XE2013 + Visual Studio 2010 Pro.

2. If compiled without OPENMP, everything works fine and I do not need to set the stack size, neither do I need to set stack reserve/commit size.

3. If compiled with OPENMP, I need to increase the stack reserve/commit size (e.g., 10M) to run an intensive case with 4 threads. If I don't increase the stack reserve/commit size, the program runs into a "stack overflow".

4. The stack overflow appears to come from a parallel region with a lot of firstprivate variables. In this region, there is no nested parallelism. The estimated space for these firstprivate variables is 6M for an intensive case. If I comment out this parallel region, it works fine and I don't need to increase the stack size.

I have also tried to manually set OMP_STACKSIZE (e.g., 10M) in the environment variable page, but it does not solve my problem. At present, the problem is solved by setting the "stack reserve/commit size" in the page project property->Linker->System.

Another question is about KMP_SET_STACKSIZE_S and putenv or setenv. What's the difference between them, since the former also sets the number of bytes that will be allocated for each parallel thread to use as its private stack, as described in the following webpage?

http://www.ncsa.illinois.edu/UserInfo/Resources/Software/Intel/Compilers/10.0/main_for/mergedProjects/lref_for/source_files/rfropemp.htm

Thanks and regards,

Daniel

TimP · ‎05-29-2013

Private scalars don't consume much stack, but private arrays not only consume significant stack but may cut performance. Even private scalars may inhibit optimization if they don't get optimized away as they might without OpenMP.
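To put a number on the private array cost, here is a small sketch that computes the size of one copy and the total over a team. The element count is a hypothetical value picked so that one copy comes to the 6M mentioned earlier in the thread; the names are illustrative. Whether a given private copy lands on the thread stack or on the heap depends on how the array is declared and on compiler options, so treat the result as an upper bound on the stack demand.

```fortran
program private_array_footprint
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: n = 786432      ! hypothetical element count
  integer, parameter :: nthreads = 4    ! thread count used in this thread
  real(real64), allocatable :: work(:)
  integer(int64) :: bytes_per_copy

  allocate(work(n))

  ! storage_size returns bits per element, so divide by 8 for bytes
  bytes_per_copy = int(size(work), int64) * int(storage_size(work) / 8, int64)

  print '(a,i0)', 'bytes per private copy:   ', bytes_per_copy
  print '(a,i0)', 'bytes across all threads: ', bytes_per_copy * nthreads
end program private_array_footprint
```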

SergeyKostrov · ‎05-29-2013

>>...Another question is about KMP_SET_STACKSIZE_S and putenv or setenv. What's the difference between them...

putenv CRT-function is a POSIX function ( _putenv is actually a recommended replacement by one of ISO C++ standards ) and I don't see any references for setenv in header files for several C++ compilers.

jimdempseyatthecove · ‎05-30-2013

>>I have also tried to manually set OMP_STACKSIZE (e.g., 10M) in the environment variable page, but it does not solve my problem. At present, the problem is solved by setting the "stack reserve/commit size" in the page project property->Linker->System.

This means that the main thread (PROGRAM) has insufficient stack. Using the linker setting declares the main thread's stack size (and sets the default stack size for subsequent OpenMP threads). As stated a few times earlier, KMP_SET_STACKSIZE_S and the other OpenMP means of setting the stack size only affect the stack size of subsequently created threads (used by OpenMP).

Consider enabling heap_arrays (compiler option), or making the large arrays (that you currently have on the stack) allocatable, but do not allocate them until you enter the parallel region. Use FIRSTPRIVATE to copy the un-ALLOCATED array descriptors (due to a bug/oversight in earlier compilers).

REAL :: YourBigArrayToBePrivate(Bigsize)
REAL, ALLOCATABLE :: YourSubstituteArray(:)
...
!$OMP PARALLEL FIRSTPRIVATE( YourSubstituteArray)
ALLOCATE(YourSubstituteArray(LBOUND(YourBigArrayToBePrivate,1):UBOUND(YourBigArrayToBePrivate,1)))
YourSubstituteArray = YourBigArrayToBePrivate
!$OMP DO...
...
DEALLOCATE(YourSubstituteArray)
!$OMP END PARALLEL

Note, should you have enabled realloc_lhs, the allocate/deallocate could be omitted.
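A self-contained version of the same idea might look like the sketch below. It is an illustration under assumptions, not the exact code above: the array size, the names, and the loop body are made up, and the source array is made allocatable as well so that it does not sit on the main thread's stack either.

```fortran
program heap_private_copy
  implicit none
  integer, parameter :: bigsize = 1000000   ! hypothetical array size
  real, allocatable :: shared_src(:)        ! shared source data, on the heap
  real, allocatable :: private_copy(:)      ! left unallocated outside the region
  real :: total
  integer :: i

  allocate(shared_src(bigsize))
  shared_src = 1.0
  total = 0.0

  ! Each thread receives its own unallocated descriptor for private_copy
  !$omp parallel firstprivate(private_copy) reduction(+:total)
  ! Allocate the per-thread copy on the heap instead of the thread stack
  allocate(private_copy(lbound(shared_src,1):ubound(shared_src,1)))
  private_copy = shared_src
  !$omp do
  do i = 1, bigsize
     private_copy(i) = private_copy(i) * 2.0
     total = total + private_copy(i)
  end do
  !$omp end do
  deallocate(private_copy)
  !$omp end parallel

  print *, 'sum = ', total
end program heap_private_copy
```

Because every thread's working copy comes from the heap, the stack demand of the parallel region no longer scales with the array size.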

Jim Dempsey