Hi All,
I would like to set the stack size at runtime for my parallel project. The stack size (e.g., 100000000) was previously set through Property > Linker > System > Stack Reserve Size and Stack Commit Size, and that works fine.
But when I set the stack size at runtime, it does not work and throws "stack overflow". The code is as follows:
stack_size = 100000000   ! estimated stack size in bytes
call KMP_SET_STACKSIZE_S(stack_size)
write(idbg, *) "set stack size to: ", stack_size
stack_size_check = KMP_GET_STACKSIZE_S()
write(idbg, *) "check stack size:", stack_size_check
The output shows that the stack size has been set to 100000000, but the program still causes a stack overflow.
What's wrong with the above code?
Thanks and regards,
Daniel
The overall stack size for your job has to accommodate the sum of the thread stack sizes set by KMP_STACKSIZE plus other stuff. So, if you set the same size for the thread stack and the overall stack, you won't be able to run even one thread. As the number of threads continually increases, it's important to find ways to avoid increasing the thread stack size; we can easily need 1GB overall even with the default (4MB for 64-bit mode) thread stacks. Even an 8GB thread stack is likely to limit the number of threads.
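A rough way to see the arithmetic: the main thread keeps the stack it got at link time, and each additional OpenMP thread gets its own stack of the size set by KMP_STACKSIZE. The minimal sketch below just adds these up. The module and function names are hypothetical, the sizes are illustrative, and the "other stuff" (heap, code, DLLs) is deliberately left out.

```fortran
module stack_budget
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! Rough lower bound on stack address space for an OpenMP job:
  ! the main thread's stack (set at link time) plus one OpenMP stack
  ! per additional thread. Heap, code and DLLs are not counted.
  pure function total_stack_bytes(main_stack, thread_stack, nthreads) result(total)
    integer(int64), intent(in) :: main_stack    ! bytes, main thread
    integer(int64), intent(in) :: thread_stack  ! bytes, each worker thread
    integer, intent(in) :: nthreads             ! threads in the team, main included
    integer(int64) :: total
    total = main_stack + int(max(nthreads - 1, 0), int64) * thread_stack
  end function total_stack_bytes
end module stack_budget

program budget_demo
  use, intrinsic :: iso_fortran_env, only: int64
  use stack_budget, only: total_stack_bytes
  implicit none
  ! Illustrative values: 100 MB for every stack, 4 threads
  print *, total_stack_bytes(100000000_int64, 100000000_int64, 4)
end program budget_demo
```

With these values it prints 400000000, i.e. about 400 MB of stack before anything else is counted, which is why growing the per-thread stack gets expensive as thread counts rise.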
Hi Tim,
Thanks.
The stack size for each parallel thread depends on our problem. We usually need more than 100M for large problems. So far, it has not been easy to reduce the stack size. KMP_SET_STACKSIZE_S sets the number of bytes that will be allocated for each parallel thread to use as its private stack.
I have two questions:
1) How can I set the overall stack size at runtime?
2) Is the stack reserve/commit size in properties page the overall stack?
Thanks and regards,
Daniel
Sergey Kostrov wrote:
You didn't specify which platform you're doing all your tests on.
>>...The stack size (e.g., 100000000) was previously set through property/linker/system/stack reserve size and stack commit size, and it works fine...
1. Stack Commit / Reserve are defined for the application (they don't affect the stack size of OpenMP threads).
2. KMP_STACKSIZE or OMP_STACKSIZE set the stack size for OpenMP threads.
3. KMP_STACKSIZE or OMP_STACKSIZE could be set at runtime using the CRT function putenv (use getenv to verify).
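To check what the OpenMP runtime will see, the standard Fortran 2003 intrinsic GET_ENVIRONMENT_VARIABLE can play the role of getenv. The sketch below only reads the variables: standard Fortran has no portable counterpart of putenv (Intel Fortran's IFPORT module offers SETENVQQ, which is not used here). The helper name env_value is hypothetical.

```fortran
module env_check
  implicit none
contains
  ! Returns the value of environment variable NAME, or an empty string
  ! if it is not set (or does not fit in the 256-character buffer).
  function env_value(name) result(val)
    character(len=*), intent(in) :: name
    character(len=:), allocatable :: val
    character(len=256) :: buf
    integer :: length, stat
    call get_environment_variable(name, buf, length, stat)
    if (stat == 0) then
      val = buf(1:length)
    else
      val = ''   ! stat 1: not set, -1: truncated, 2: no environment support
    end if
  end function env_value
end module env_check

program show_stacksize_env
  use env_check, only: env_value
  implicit none
  ! Print the stack-size variables as seen by this process
  print '(2a)', 'OMP_STACKSIZE = ', env_value('OMP_STACKSIZE')
  print '(2a)', 'KMP_STACKSIZE = ', env_value('KMP_STACKSIZE')
end program show_stacksize_env
```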
Thanks so much. Currently we use Windows and will move to Linux later.
Daniel
Daniel,
You must call KMP_SET_STACKSIZE_S (or OMP_..., or set the environment variable) **before** entering your first parallel region. IOW, the stack size is applied when the threads are created. Note that this setting does not affect the main thread's stack size. Therefore the overflow you are observing may be due to the main thread not having sufficient stack space (you can confirm this by observing that only the main thread exhibits the stack overflow). The issue with the main thread can be resolved by either:
a) keeping the main thread (app) stack size small, setting the other threads' stack size large (then entering the first parallel region), and avoiding using the main thread for large stack-consuming tasks; or
b) making the main thread (app) stack size overly large, setting the stack size for the other threads smaller (then entering the first parallel region), and using all threads for large stack-consuming tasks. Note that on a 32-bit system this may result in wasted (unused) space on the main thread's stack. Some of this can be reclaimed by using shell subroutines in place of ALLOCATE. IOW: the main program starts and, before the first parallel region, determines the working stack size and specifies it for use on thread creation. Then, if the main thread's stack is too large, it calls a subroutine that allocates some buffers on the stack and then calls the main code; if the main thread's stack is too small, it sets a flag so that the main thread avoids stack-intensive tasks; if the main thread's stack is appropriate, it calls a subroutine that uses ALLOCATE to allocate some buffers from the heap.
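A minimal sketch of option (a), assuming the Intel (or LLVM) OpenMP runtime, whose omp_lib module provides KMP_SET_STACKSIZE_S, KMP_GET_STACKSIZE_S and the kind constant KMP_SIZE_T_KIND (gfortran's libgomp may not). The stack size is set before the first parallel region, and the stack-heavy routine (the hypothetical heavy_sum, whose automatic array is normally placed on the calling thread's stack) is called only by threads other than the main thread. All sizes are illustrative.

```fortran
module stack_work
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! Hypothetical stack-hungry routine: the automatic array WORK needs
  ! n*8 bytes and is normally placed on the calling thread's stack.
  function heavy_sum(n) result(s)
    integer, intent(in) :: n
    real(real64) :: s
    real(real64) :: work(n)
    integer :: i
    do i = 1, n
      work(i) = real(i, real64)
    end do
    s = sum(work)
  end function heavy_sum
end module stack_work

program ordered_stacksize
  use, intrinsic :: iso_fortran_env, only: real64
  use omp_lib      ! Intel/LLVM omp_lib: provides the kmp_* extensions
  use stack_work, only: heavy_sum
  implicit none
  integer(kind=kmp_size_t_kind) :: stack_size
  real(real64) :: total

  ! Must run BEFORE the first parallel region: worker stacks are sized
  ! when the OpenMP threads are created. The main thread's stack is
  ! fixed by the linker setting and is not changed by this call.
  stack_size = 100000000_kmp_size_t_kind
  call kmp_set_stacksize_s(stack_size)
  print *, 'worker stack size:', kmp_get_stacksize_s()

  total = 0.0_real64
  !$omp parallel reduction(+:total)
  ! Option (a): the main thread (thread 0) skips the stack-heavy work.
  if (omp_get_thread_num() /= 0) total = total + heavy_sum(5000000)
  !$omp end parallel
  print *, 'total =', total
end program ordered_stacksize
```

With heap arrays enabled (/heap-arrays in Intel Fortran on Windows), the automatic array would come from the heap instead and the per-thread stack would matter much less.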
The Commit and Reserve settings only affect the initial page file allocation (and wipe, if so enabled). They will not affect the virtual address space claimed by each thread.
32-bit memory-intensive programs may require some inventive programming. At some point you might consider using MPI or a multi-process application (perhaps using a memory-mapped file for shared data), or simply reducing the number of threads to work within your memory budget.
Jim Dempsey
Hi Jim,
Thanks for the detailed explanation.
The problem seems quite similar to the one in "Stack size in fortran using OpenMP". If I compile the code without OpenMP, I do not need to increase the stack size at all; the default size (4M) is enough even for large problems. But if I compile the code with OpenMP, I have to increase the Commit and Reserve stack to 8000000 (8M) to make the program run with 4 threads.
Almost all the large datasets are allocatable and declared in a module. Some of them (dataset A) are globally shared, and others (dataset B) are used as local variables shared by some subroutines. For the parallel region, I declare dataset B as private and pass it to the subroutines, and the program works fine if I increase the "Commit and Reserve stack" value.
Since the stack size in sequential mode (compiled without OpenMP) does not need to be increased, does this mean that the stack for my main thread is not large? If so, why doesn't KMP_SET_STACKSIZE_S solve my problem, given that this function sets the stack size for the parallel threads? I do call this function before the parallel region.
Thanks for your time,
Regards,
Daniel
>> I shall increase the Commit and Reserve stack to 8000000 (8M) to make the program run with 4 threads.
You are confused!
When running with 4 threads, your application has 4 stacks. Each stack needs to be large enough to handle the worst case. If 1 thread can run your application with 4MB, then each thread should be able to run the application using 4MB. However, some additional space may be required, depending on how you use the PRIVATE and REDUCTION clauses.
Check your code to see if you went bonkers in attempt to use a high level of recursive nested parallelism.
Jim Dempsey
Just because a little bit of parallelism is good...
does not mean a lot of parallelism is better.
Jim Dempsey
Hi All,
Thanks for all your help. I am a little confused.
Let me describe the problem again to make it clear.
1. The platform is Windows x64, Intel Parallel Studio XE 2013 + Visual Studio 2010 Pro.
2. If compiled without OpenMP, everything works fine and I need to set neither the stack size nor the stack reserve/commit size.
3. If compiled with OpenMP, I need to increase the stack reserve/commit size (e.g., 10M) to run the intensive case with 4 threads. If I don't increase the stack reserve/commit size, the program runs into "stack overflow".
4. The stack overflow seems to come from a parallel region with a lot of firstprivate variables. There is no nested parallelism in this region. The estimated space for these firstprivate variables is 6M for the intensive case. If I comment out this parallel region, it works fine and I don't need to increase the stack size.
I have also tried manually setting OMP_STACKSIZE (e.g., 10M) on the environment variable page, but it does not solve my problem. At present, the problem is solved by setting the "stack reserve/commit size" on the project property->Linker->System page.
Another question is about KMP_SET_STACKSIZE_S versus putenv or setenv. What's the difference between them, since the former also sets the number of bytes that will be allocated for each parallel thread to use as its private stack, as described in the following webpage?
Thanks and regards,
Daniel
Private scalars don't consume much stack, but private arrays not only consume significant stack but may also cut performance. Even private scalars may inhibit optimization if they don't get optimized away as they might without OpenMP.
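To put a number on this: one private copy of an array costs its element size times its element count, and every thread in the team gets its own copy. The sketch below computes that with the standard STORAGE_SIZE intrinsic. The array size of 750000 REAL(real64) elements is purely illustrative (it happens to give 6,000,000 bytes, the order of magnitude estimated for the firstprivate variables above), and the names are hypothetical.

```fortran
module private_footprint
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
contains
  ! Bytes taken by ONE copy of a REAL(real64) array with n elements.
  ! A PRIVATE/FIRSTPRIVATE clause gives each thread its own copy.
  pure function copy_bytes(n) result(bytes)
    integer(int64), intent(in) :: n
    integer(int64) :: bytes
    real(real64) :: probe   ! only inquired for its type, never read
    bytes = n * (storage_size(probe) / 8)
  end function copy_bytes
end module private_footprint

program footprint_demo
  use, intrinsic :: iso_fortran_env, only: int64
  use private_footprint, only: copy_bytes
  implicit none
  integer(int64) :: per_copy
  per_copy = copy_bytes(750000_int64)   ! illustrative size only
  print *, 'bytes per private copy:', per_copy
  print *, 'bytes for 4 threads   :', 4_int64 * per_copy
end program footprint_demo
```

Here that is 6000000 bytes per copy and 24000000 bytes across 4 threads, which is roughly the amount of extra stack such a region would need if the copies are placed on the thread stacks.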
>>I have also tried to manually set OMP_STACKSIZE (e.g., 10M) in the environment variable page, but it does not solve my problem. At present, the problem is solved by setting the "stack reserve/commit size" in the page project property->Linker->System.
This means that the main thread (PROGRAM) has insufficient stack. The linker setting declares the main thread's stack size (and sets the default stack size for subsequent OpenMP threads). As stated a few times earlier, KMP_SET_STACKSIZE_S and the other OpenMP means of setting the stack size only affect the stack size of subsequently created threads (used by OpenMP).
Consider enabling heap_arrays (compiler option), or making the large arrays you currently have on the stack allocatable, but do not allocate them until you enter the parallel region. Use FIRSTPRIVATE to copy the un-ALLOCATED array descriptors (due to a bug/oversight in earlier compilers).
REAL :: YourBigArrayToBePrivate(Bigsize)
REAL, ALLOCATABLE :: YourSubstituteArray(:)
...
!$OMP PARALLEL FIRSTPRIVATE(YourSubstituteArray)
! Each thread allocates its own copy from the heap instead of the stack
ALLOCATE(YourSubstituteArray(LBOUND(YourBigArrayToBePrivate,1):UBOUND(YourBigArrayToBePrivate,1)))
YourSubstituteArray = YourBigArrayToBePrivate
!$OMP DO
...
!$OMP END DO
DEALLOCATE(YourSubstituteArray)
!$OMP END PARALLEL
Note: should you have enabled realloc lhs, the ALLOCATE/DEALLOCATE could be omitted.
Jim Dempsey