Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Stack size is not big enough

Christoph_I_
Beginner
3,821 Views

Hello!
I have an issue regarding stack size and ifort (Parallel Studio XE Composer). For my computations I divide a huge array into small pieces, and every node submitted to the run does the computations for one part. Below is a chunk of the code:

real         ::  p(200,200,400)
integer      ::  ib,ie,jb,je,kb,ke 
...
ib=1;ie=199
jb=2;je=198
kb=2;ke=398
call  SOLVE_POI_EQ(rank,p(ib:ie,jb:je,kb:ke),R)

The problem is that when I reduce the number of nodes, the code crashes with a `Segmentation Fault` when I call `SOLVE_POI_EQ`. I use Linux, and when I set the stack size to unlimited with `ulimit -s unlimited`, it works.

I'm now worried that I overwrite parts of my OS (can that happen?)!

Is there a better way to address this issue?

0 Kudos
24 Replies
jimdempseyatthecove
Honored Contributor III
3,060 Views

You will never overwrite parts of the O/S. Your application runs in Virtual Memory.

Fortran has an option to place large local objects on the heap: -heap-arrays [size] (Linux), /heap-arrays[:size] (Windows).
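
For example (assuming the source file is named solver.f90; if I remember correctly, the optional size is a threshold in kilobytes, so temporaries at or above that size go to the heap):

ifort -heap-arrays 10 -o solver solver.f90

The temporary copy the compiler creates for the p(ib:ie,jb:je,kb:ke) argument should then be placed on the heap instead of the stack.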

Jim Dempsey

 

0 Kudos
Christoph_I_
Beginner
3,060 Views

Thanks! That works fine. But what about the runtime: will the code be slowed down when I use -heap-arrays?

0 Kudos
SergeyKostrov
Valued Contributor II
3,060 Views
This is simply a generic comment related to some issues with stack size... I recently tried to set the stack size for a 64-bit application compiled with the Intel C++ compiler ( Windows version ) to greater than 1 GB, and the Intel C++ compiler did Not allow it. So, as you can see, the size of a stack can Not be unlimited.
0 Kudos
SergeyKostrov
Valued Contributor II
3,060 Views
>>...will the code be slowed down when I use the -heap-arrays?

No, it will not. However, if ALL your data sets do not fit in Physical Memory, then Virtual Memory will be used and the performance of computations will be affected.
0 Kudos
Bernard
Valued Contributor I
3,060 Views

At least you can somehow compensate for the "slowness" of virtual memory by using an SSD drive for the swap page.

0 Kudos
Bernard
Valued Contributor I
3,060 Views

Should have written "swap space"

0 Kudos
Bernard
Valued Contributor I
3,060 Views

>>>I'm now worried that I overwrite parts of my OS (can that happen?)!>>>

As Jim said, it will not happen.

I think that in your case the running program simply consumed all of the allocated stack space, thus triggering a segmentation fault when the address range being referenced was mapped to reserved or uncommitted memory.
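
For illustration, here is a small stand-alone Fortran program in the spirit of the original code (sizes are only an example). The array section passed to the subroutine is copied into a contiguous temporary, which the compiler typically places on the stack, so with the default stack limit this usually dies with a segmentation fault; raising `ulimit -s` or compiling with -heap-arrays avoids the crash:

program stack_demo
   implicit none
   real, allocatable :: p(:,:,:)

   allocate(p(200,200,400))        ! ~61 MB, safely on the heap
   p = 1.0

   ! The non-contiguous section forces a contiguous temporary copy (~60 MB),
   ! which by default ends up on the stack.
   call use_section(p(1:199,2:198,2:398), 199, 197, 397)
contains
   subroutine use_section(q, ni, nj, nk)
      integer, intent(in) :: ni, nj, nk
      real,    intent(in) :: q(ni,nj,nk)   ! explicit-shape dummy: copy-in is required
      print *, 'sum of section =', sum(q)
   end subroutine use_section
end program stack_demo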

0 Kudos
Christoph_I_
Beginner
3,060 Views

Perfect, so no reason to worry about killing my OS or slowing the code down. Thanks!

0 Kudos
Bernard
Valued Contributor I
3,060 Views

>>>Perfect, so no reason to worry about killing my OS or slowing the code down. Thanks!>>>

Unless you start writing kernel code and somehow overwrite critical structures :)

0 Kudos
SergeyKostrov
Valued Contributor II
3,060 Views
>>...At least you can somehow compensate for the "slowness" of virtual memory...

Actually, I "fight" that by increasing the Working Set Size ( WSS ) for my set of Windows applications, and I have spent lots of time trying to understand what the best value for the WSS is. It is a really tricky thing. It looks like an application should get as big a "window" into Virtual Memory as possible, but that is wrong. My evaluations show that some middle-range values for the WSS are better.
0 Kudos
Bernard
Valued Contributor I
3,060 Views

Hi Sergey,

Can you write what settings you are using for WSS?

0 Kudos
Bernard
Valued Contributor I
3,060 Views

@Sergey

Do you manipulate WSS programmatically?

0 Kudos
SergeyKostrov
Valued Contributor II
3,060 Views
>>Can you write what settings you are using for WSS?

After many tests I decided to set the Working Set Size to 384MB, and this is the value at which I saw some performance improvements.
0 Kudos
SergeyKostrov
Valued Contributor II
3,060 Views
Yes, below is a very simple piece of code that sets the WSS:

...
dwMinPwsSize = 384 * 1024 * 1024;
dwMaxPwsSize = 384 * 1024 * 1024;

if( ::SetProcessWorkingSetSize( hProcess, dwMinPwsSize, dwMaxPwsSize ) == 0 )
{
    CrtPrintf( RTU("SetProcessWorkingSetSize - Failed\n") );
    break;
}
...

Take into account that in order to set the WSS for a process, two Access Rights flags ( PROCESS_SET_QUOTA and PROCESS_QUERY_INFORMATION ) need to be set before the call to SetProcessWorkingSetSize.
0 Kudos
SergeyKostrov
Valued Contributor II
3,060 Views
Here is some additional follow-up:

>>...when I reduce the number of nodes, the code crashes with a `Segmentation Fault`...

When something is wrong with the stack, the error message always has the word Stack in it. So, a Segmentation Fault message means for me:

- A memory block was Not allocated because it was too big ( Requested Size Of Memory > Available Size Of Memory ( Physical + Virtual ) )
- The application did Not verify that the pointer to that memory block had a NULL value
- Processing continued and the application crashed with the error message '...Segmentation Fault...'

A very simple way to prevent such errors is to set the Virtual Memory Maximum Size limit to as high a value as possible. For example, on one of my computers the Maximum Size limit is set to 192GB ( the system has 32GB of Physical memory ). It guarantees that in most of the cases I deal with, the memory will always be allocated. Of course, all pointers need to be verified before processing continues, and an algorithm needs to handle all cases where a pointer is NULL after a memory allocation was requested.
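
In Fortran terms, that kind of verification corresponds to checking the stat= result of ALLOCATE before using the array; a minimal sketch (names are only illustrative):

program alloc_check
   implicit none
   real, allocatable :: p(:,:,:)
   integer :: ierr

   ! Request the array from the heap and verify that the request succeeded
   ! instead of letting a failed allocation crash later.
   allocate(p(200,200,400), stat=ierr)
   if (ierr /= 0) then
      print *, 'Allocation failed, stat =', ierr
      stop 1
   end if

   p = 0.0
   print *, 'Allocated', size(p), 'elements'
   deallocate(p)
end program alloc_check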
0 Kudos
SergeyKostrov
Valued Contributor II
3,060 Views
Many classic, well-known algorithms are Recursive and they always use the Stack during processing. When it comes to Big Data processing they fail, and the only way to solve that problem is to direct all memory requests to the Heap. This is Not a simple task, but it is Not a too hard, or unsolvable, task.

I followed that concept and re-implemented the Merge Sort algorithm to sort very big data sets, and I called my version Merge Sort Adaptive ( MSA ). During the final phase of merging two already sorted sub-sets of data, MSA uses the Heap instead of the Stack as soon as some limit is reached ( for example, 512MB, or higher ).

Another example is the Strassen Matrix Multiplication algorithm. It is also recursive, and in the case of very large matrices ( greater than 32K x 32K ) the amount of memory allocated on the Stack is huge. In order to solve that problem and to make the algorithm more flexible, I switched all memory requests to the Heap.
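
As a generic illustration of moving a work buffer from the Stack to the Heap in Fortran ( this is not the actual MSA code, just the general idea ):

subroutine merge_step(a, n)
   implicit none
   integer, intent(in)    :: n
   real,    intent(inout) :: a(n)
   ! real :: work(n)               ! automatic array: allocated on the stack
   real, allocatable :: work(:)    ! allocatable array: allocated on the heap
   allocate(work(n))
   work = a                        ! ... the actual merge logic would go here ...
   a = work
   deallocate(work)
end subroutine merge_step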
0 Kudos
TimP
Honored Contributor III
3,060 Views

Intel advice from not too long ago recommended against using the heap inside OpenMP parallel regions. Since then, it seems some efforts have been made to help out with this combination, so I don't know the latest word.
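
The usual workaround, as far as I know, is to allocate per-thread work arrays once, outside the hot loop, rather than allocating and freeing inside every iteration. A sketch only, with hypothetical names:

subroutine scale_in_chunks(p, n, chunk)
   implicit none
   integer, intent(in)    :: n, chunk
   real,    intent(inout) :: p(n)
   real, allocatable :: work(:)
   integer :: i, hi

   !$omp parallel private(work, i, hi)
   ! One heap allocation per thread, made once outside the loop,
   ! instead of allocating/freeing inside every iteration.
   allocate(work(chunk))
   !$omp do
   do i = 1, n, chunk
      hi = min(i + chunk - 1, n)
      work(1:hi-i+1) = 2.0 * p(i:hi)
      p(i:hi) = work(1:hi-i+1)
   end do
   !$omp end do
   deallocate(work)
   !$omp end parallel
end subroutine scale_in_chunks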

0 Kudos
SergeyKostrov
Valued Contributor II
3,060 Views
>>...Should have written "swap space"

I don't think so.
0 Kudos
jimdempseyatthecove
Honored Contributor III
3,060 Views

Sergey,

You can also have an allocation that succeeds (returns a non-null pointer), and then have the error occur later on first use of the allocation, which causes a page fault should the page file become exhausted. IOW, the allocation works - the program fails due to lack of resources.
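
For illustration, a minimal Fortran sketch of that scenario (sizes are hypothetical; whether it fails at the ALLOCATE or only at the first touch depends on the OS's overcommit and page-file configuration):

program first_touch_demo
   implicit none
   real(8), allocatable :: big(:)
   integer :: ierr

   ! Ask for far more memory than is physically available (~64 GB here).
   allocate(big(2_8**33), stat=ierr)
   print *, 'allocate stat =', ierr        ! the request itself may succeed...
   if (ierr == 0) then
      big = 0.0d0                          ! ...yet the first touch of the pages can still fail
      print *, 'touched', size(big, kind=8), 'elements'
   end if
end program first_touch_demo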

Jim Dempsey

0 Kudos
SergeyKostrov
Valued Contributor II
2,921 Views
>>You can also have an allocation that succeeds (returns a non-null pointer), and then have the error occur later
>>on first use of the allocation, which causes a page fault should the page file become exhausted.
>>IOW, the allocation works - the program fails due to lack of resources.

I never had that issue, even though some of the algorithms I've implemented use lots of memory ( up to 192GB / physical + virtual ).
0 Kudos