Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

OpenMP limit

Roman1
New Contributor I
1,068 Views

Hi,

Does anyone know if there is a limit to the size of the reduction variable in OpenMP?

When I compile and run the attached simple program, it always crashes with a stack overflow. I have tried setting the stack to 1 GB (/STACK:1073741824), using the heap (/heap-arrays:0), and calling KMP_SET_STACKSIZE_S. None of these helped.

The program works as expected if I make the size of x smaller.

Roman

13 Replies
TimP
Honored Contributor III

If you're controlling the thread stack, it's done by an environment variable, e.g. KMP_STACKSIZE=8m, or by a library call.

The global stack can be set by a linker option, e.g. /link /stack:800000000, or with editbin.

/heap-arrays with a number affects only stack allocations of size known at compile time.  I remember Steve Lionel recommending it without a number.

John_Campbell
New Contributor II

I think there could be some problem with " x = x + 1.0"

Try the attached changes with and without OMP and see what works. The K loop works for my case without OMP.
Stack problems should not be associated with ALLOCATE, but they could be associated with temporary arrays created for the array instructions.

John

SergeyKostrov
Valued Contributor II
>>...Does anyone know if there is a limit to the size of the reduction variable in OpenMP?

You could try setting a lower value for the OpenMP stack size at runtime, for example OMP_STACKSIZE=128K or so. Two more questions: are you on a 32-bit or a 64-bit platform, and how much memory is installed on your computer?

SergeyKostrov
Valued Contributor II
>>...The global stack can be set by option e.g. /link /stack:800000000...

800000000 bytes is about 763 MB, and that looks like too much even for a 64-bit platform.

NotThatItMatters
Beginner

Global stack size, a Windows limitation, is limited to 2 GB. The best I have ever been able to do is 268435456 bytes, which is 2^28. It does not matter whether the build is Win32 or x64. I have been struggling mightily with the app I am building: reducing stack usage is a real headache in older code without COMMON but with huge argument lists for routines.

jimdempseyatthecove
Honored Contributor III

>> It does not matter, Win32 or X64

If your app is built as a 32-bit app, and
If you run on a system with 4 hardware threads, and
If you specify a 1 GB stack, and
If you launch OpenMP with default settings (4 threads),
Then each thread of the app will attempt to obtain 1 GB (of the 2 GB or 3 GB available in VM)
(i.e. the app requires 4 GB of total stack)

>> reduction of stack size is a real headache in older code without COMMON but with huge argument lists for routines.

1) Use /heap-arrays for ifort
2) Ensure that these old routines are compiled with the -openmp option even though they do not contain OpenMP statements.

Jim Dempsey

SergeyKostrov
Valued Contributor II
>>...Global stack size, a Windows limitation, is limited to 2 Gb...

It is still not clear what platform Roman is using. I expect it is a 32-bit Windows platform.

Steven_L_Intel1
Employee

A couple of comments.  NotThatItMatters is correct that Windows, both 32-bit and 64-bit, limits the stack to less than 2GB. In fact, I usually cite 1GB as the upper limit.

Jim, I would normally defer to you on OpenMP issues, but I think you went a bit astray here. The stack limit one sets in the linker is for the process. Thread stacks come out of that and are sized by OMP_STACKSIZE (or KMP_STACKSIZE). One can also call KMP_SET_STACKSIZE_S before the first parallel region to set the thread stack size. It is not correct that if the linker stack size is 1 GB, each thread asks for 1 GB of stack.
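
For readers who want to try the library-call route mentioned above, here is a minimal sketch. It assumes the Intel compiler, whose omp_lib module exports the kmp_* extensions (KMP_SET_STACKSIZE_S, KMP_GET_STACKSIZE_S, KMP_SIZE_T_KIND); other compilers may not provide them. The 64 MiB value and the program name are illustrative only, not a recommendation from this thread.

```fortran
program set_thread_stack
  ! Assumption: Intel's omp_lib, which also exports the kmp_* extensions.
  use omp_lib
  implicit none
  integer(kind=kmp_size_t_kind) :: stack_bytes

  ! Illustrative value: 64 MiB per OpenMP thread, computed in the wide kind.
  stack_bytes = 64_kmp_size_t_kind * 1024_kmp_size_t_kind * 1024_kmp_size_t_kind

  ! Must be called before the first parallel region is entered.
  call kmp_set_stacksize_s(stack_bytes)

  !$omp parallel
  !$omp single
  print *, 'threads:', omp_get_num_threads()
  print *, 'thread stack size (bytes):', kmp_get_stacksize_s()
  !$omp end single
  !$omp end parallel
end program set_thread_stack
```

Setting the environment variable OMP_STACKSIZE (or KMP_STACKSIZE) before launching the program is the equivalent approach without a code change.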

TimP
Honored Contributor III

When I last checked it, Intel OpenMP set a default thread stack size of 2MB when in 32-bit mode, 4MB in 64-bit mode.  A 1MB thread stack, as Microsoft used to use, could lead to serious cache associativity problems on older CPUs. As Steve says, those can be increased by OMP/KMP methods, and this (multiplied by number of threads) is taken out of the process stack limit, which typically has to be increased from the default by options such as /link /stack:800000000 or by the editbin tool.

Roman1
New Contributor I

Hi,

Thanks for the replies.  In my initial post, I should have mentioned the following:

I am using 64-bit Windows 7, and I am definitely compiling 64-bit executables.  My computer has 20 GB of RAM, so memory should not be an issue.  When my program is compiled without OpenMP, it runs fine.  It only crashes with a stack overflow if /Qopenmp is used.  I have made the stack as large as possible, and tried calling KMP_SET_STACKSIZE_S() with large values, but it did not help.  It seems that for some reason, OpenMP needs additional stack space when doing reduction, and I have reached some kind of upper limit. Like I said in my initial post, if the size of x is smaller, everything works fine.

I would be happy to provide any additional information. Has anyone tried compiling my initial program to see if they can reproduce my problem?

Roman

Steven_L_Intel1
Employee

I did and could reproduce it. I have asked some of my coworkers, who know more about OpenMP than I do, for their thoughts.

jimdempseyatthecove
Honored Contributor III

integer :: n ! same as integer(4)
...
n = 140000000 ! 140000000 = 0x08583B00 (fits in 4 bytes)
...
!$ stack_size = n*20 ! 140000000 * 20 = 2800000000 = 0xA6E49C00 (exceeds the signed 32-bit maximum of 2147483647, so as integer(4) it becomes a negative number)

Use "integer(kind=KMP_SIZE_T_KIND) :: n" 

Jim Dempsey
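
The arithmetic in the post above can be checked with a short portable sketch. It uses int64 from iso_fortran_env as a stand-in for KMP_SIZE_T_KIND (an assumption: on a 64-bit build the two are expected to be the same width), and it deliberately avoids performing the overflowing 32-bit multiplication, since that result is not defined by the standard. The program name and the variable wanted are hypothetical.

```fortran
program stack_size_overflow
  use iso_fortran_env, only: int32, int64
  implicit none
  integer(int32) :: n        ! same width as default integer(4)
  integer(int64) :: wanted   ! stand-in for integer(kind=KMP_SIZE_T_KIND)

  n = 140000000_int32

  ! Widen n first, then multiply, so the product is computed in 64 bits.
  wanted = int(n, int64) * 20_int64

  print *, 'n*20 computed in 64 bits:', wanted
  print *, 'largest 32-bit integer:  ', huge(n)

  if (wanted > int(huge(n), int64)) then
     print *, 'n*20 does not fit in a 32-bit integer'
  end if
end program stack_size_overflow
```

The same widening applies to any size passed to KMP_SET_STACKSIZE_S: the multiplication has to happen in the wide kind, not be converted afterwards.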

Roman1
New Contributor I

This is a followup to the problem I was having. The OpenMP program I was working on was crashing with a stack overflow if the size of the reduction variable was very large. If anyone is interested, the attached file shows how I was able to solve this, by doing the reduction manually.  My solution isn't very pretty, but it does seem to work.

Roman
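
The attached file is not reproduced here. As a rough illustration of what a manual reduction can look like, the sketch below gives each thread its own heap-allocated partial array and combines the partial results inside a critical section. This is an assumption about the general technique, not Roman's actual code; the array size, iteration count, and all names are hypothetical. Explicit loops are used instead of whole-array expressions to avoid compiler-generated temporaries.

```fortran
program manual_reduction
  implicit none
  integer, parameter :: n = 1000000   ! illustrative size, far smaller than in the thread
  integer, parameter :: niter = 8     ! illustrative number of accumulation steps
  real, allocatable :: x(:), x_local(:)
  integer :: i, k

  allocate(x(n))
  do i = 1, n
     x(i) = 0.0
  end do

  ! x_local is unallocated on entry, so each thread gets its own
  ! unallocated private copy and allocates it on the heap.
  !$omp parallel private(x_local, i, k)
  allocate(x_local(n))
  do i = 1, n
     x_local(i) = 0.0
  end do

  ! Each thread accumulates its share of the iterations into its own array.
  !$omp do
  do k = 1, niter
     do i = 1, n
        x_local(i) = x_local(i) + 1.0
     end do
  end do
  !$omp end do

  ! Combine the partial results one thread at a time.
  !$omp critical
  do i = 1, n
     x(i) = x(i) + x_local(i)
  end do
  !$omp end critical

  deallocate(x_local)
  !$omp end parallel

  ! Every element should equal niter regardless of the thread count.
  if (minval(x) /= real(niter) .or. maxval(x) /= real(niter)) error stop 'reduction mismatch'
  print *, 'x(1) =', x(1), ' x(n) =', x(n)
end program manual_reduction
```

Because the partial arrays are allocated explicitly, their size is bounded by available memory rather than by the thread stack, at the cost of serializing the final combine step.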

 

 
