Intel® Fortran Compiler

Large arrays in Fortran OpenMP

iterater
Beginner
1,353 Views
I'm currently trying to apply OpenMP to an existing Fortran program, but I've run into the following issue: when the program works with large arrays, merely recompiling with the /Qopenmp option produces a non-working executable. (I use Intel Fortran Compiler 9.1.) Here is a small example of what I mean:

program md
  implicit none
  integer n
  integer m
  parameter(n=300,m=500)
  real*8 array2d(n,m)
  integer i
  integer j
  do i=1,n
    do j=1,m
      array2d(i,j) = i + j
    enddo
  enddo
end

This program works correctly, but after compiling with the /Qopenmp option it causes a stack overflow exception. I suppose this is connected to the size of the array, because decreasing the number of elements avoids the error. I've tried changing KMP_STACKSIZE, but it doesn't help.
I've also tested this source code with the Linux Fortran compiler and got the same result.
Has anyone faced this kind of problem? Could you help me?
Thanks in advance.
9 Replies
Steven_L_Intel1
Employee
1,352 Views
You probably also need to raise the stack reserve size in the linker options. On Linux, raise your stacksize limit ("ulimit -s" or "limit stacksize unlimited"). I tried this on Windows and it worked.
iterater
Beginner
1,352 Views
Thanks a lot. The linker option (/STACK) overcomes this problem. "ulimit -s" is also a successful solution, but only on Linux. Do you know if there is a command of this kind on Windows? It would be very helpful.
TimP
Honored Contributor III
1,352 Views
iterater
Beginner
1,352 Views
No, I mean a tool that changes the stack limit available to processes. E.g. ulimit -s sets the maximum stack size available to the shell and to processes started by it, so setting this value to a larger one or to "unlimited" avoids the stack overflow. But I'm not sure whether there is such a command for Windows.
Steven_L_Intel1
Employee
1,352 Views
Unfortunately, there isn't. Windows uses a 1970s-era method where the stack is allocated entirely by the linker when the executable is linked. When the EXE is run, the size is picked up and the stack created, and that's all you get. There is no concept of automatic stack expansion as in operating systems from the 1980s.

When you use /Qopenmp, one of the side effects of that is to imply /auto - that is, all local variables, even arrays, are allocated on the stack (other than those with SAVE semantics). This means that an OpenMP application almost always needs a larger than default stack allocation. You can set it at link time or use EDITBIN as Tim suggests to change the value in an EXE, but there is no process-level control over this.
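
As an illustration of the point above, here is a minimal sketch of one way to keep a large local array off the stack: declare it ALLOCATABLE. This is a swapped-in alternative rather than anything proposed in this thread, and it assumes the compiler places allocatable arrays on the heap, as Intel Fortran normally does. The program name md_heap and the variable ierr are made up for the example.

```fortran
program md_heap
  implicit none
  integer, parameter :: n = 300, m = 500
  ! Assumption: allocatable arrays are heap-allocated, so the /auto
  ! implied by /Qopenmp does not place these 300*500*8 bytes on the stack.
  real(8), allocatable :: array2d(:,:)
  integer :: i, j, ierr

  allocate(array2d(n,m), stat=ierr)
  if (ierr /= 0) stop 'allocation of array2d failed'

  ! Same fill as the original example; the inner loop runs over the
  ! first index to follow Fortran's column-major storage order.
  do j = 1, m
    do i = 1, n
      array2d(i,j) = i + j
    end do
  end do

  print *, array2d(n,m)
  deallocate(array2d)
end program md_heap
```

With this layout the default stack size should be enough for the example, at the cost of an explicit allocate/deallocate pair.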
jimdempseyatthecove
Honored Contributor III
1,352 Views

Iterater,

In your program, is your intention to have array2d shared among threads, or to have separate copies for each thread?

If shared, then try something like

module md_data
  integer, parameter :: n = 300
  integer, parameter :: m = 500
  real(8) :: array2d(n,m)
end module md_data

program md
  use md_data
  implicit none
  integer i
  integer j
  !$omp parallel do private(i,j)
  do i=1,n
    do j=1,m
      array2d(i,j) = i + j
    enddo
  enddo
  !$omp end parallel do
end

Jim Dempsey

iterater
Beginner
1,352 Views
Steve,
Thank you for the good explanation. It was very helpful.

JimDempseyAtTheCove,
Yes, I'm now modifying my program in this way. But the problem I mentioned arose before any code modification to add OpenMP directives (that source code was just an example). Now the problem is solved. Thanks to all.
jimdempseyatthecove
Honored Contributor III
1,352 Views

Iterater,

Hint for you.

You will find that stack pressure increases with the number of threads. As you have experienced, code that used to work fine in a single-threaded environment may run into problems in a multi-threaded environment. Shared items can be moved from the stack into module space (or COMMON), as you have indicated you are in the process of doing.

Another area of concern is that some subroutines require temporary arrays. You can specify that local arrays are to be allocated on the heap, but this may add unnecessary overhead (performing an allocate and deallocate on every call). To eliminate this overhead I use ThreadPrivate storage to hold a pointer to a user-defined structure that contains the list of temporary arrays. Upon entry to the subroutine I get the thread-private pointer to the list of temporary arrays, then query the size of the array of interest to see if it is large enough for the current operation. If it is not large enough (or not yet allocated, on the first call), the current array is deallocated and a new one is allocated. In this manner each thread's temporary array (for the subroutine) gets sized and resized as necessary. The overhead of dereferencing the ThreadPrivate pointer is much less than that of the allocation/deallocation.
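
The grow-on-demand idea can be sketched roughly as follows. This is a simplified illustration, not the exact scheme described above: it makes a single allocatable array THREADPRIVATE instead of going through a pointer to a user-defined structure, and it assumes a compiler that accepts allocatable arrays in a THREADPRIVATE directive. The names scratch_mod, work and ensure_work are hypothetical.

```fortran
module scratch_mod
  implicit none
  ! Hypothetical per-thread scratch array: each OpenMP thread
  ! gets its own copy because of the THREADPRIVATE directive.
  real(8), allocatable, save :: work(:)
  !$omp threadprivate(work)
contains
  ! Ensure the calling thread's scratch array holds at least nneeded elements.
  subroutine ensure_work(nneeded)
    integer, intent(in) :: nneeded
    if (allocated(work)) then
      if (size(work) >= nneeded) return   ! large enough: reuse it as is
      deallocate(work)                    ! too small: drop it and regrow
    end if
    allocate(work(nneeded))
  end subroutine ensure_work
end module scratch_mod

program demo
  use scratch_mod
  implicit none
  integer :: k
  real(8) :: total

  total = 0.0d0
  !$omp parallel do private(k) reduction(+:total)
  do k = 1, 8
    call ensure_work(1000 * k)            ! reallocates only when the request grows
    work(1:1000*k) = 1.0d0
    total = total + sum(work(1:1000*k))
  end do
  !$omp end parallel do
  print *, total
end program demo
```

Each thread pays for an allocation only when a request exceeds the size it already holds, so repeated calls with similar sizes avoid the allocate/deallocate overhead.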

Jim Dempsey

iterater
Beginner
1,352 Views
I've run into another problem. The test example described above works correctly under both Windows and Linux, but with my real program there are still segmentation faults. This time the errors appear not at program start (as in the previous case) but during execution, and ulimit -s doesn't help. There is a linker option --stack mentioned in the 'man ld' page under Linux (I suppose it is the same as /STACK for the Windows linker), but it works only for i386pe, so I can't use it (the linker rejects the option when I try to apply it).

PS: Under Windows everything works OK (I use the /STACK linker option).