Intel® Fortran Compiler

compatibility problem with OpenMP

hougj
Beginner
Hi guys,

I'm currently trying to parallelize a scientific code written in F90 with OpenMP. One thing that has really bothered me recently is that the code works well when compiled with gfortran 4.1.2, but when compiled with Intel Fortran (ifort V11.0.072 on Windows, ifort V10.1 on Linux), the program fails with:
1. "Segmentation error" on Linux, or
2. "Stack overflow" (forrtl severe 170) in Visual Fortran Release mode, or
3. "Debug assertion failed! ... File: winsig.c, Line: 419 Expression: ("Invalid signal or error", 0)..." in Visual Fortran Debug mode

What I want to know is the difference between the OpenMP implementations in ifort and gfortran, so that I know what to change in my code. It seems that my code compiled with ifort (in "serial" mode) is 2 times faster than with gfortran...

Thanks in advance,
GH
Steven_L_Intel1
Employee
Looks as if you're running out of stack - easy to do, especially on Windows. Try adding -heap-arrays to the compile options to see if that helps. You can also set the stack size larger in the linker properties on Windows or with a "limit" or "ulimit" command on Linux.
hougj
Beginner
Thanks Steve!
As you suggested, I tried adding /heap-arrays- in Visual Fortran and -heap-arrays in Intel Fortran on Linux. My Linux ifort compile command is:
ifort -openmp -o ip.out -D __GFORTRAN__ -heap-array [src files here]
Unfortunately, what you suggested didn't work for me. I checked the memory usage on Windows for a single thread, and it's about 5MB. I'm using a dual-core machine, so the maximum memory usage shouldn't exceed 10MB when parallelized. That is not much, especially compared with my Safari browser, which is taking almost 300MB of memory...
Any other suggestions?
Thanks!
Steven_L_Intel1
Employee
On Windows, the default stack size is 1MB! You can raise this by setting a larger value in Properties > Linker > System > Stack Reserve Size. I suggest starting with 100000000 (100MB, thereabouts).
hougj
Beginner
Hi Steve,
I tried increasing the stack size to a larger number, and the error prompt did change, but only to another pop-up dialog reporting a stack overflow.
Here is a very basic code, and I get the same severe 170 error with it. The code I'm posting here has the same idea as the one I'm working on: large array + parallel. Will you help me determine an appropriate stack/heap size for this one?
Thank you so much!
GH


program c1
  implicit none
  real(kind=8) :: a(1:1000,1:1000)
  integer :: i, j
  !$omp parallel do default(none) private(a,i,j)
  do i = 1, 1000
    do j = 1, 1000
      a(i,j) = 0
      a(i,j) = i*j
    end do
  end do
  !$omp end parallel do
end program
Andrew_Smith
Valued Contributor I
Since you have made the array A private, each thread needs stack space for its own copy of A. That is about 8 MB per thread (1000 x 1000 elements x 8 bytes) plus 8 MB for the declared array, so a quad-core machine would need about 40 MB of stack plus a bit more for overhead.

At the end of the calculation, the original array A will not be initialised. I suspect it would work better if A was not private.
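
The arithmetic behind that estimate can be written out as a short sketch. The thread count of 4 and the 8 bytes per element are assumptions taken from the numbers above (a quad core, real(kind=8) as compiled by ifort or gfortran), and the program name is just illustrative. It ignores any per-thread overhead the runtime adds.

```fortran
program stack_estimate
  implicit none
  integer, parameter :: n = 1000            ! array extent from the posted example
  integer, parameter :: nthreads = 4        ! assumed thread count (quad core)
  integer, parameter :: bytes_per_elem = 8  ! assumed size of real(kind=8)
  integer :: copy_bytes, total_bytes

  ! One full copy of a(1:n,1:n)
  copy_bytes = n * n * bytes_per_elem

  ! The declared array plus one private copy per thread
  total_bytes = copy_bytes * (nthreads + 1)

  print '(a,i0,a)', 'one copy of a: ', copy_bytes, ' bytes'
  print '(a,i0,a)', 'declared array plus private copies: ', total_bytes, ' bytes'
end program stack_estimate
```

Either total is well beyond the 1MB default stack on Windows mentioned earlier in the thread.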
jimdempseyatthecove
Honored Contributor III
The code sample you posted is likely not what you intended. I believe this is what you intended:

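program c1
  implicit none
  real(kind=8) :: a(1:1000,1:1000)
  integer :: i, j
  ! a is shared: every thread writes its rows into the same array.
  ! Only the loop indices are private to each thread.
  !$omp parallel do default(none) shared(a) private(i,j)
  do i = 1, 1000
    do j = 1, 1000
      a(i,j) = 0
      a(i,j) = i*j
    end do
  end do
  !$omp end parallel do
end program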

(be sure to use heap arrays too)

In your original code you created (or at least attempted to create) a separate thread-specific copy of a(1:1000,1:1000) for each thread. The parallel do then has each thread fill in its slice of the array, but in this case each slice belongs to a different per-thread instance rather than to the same array.
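
One way to convince yourself that the shared version really fills a single array is to check the result after the loop. This is only a sketch: the sentinel value, the mismatch counter nbad, and the program name are my own additions for illustration, not part of the posted code.

```fortran
program c1_check
  implicit none
  integer, parameter :: n = 1000
  real(kind=8) :: a(n,n)
  integer :: i, j, nbad

  ! Sentinel so that any element no thread wrote to is detectable
  a = -1.0d0

  ! Shared array: each thread fills its share of the rows of the same a
  !$omp parallel do default(none) shared(a) private(i,j)
  do i = 1, n
    do j = 1, n
      a(i,j) = i*j
    end do
  end do
  !$omp end parallel do

  ! Serial check: count elements that do not hold i*j
  nbad = 0
  do j = 1, n
    do i = 1, n
      if (a(i,j) /= real(i*j, kind=8)) nbad = nbad + 1
    end do
  end do
  print '(a,i0)', 'elements not equal to i*j: ', nbad
end program c1_check
```

With shared(a) the count should come out as zero, whether or not the program is compiled with OpenMP enabled.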

Jim Dempsey
hougj
Beginner
Thank you all for helping me with my problem!
I just worked it out. I made the biggest array (which is really big) "allocatable" and now allocate it later, just before it's used. That way (I think...) the operating system doesn't try to reserve stack or heap space for this array at the very beginning.
It's still not working on my 32-bit Windows machine, but it's OK on 64-bit Linux.
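
For anyone finding this later, here is roughly what that change looks like when applied to the small test program from earlier in the thread. It is a sketch rather than my real code: the name c1_alloc, the stat= check, and the final print are only for illustration, and a is kept shared as Jim suggested.

```fortran
program c1_alloc
  implicit none
  integer, parameter :: n = 1000
  real(kind=8), allocatable :: a(:,:)
  integer :: i, j, ierr

  ! Allocate the big array at run time, just before it is needed
  allocate(a(n,n), stat=ierr)
  if (ierr /= 0) stop 'allocation of a failed'

  ! a is shared, so the threads all write into the one allocated array
  !$omp parallel do default(none) shared(a) private(i,j)
  do i = 1, n
    do j = 1, n
      a(i,j) = i*j
    end do
  end do
  !$omp end parallel do

  print *, 'a(n,n) = ', a(n,n)
  deallocate(a)
end program c1_alloc
```

Checking stat= means a failed allocation is reported with a clear message instead of a crash.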
Thanks again!