Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

compatibility problem with OpenMP

hougj
Beginner
851 Views
hi guys,

I'm currently trying to parallelize a scientific code written in F90 with OpenMP. One thing that has really bothered me recently is that the code works well when compiled with gfortran 4.1.2, but when compiled with Intel Fortran (ifort V11.0.072 on Windows, ifort V10.1 on Linux), the program fails with:
1. "Segmentation error" on Linux, or
2. "Stack overflow" (forrtl severe 170) in Visual Fortran Release mode, or
3. "Debug assertion failed! ... File: winsig.c, Line: 419 Expression: ("Invalid signal or error", 0)..." in Visual Fortran Debug mode

What I want to know is how OpenMP differs between ifort and gfortran, so that I know what to change in my code, since my code compiled with ifort (in "serial" mode) seems to be 2 times faster than with gfortran...

Thanks in advance,
GH
7 Replies

Steven_L_Intel1
Employee
Looks as if you're running out of stack - easy to do, especially on Windows. Try adding -heap-arrays to the compile options to see if that helps. You can also set the stack size larger in the linker properties on Windows or with a "limit" or "ulimit" command on Linux.

hougj
Beginner
Thanks Steve!
As you suggested, I tried adding /heap-arrays- in Visual Fortran and -heap-arrays in Linux Intel Fortran. My Linux ifort compile command is:
ifort -openmp -o ip.out -D __GFORTRAN__ -heap-array [src files here]
Unfortunately, that didn't work for me. I checked the memory usage on Windows for a single thread: it's about 5MB. I'm using a dual-core machine, so the maximum memory usage shouldn't exceed 10MB when parallelized, which is not much, especially compared with my Safari browser, which takes almost 300MB of memory...
Any other suggestions?
Thanks!

Steven_L_Intel1
Employee
On Windows, the default stack size is 1MB! You can raise this by setting a larger value in Properties > Linker > System > Stack Reserve Size. I suggest starting with 100000000 (100MB, thereabouts).

hougj
Beginner
Hi Steve,
I tried increasing the stack size to a larger number. The error message did change, but only to another pop-up dialog reporting a stack overflow.
Here is a very basic code that gives me the same severe 170 error. It follows the same idea as the code I'm working on: large array + parallel. Could you help me determine an appropriate stack/heap size for this one?
Thank you so much!
GH


program c1
  implicit none
  real(kind=8) :: a(1:1000,1:1000)
  integer :: i, j
  !$omp parallel do default(none) private(a,i,j)
  do i = 1, 1000
    do j = 1, 1000
      a(i,j) = 0
      a(i,j) = i*j
    end do
  end do
  !$omp end parallel do
end program

Andrew_Smith
Valued Contributor I
Since you have made the array A private to each thread, you will need stack space for a copy of A in every thread. That is about 8MB per thread plus 8MB for the declared array, so a quad core would need 40MB of stack plus a bit more for overhead.

At the end of the calculation, the original array A will not be initialised. I suspect it would work better if A was not private.
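
The arithmetic behind that estimate can be sketched as a small program. This is only an illustration: the thread count of 4 is the quad-core assumption from the post, the names `stack_estimate`, `nthreads` and `bytes_per_elem` are made up here, and the 8 bytes per element assumes `real(kind=8)` is an 8-byte real, as it is in ifort and gfortran.

```fortran
program stack_estimate
  implicit none
  integer, parameter :: n = 1000             ! array extent from the posted example
  integer, parameter :: nthreads = 4         ! assumed thread count (quad core)
  integer, parameter :: bytes_per_elem = 8   ! assumed size of real(kind=8)
  integer :: bytes_per_copy, total_bytes

  ! One full copy of a(1:1000,1:1000)
  bytes_per_copy = n * n * bytes_per_elem

  ! One private copy per thread plus the originally declared array
  total_bytes = (nthreads + 1) * bytes_per_copy

  print '(a,i0,a)', 'per copy: ', bytes_per_copy / 1000000, ' MB'
  print '(a,i0,a)', 'total:    ', total_bytes / 1000000, ' MB'
end program stack_estimate
```

With these assumed values the two figures come out as 8 MB per copy and 40 MB in total, matching the numbers quoted above.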

jimdempseyatthecove
Honored Contributor III
Quoting - hougj
program c1
  implicit none
  real(kind=8) :: a(1:1000,1:1000)
  integer :: i, j
  !$omp parallel do default(none) private(a,i,j)
  do i = 1, 1000
    do j = 1, 1000
      a(i,j) = 0
      a(i,j) = i*j
    end do
  end do
  !$omp end parallel do
end program

The above code sample is likely not what you intended. I believe this is what you intended:

program c1
  implicit none
  real(kind=8) :: a(1:1000,1:1000)
  integer :: i, j
  !$omp parallel do default(none) shared(a) private(i,j)
  do i = 1, 1000
    do j = 1, 1000
      a(i,j) = 0
      a(i,j) = i*j
    end do
  end do
  !$omp end parallel do
end program

(be sure to use heap arrays too)

In your original code you created (or at least attempted to create) a separate, thread-specific copy of the array a(1:1000,1:1000) for each thread. The parallel do then has each thread work on a slice of the array, but in this case each slice belongs to that thread's own private copy rather than to one and the same array.

Jim Dempsey

hougj
Beginner
Thank you all for helping me with my problem!
I just worked it out. I made the biggest array (which is really big) "allocatable" and now allocate it later, before it's used. So (I think...) the operating system no longer tries to reserve stack or heap space for this array at the very beginning.
Currently it still doesn't work on my 32-bit Windows, but it's OK on 64-bit Linux.
Thanks again!
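
For anyone reading later, here is a minimal sketch of that allocatable approach applied to the small test program from earlier in the thread. It is not the real code: the program name `c1_alloc`, the parameter `n`, and the status variable `ierr` are made up for illustration, the array is shared as suggested above, and the loops are swapped so the inner loop runs over the first index.

```fortran
program c1_alloc
  implicit none
  integer, parameter :: n = 1000
  real(kind=8), allocatable :: a(:,:)   ! allocated at run time, not a fixed-size local
  integer :: i, j, ierr

  ! Allocate the big array explicitly and check that it succeeded
  allocate(a(n,n), stat=ierr)
  if (ierr /= 0) stop 'allocation of a failed'

  ! One shared array; only the loop indices are private to each thread
  !$omp parallel do default(none) shared(a) private(i,j)
  do j = 1, n
    do i = 1, n
      a(i,j) = real(i, kind=8) * real(j, kind=8)
    end do
  end do
  !$omp end parallel do

  print *, 'a(n,n) =', a(n,n)
  deallocate(a)
end program c1_alloc
```

Because the array is shared, no thread needs its own 8MB copy, so the per-thread stack requirement stays small regardless of the number of threads.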