Intel® Fortran Compiler

EM64T Fortran compiler

Eldhuset__Knut
Beginner
Hello! I am trying to run a Fortran program built with: ifort prog.f /link /stack:5000000000. The computer has 2 CPUs, 6.29 GB of total physical memory and 5.03 GB of available physical memory (per Windows Task Manager). The total paging file is 9.2 GB. I am not able to run the program if it requires more than about 4.19 GB (this seems to be 2**31 + 2 GB?). The error message when I run the program is:

forrtl: severe (170): Program Exception - stack overflow
prog.exe 00000000004635B7
prog.exe 000000000040125D
prog.exe 000000000040119B
prog.exe 0000000000460B7C
prog.exe 0000000000448C18
prog.exe 0000000077D5966C

I think there should be enough physical memory. What do you think?

Kind regards
Knut
Steven_L_Intel1
Employee
I think that the stack size on Windows x64 is limited to 2GB or even less. Try adding /heap-arrays to the compile options.
Eldhuset__Knut
Beginner

No, the 2 GB limit applies to a 32-bit compiler. As you can see, with the 64-bit compiler I can run a program that requires 4.2 GB of memory. With a 64-bit compiler you can have 128 GB, so why do I hit a limit at 4.2 GB when the computer has 6 GB of RAM?

Knut

Steven_L_Intel1
Employee

That's not what I said. You are running out of stack space, not virtual memory. As far as I know, the stack on 64-bit Windows cannot exceed 2GB.

The amount of virtual memory is also limited by your pagefile space.
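A minimal sketch of the stack/heap distinction, assuming Intel Fortran's default behaviour of placing automatic arrays on the stack when /heap-arrays is not used (the module and function names are hypothetical). The first function uses an automatic array, like the `complex aa(m,n),bb(m,n)` declaration in the program Knut posts below; the second obtains the same storage from the heap with ALLOCATE, so its size is limited by virtual memory rather than by the stack.

```fortran
! Hypothetical module and function names, for illustration only.
module stack_vs_heap
  implicit none
contains

  ! Automatic array: its size comes from the argument n. With Intel
  ! Fortran defaults (no /heap-arrays) it is placed on the stack, so a
  ! very large n can cause "stack overflow" regardless of free RAM.
  real function sum_automatic(n) result(s)
    integer, intent(in) :: n
    real :: work(n)
    work = 1.0
    s = sum(work)
  end function sum_automatic

  ! Allocatable array: storage comes from the heap, so the practical
  ! limit is virtual memory (RAM plus pagefile), not the stack size.
  real function sum_allocatable(n) result(s)
    integer, intent(in) :: n
    real, allocatable :: work(:)
    allocate(work(n))
    work = 1.0
    s = sum(work)
    deallocate(work)
  end function sum_allocatable

end module stack_vs_heap
```

Both functions return the same value; only where the work array lives differs, which is why the choice matters once arrays reach gigabyte sizes.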

Eldhuset__Knut
Beginner

How can I run a program that requires more than 4.2 GB of memory (the size of the matrices in the program) on a 64-bit computer? I have Visual Studio 2005 Professional Edition. The paging file is 9.2 GB. The /heap-arrays option does not work; do you have another trick?

Knut

jimdempseyatthecove
Honored Contributor III

Knut,

1) Make the large arrays allocatable (either with ALLOCATABLE or with POINTER).
2) Write the code so that temporary arrays are not created automatically for operations on your large arrays.

Consider using:

module CommonData
  real, allocatable :: A(:,:)
  real, allocatable :: B(:,:)
  real, allocatable :: C(:,:)
  ...
end module CommonData
----------------------
program FOO
  use CommonData
  allocate(A(1234,123456))
  allocate(B(1234,123456))
  allocate(C(1234,123456))
  ...
  ! Avoid using statements that create array temporaries

  ! Change:
  !   call Sub(A+B)
  ! to:
  C = A+B
  call Sub(C)

  ...
end program FOO

Note, there are several forms of array expressions that will require the use of array temporaries. Avoid using those forms of expressions.
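A sketch of two common cases, with hypothetical names. Whether a temporary is actually created is up to the compiler, so treat these as typical rather than guaranteed: passing a non-contiguous section such as a row a(1,:) to an explicit-shape dummy argument usually forces a copy-in/copy-out temporary, while an assumed-shape dummy can usually accept the section directly; and an overlapping assignment such as v(2:n) = v(1:n-1) may be evaluated through a temporary, which an explicit backward loop avoids.

```fortran
! Hypothetical module and routine names, for illustration only.
module temp_examples
  implicit none
contains

  ! Explicit-shape dummy: contiguous storage must be passed, so an
  ! actual argument like a(1,:) (a row, non-contiguous in column-major
  ! order) is typically copied into a temporary and copied back.
  subroutine scale_explicit(x, n, f)
    integer, intent(in) :: n
    real, intent(inout) :: x(n)
    real, intent(in) :: f
    x = f * x
  end subroutine scale_explicit

  ! Assumed-shape dummy: a descriptor carrying strides is passed, so a
  ! non-contiguous section usually needs no copy.
  subroutine scale_assumed(x, f)
    real, intent(inout) :: x(:)
    real, intent(in) :: f
    x = f * x
  end subroutine scale_assumed

  ! Same result as v(2:n) = v(1:n-1), done in place: running the loop
  ! backwards means no element is overwritten before it is read.
  subroutine shift_right(v)
    real, intent(inout) :: v(:)
    integer :: i
    do i = size(v), 2, -1
      v(i) = v(i-1)
    end do
  end subroutine shift_right

end module temp_examples
```

With Intel Fortran, building with /check:arg_temp_created should report the copy made for a call like `call scale_explicit(a(1,:), 4, 2.0)` at run time, but not for the equivalent `scale_assumed` call.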

Although there is a compile-time option, /heap-arrays, and a run-time check, /check:arg_temp_created, unfortunately there is no compile-time diagnostic such as /warn:array_temp_created.

This means you have to run the application until it crashes due to lack of memory. That doesn't help much if the offending expression is in seldom-used code, e.g. your customer experiences the problem while you cannot reproduce it.

Jim Dempsey

Eldhuset__Knut
Beginner
Here is the Fortran code. The program runs OK with input m=8192 and n=31000, but m=8192 and n=32768 makes the program crash.
I have an application which makes use of FFTs, therefore n=32768 would be fine for me.
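For reference, the total size can be worked out in 64-bit integer arithmetic. A default COMPLEX element is two default REALs (8 bytes with default compiler settings), so two 8192 x 32768 matrices need 2 * 8192 * 32768 * 8 = 4,294,967,296 bytes, exactly 2**32 bytes (4 GiB), while n = 31000 needs 4,063,232,000 bytes. The program below avoids integer overflow in its own size calculation by converting m with float(m); a minimal sketch of the same calculation in integers (hypothetical module and function names) uses storage_size instead of hard-coding the element size:

```fortran
! Hypothetical module and function names, for illustration only.
module matrix_size
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains

  ! Bytes needed for two m-by-n default COMPLEX matrices. The product is
  ! formed in 64-bit integers; with default 32-bit INTEGERs, m*n*16
  ! would overflow for these sizes.
  integer(int64) function bytes_two_complex(m, n) result(b)
    integer, intent(in) :: m, n
    integer(int64) :: elem_bytes
    ! storage_size returns the size in bits of one COMPLEX element
    elem_bytes = storage_size((0.0, 0.0)) / 8
    b = 2_int64 * int(m, int64) * int(n, int64) * elem_bytes
  end function bytes_two_complex

end module matrix_size
```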

program memory_test
  print 66
66 format('Size of matrices (m,n) :',$)
  read(*,*)m,n
  size=float(m)*n*8*2/1000000.
  write(*,*)'Size of matrices in Megabytes :',size
  call xmem(m,n)
end


subroutine xmem(m,n)
  complex aa(m,n),bb(m,n)
  write(*,*)'First loop starts'
  do j=1,m
    do i=1,n
      aa(j,i)=cmplx(sin(float(j)),cos(float(i)))
    enddo
  enddo
  write(*,*)aa(3,3)
  write(*,*)'First loop ready'
  do j=1,m
    do i=1,n
      bb(j,i)=cmplx(sin(float(j)),cos(float(i)))
    enddo
  enddo
  write(*,*)'Second loop ready'

  do j=1,m
    do i=1,n
      bb(j,i)=aa(j,i)*conjg(bb(j,i))
    enddo
  enddo
  write(*,*)'Third loop ready'

end


Regards Knut
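Not discussed in the thread, but relevant for arrays this large: Fortran stores arrays column by column, so in the loops above, where the inner loop runs over the second subscript, consecutive iterations touch elements that are m elements apart in memory. A sketch (hypothetical names) of the third loop with the first subscript innermost, which normally gives much better cache and paging behaviour:

```fortran
! Hypothetical module and routine names, for illustration only.
module loop_order
  implicit none
contains

  ! Same computation as the third loop, bb = aa * conjg(bb), but with the
  ! first subscript in the inner loop so memory is walked contiguously
  ! (Fortran arrays are stored in column-major order).
  subroutine conj_multiply(aa, bb)
    complex, intent(in)    :: aa(:,:)
    complex, intent(inout) :: bb(:,:)
    integer :: i, j
    do j = 1, size(bb, 2)
      do i = 1, size(bb, 1)
        bb(i,j) = aa(i,j) * conjg(bb(i,j))
      end do
    end do
  end subroutine conj_multiply

end module loop_order
```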
jimdempseyatthecove
Honored Contributor III

Knut,

To use allocatable arrays:

subroutine xmem(m,n)
  complex, allocatable :: aa(:,:),bb(:,:)
  allocate(aa(m,n),bb(m,n))
  ! your code here
  deallocate(aa,bb)
end subroutine xmem

If you want the arrays to persist outside the subroutine, place the declarations in a module. Then USE the module in the subroutine that performs the allocation, as well as in all the other subroutines that reference the arrays (after allocation and population). Or, if you prefer, pass the arrays (aa and/or bb) down the call levels.
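A minimal sketch of the module approach, with hypothetical routine names (setup, multiply_conj, teardown): the arrays are allocated once in one routine and then used by the others without being passed as arguments.

```fortran
! Hypothetical module and routine names, for illustration only.
module shared_arrays
  implicit none
  ! Module-level allocatables keep their storage between calls.
  complex, allocatable :: aa(:,:), bb(:,:)
contains

  subroutine setup(m, n)
    integer, intent(in) :: m, n
    allocate(aa(m,n), bb(m,n))
    aa = (1.0, 0.0)
    bb = (0.0, 1.0)
  end subroutine setup

  ! Uses the module arrays directly; no argument passing needed.
  subroutine multiply_conj()
    bb = aa * conjg(bb)
  end subroutine multiply_conj

  subroutine teardown()
    if (allocated(aa)) deallocate(aa)
    if (allocated(bb)) deallocate(bb)
  end subroutine teardown

end module shared_arrays
```

Any program unit that contains `use shared_arrays` sees the same aa and bb, so the allocation made in setup remains visible until teardown is called.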

You can also add the optional STAT=sv specifier to ALLOCATE and DEALLOCATE, where STAT= is a keyword and sv is an integer variable that receives a status value. A return of 0 means success.
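A sketch of checking STAT= on allocation (hypothetical module and routine names); a nonzero value is reported and returned to the caller instead of the run-time library aborting the program:

```fortran
! Hypothetical module and routine names, for illustration only.
module safe_alloc
  implicit none
contains

  ! Allocate two m-by-n complex matrices and return the STAT= value in sv
  ! (0 means success) instead of letting the run-time library abort.
  subroutine try_allocate(aa, bb, m, n, sv)
    complex, allocatable, intent(out) :: aa(:,:), bb(:,:)
    integer, intent(in)  :: m, n
    integer, intent(out) :: sv
    allocate(aa(m,n), bb(m,n), stat=sv)
    if (sv /= 0) write(*,*) 'ALLOCATE failed, STAT = ', sv
  end subroutine try_allocate

end module safe_alloc
```

The caller can then test sv and, for example, retry with a smaller n or stop with a clear message.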

Remember to deallocate before returning. I think the current version of IVF automatically deallocates local allocatable arrays on return, but it won't hurt to deallocate explicitly, as this avoids portability issues.

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

Knut,

Look at the following modifications to your application.

Note that ALLOCATABLE is changed to POINTER for the allocatable arrays.

I had problems with OpenMP in sharing the arrays when declared as ALLOCATABLE but not when declared as POINTER. Bug, if you ask me.

My system has 4 cores but only 2 GB of RAM. I could run your test using 8192,32768 (4 GB), but it took forever because the arrays were larger than physical memory. There was no memory fault (when using POINTER).

Using 8192,8192 illustrates scalability.

Jim Dempsey


program memory_test
  print 66
66 format('Size of matrices (m,n) :',$)
  read(*,*)m,n
  size=float(m)*n*8*2/1000000.
  write(*,*)'Size of matrices in Megabytes :',size
  call xmem(m,n)
end

subroutine xmem(m,n)
  use omp_lib
  complex, pointer :: aa(:,:),bb(:,:)
  real(8) :: StartTime, EndTime, ElapsTime
  integer :: NumberOfThreads
  allocate(aa(m,n),bb(m,n))
  write(*,*) 'Wiping array (to commit Virtual Memory)'
  StartTime = OMP_GET_WTIME()
  aa = cmplx(0.0,0.0)
  bb = cmplx(0.0,0.0)
  EndTime = OMP_GET_WTIME()
  ElapsTime = EndTime - StartTime
  write(*,*) 'Run time ', ElapsTime
  do NumberOfThreads=1, OMP_GET_MAX_THREADS()
    call OMP_SET_NUM_THREADS(NumberOfThreads)
    write(*,*)
    write(*,*)'NumberOfThreads ', NumberOfThreads
    write(*,*)
    write(*,*)'First loop starts'
    StartTime = OMP_GET_WTIME()
!$OMP PARALLEL DO PRIVATE(i,j)
    do j=1,m
      do i=1,n
        aa(j,i)=cmplx(sin(float(j)),cos(float(i)))
      enddo
    enddo
!$OMP END PARALLEL DO
    EndTime = OMP_GET_WTIME()
    ElapsTime = EndTime - StartTime
    write(*,*)aa(3,3)
    write(*,*)'First loop ready'
    write(*,*) 'Run time ', ElapsTime
    write(*,*)
    StartTime = OMP_GET_WTIME()
!$OMP PARALLEL DO PRIVATE(i,j)
    do j=1,m
      do i=1,n
        bb(j,i)=cmplx(sin(float(j)),cos(float(i)))
      enddo
    enddo
!$OMP END PARALLEL DO
    EndTime = OMP_GET_WTIME()
    ElapsTime = EndTime - StartTime
    write(*,*)'Second loop ready'
    write(*,*) 'Run time ', ElapsTime
    write(*,*)
    StartTime = OMP_GET_WTIME()
!$OMP PARALLEL DO PRIVATE(i,j)
    do j=1,m
      do i=1,n
        bb(j,i)=aa(j,i)*conjg(bb(j,i))
      enddo
    enddo
!$OMP END PARALLEL DO
    EndTime = OMP_GET_WTIME()
    ElapsTime = EndTime - StartTime
    write(*,*)'Third loop ready'
    write(*,*) 'Run time ', ElapsTime
  end do
  ! POINTER arrays are not deallocated automatically on return
  deallocate(aa,bb)
end
Size of matrices (m,n) :8192,8192
Size of matrices in Megabytes : 1073.742
Wiping array (to commit Virtual Memory)
Run time 2.677683
NumberOfThreads 1
First loop starts
(0.1411200,-0.9899925)
First loop ready
Run time 12.08051
Second loop ready
Run time 12.17020
Third loop ready
Run time 29.14083
NumberOfThreads 2
First loop starts
(0.1411200,-0.9899925)
First loop ready
Run time 6.223104
Second loop ready
Run time 6.243929
Third loop ready
Run time 15.01080
NumberOfThreads 3
First loop starts
(0.1411200,-0.9899925)
First loop ready
Run time 4.348078
Second loop ready
Run time 4.384243
Third loop ready
Run time 10.78805
NumberOfThreads 4
First loop starts
(0.1411200,-0.9899925)
First loop ready
Run time 3.618164
Second loop ready
Run time 3.652418
Third loop ready
Run time 8.962400
----------
Summary (run times in seconds)
Threads  Loop 1    Loop 2    Loop 3
1        12.08051  12.17020  29.14083
2        6.223104  6.243929  15.01080
3        4.348078  4.384243  10.78805
4        3.618164  3.652418  8.96240
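The summary can be reduced to speedup S_p = T_1 / T_p and parallel efficiency E_p = S_p / p, where T_p is the run time on p threads. A small sketch with hypothetical function names:

```fortran
! Hypothetical module and function names, for illustration only.
module scaling
  implicit none
contains

  ! Speedup on p threads: S_p = T_1 / T_p
  pure real function speedup(t1, tp)
    real, intent(in) :: t1, tp
    speedup = t1 / tp
  end function speedup

  ! Parallel efficiency: E_p = S_p / p (1.0 would be perfect scaling)
  pure real function efficiency(t1, tp, p)
    real, intent(in) :: t1, tp
    integer, intent(in) :: p
    efficiency = speedup(t1, tp) / real(p)
  end function efficiency

end module scaling
```

With the numbers from the summary, loop 1 gives a four-thread speedup of 12.08051 / 3.618164, about 3.34 (efficiency about 0.83), and loop 3 gives 29.14083 / 8.96240, about 3.25 (efficiency about 0.81).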
Eldhuset__Knut
Beginner

Jim

I tried it. It works, and I can run the program even when it needs more than the physical memory on the computer. It does not crash as it did with my code. The code I used worked with Unix Fortran, and I have used it for many years, but I see that for Intel Fortran I have to rewrite it as you describe. I can also tell you (you may already know it) that I no longer need the /link and /stack:n options. I should have given you the code earlier...

Thank you very much

Knut

Eldhuset__Knut
Beginner

Jim

Thanks a lot. Maybe I also need to parallelize my application. I have not tried it yet.

Regards Knut

jimdempseyatthecove
Honored Contributor III

Knut,

You might as well parallelize the code. Virtually any workstation you purchase today will have at least two processing cores, and soon four cores. Same with notebooks.

Jim Dempsey

Eldhuset__Knut
Beginner

Jim

I have tried the simple test code that you parallelized for me. It works fine, and I can see that both CPUs on my computer are utilized. Now I can start working on my SAR processing and simulation code. Thank you very much again.

Knut
