Re: forrtl: severe (41): insufficient virtual memory

sarge130 · 04-08-2023

I have some old Fortran code that is starting to reach its limits. Until we are able to develop a new version, we need to keep using it. We can currently only use this program when it is compiled in 32-bit mode, and one particular program is now giving an "insufficient virtual memory" error. I've posted a minimal snippet from the program below. In it, I use the variable normal_sq to set up a matrix at four different sizes. Lines 11-16 test compiler32, since that is what we use to compile the software. With the 32-bit compilation, I hit the limit at a matrix size of 21674 x 21674. Lines 17-22 use the 64-bit compiler and hit the limit at 92682 x 92682. I'm trying to extend the 32-bit compilation to use a larger matrix size. Is there a way around this error with ifort in 32-bit mode? The system itself has plenty of memory (64 GB).

$ module -s load compiler32 mkl32
$ ifort -traceback  allocate_test.f90 -qmkl -o allocate_test
$ ./allocate_test 
 Allocating matrix of size 21673 x 21673 (always works)
 Allocating matrix of size 21674 x 21674 (fails with: -m32 or compiler32 mkl32)
forrtl: severe (41): insufficient virtual memory
Image              PC        Routine            Line        Source             
allocate_test      0805198E  Unknown               Unknown  Unknown
allocate_test      08051B36  Unknown               Unknown  Unknown
allocate_test      0804AB60  MAIN__                     15  allocate_test.f90
allocate_test      0804A90A  Unknown               Unknown  Unknown

program allocate_test

implicit none
!Simple test to test allocation maximum

integer:: thread_id, nthreads
! Variables 
double precision, allocatable  :: normal_sq(:,:)
double precision                  constant

write(*,*) "Allocating matrix of size 21673 x 21673 (always works)"
allocate(normal_sq(21673,21673))
deallocate(normal_sq)
write(*,*) "Allocating matrix of size 21674 x 21674 (fails with: -m32 or compiler32 mkl32)"
allocate(normal_sq(21674,21674))
deallocate(normal_sq)
write(*,*) "Allocating matrix of size 92681 x 92681 (works with: compiler mkl)"
allocate(normal_sq(92681,92681))
deallocate(normal_sq)
write(*,*) "Allocating matrix of size 92682 x 92682 (fails with: compiler mkl)"
allocate(normal_sq(92682,92682))
deallocate(normal_sq)

stop
end program
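
For reference, the byte counts behind the four sizes above can be checked with a few lines of Fortran. This is a minimal sketch, assuming 8-byte double precision; the program name size_check is made up for illustration and is not part of the original code.

```fortran
program size_check
    use, intrinsic :: iso_fortran_env, only: int64
    implicit none
    ! Assumption: double precision occupies 8 bytes per element.
    integer(int64), parameter :: bytes_per_elem = 8_int64
    integer(int64), parameter :: n(4) = &
        [21673_int64, 21674_int64, 92681_int64, 92682_int64]
    integer :: k

    do k = 1, size(n)
        ! n*n elements of 8 bytes each; int64 keeps the product from
        ! overflowing a default (32-bit) integer.
        write(*,'(i0," x ",i0," -> ",i0," bytes")') &
            n(k), n(k), n(k)*n(k)*bytes_per_elem
    end do
    write(*,'("3.5 GiB = ",i0," bytes")') 7_int64 * 2_int64**29
    write(*,'("64 GiB  = ",i0," bytes")') 2_int64**36
end program size_check
```

The 32-bit pair comes out at 3,757,751,432 and 3,758,098,208 bytes, which straddles 3.5 GiB (3,758,096,384 bytes). The 64-bit pair straddles 64 GiB, the same figure as the installed memory quoted above. That these are the limits actually being enforced is an inference from the numbers, not something established in this thread.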

jimdempseyatthecove · 04-08-2023

Two things to investigate:

1) Allocate your largest arrays first, and then do not deallocate them. Keep them and re-use them. If necessary, use a smaller subsection of the array for smaller problems on subsequent iterations. The arrays can have the TARGET attribute; then use a pointer to refer to a smaller subsection.

2) Windows has a 3GB feature. This might provide for a bit more virtual address space. Note, this applies to 32-bit Windows. There may be an analog to this on 64-bit Windows for 32-bit applications.

Edit: from https://techcommunity.microsoft.com/t5/ask-the-performance-team/memory-management-demystifying-3gb/ba-p/372333

OK - so let's quickly recap what we've discussed so far. The /3GB switch is not related to the amount of physical memory installed in a system. It is useful if you have an application that can take advantage of a larger address space. For a process to access the full 3GB address space, the image file must have the IMAGE_FILE_LARGE_ADDRESS_AWARE flag set in the image header.

If the flag is not set in the image header, then the OS reserves the third gigabyte so that the application won't see virtual addresses greater than 0x7FFFFFFF. You set this flag by specifying the linker flag /LARGEADDRESSAWARE when building the executable. This flag has no effect when running the application on a system with a 2-GB user address space. Therefore if you enable the /3GB switch, then applications that do not have this flag set can only use the standard 2GB of User mode memory, and the Kernel is still limited to the 1GB space - which means that 1GB of virtual memory is basically wasted!

However, this appears to apply to a 32-bit O/S. It might still be worth exploring for a 32-bit application running on a 64-bit O/S.

Jim Dempsey

sarge130 · 04-08-2023

Hi Jim, thanks for the quick response. I should have mentioned that this is being run on Red Hat Linux machines (RHEL 9.1). Are you aware of any similar solutions for Linux?

Regarding the matrix allocations: I see! When I swapped the matrix allocations to ensure the larger matrix was allocated first, the code was able to complete successfully. So it sounds like the operational code might be re-using a matrix that was previously allocated and requesting a larger version later on. As you can probably tell, I'm not much of a Fortran programmer. This might be tricky to implement, but I think I understand the problem now if that's what's happening. I found an example here: https://fortran-lang.org/en/learn/best_practices/multidim_arrays/

sarge130 · 04-08-2023

I was reading this page: https://stackoverflow.com/questions/19781713/what-is-the-biggest-array-size-for-double-precision-in-fortran-90

I used the code there for some testing. I found that I hit the virtual memory error when I don't use the -qmkl flag, but hit an overflow error when I do use it. The code allocates and deallocates larger and larger arrays as it iterates:

program allocate_test

implicit none
!Simple test to test allocation maximum

! Variables 
double precision, allocatable :: a(:)
integer*4 i

do i=1,100
   allocate(a(2**i))
   a(size(a)) = 1
   deallocate(a)
   write(*,*) i
 end do
end

$ ifort -m32 -traceback  allocate_test.f90 -qmkl -check -o allocate_test
$ ./allocate_test 
           1
...
          28
forrtl: severe (179): Cannot allocate array - overflow on array size calculation.
Image              PC        Routine            Line        Source             
allocate_test      08060409  Unknown               Unknown  Unknown
allocate_test      080608E6  Unknown               Unknown  Unknown
allocate_test      0804AADC  MAIN__                     11  allocate_test.f90
allocate_test      0804A90A  Unknown               Unknown  Unknown

$ ifort -m32 -traceback  allocate_test.f90 -check -o allocate_test
$ ./allocate_test 
           1
...
          27
forrtl: severe (41): insufficient virtual memory
Image              PC        Routine            Line        Source             
allocate_test      0806069E  Unknown               Unknown  Unknown
allocate_test      08060846  Unknown               Unknown  Unknown
allocate_test      0804AA3C  MAIN__                     11  allocate_test.f90
allocate_test      0804A86A  Unknown               Unknown  Unknown
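
The sizes requested at the failing iterations can be worked out directly. The sketch below assumes 8-byte double precision and computes the counts in 64-bit integers; the program name doubling_sizes is made up for illustration.

```fortran
program doubling_sizes
    use, intrinsic :: iso_fortran_env, only: int32, int64
    implicit none
    integer(int64) :: n_elem
    integer :: i

    do i = 27, 29
        n_elem = 2_int64**i    ! element count, computed in 64-bit
        ! Assumption: 8 bytes per double precision element.
        write(*,'("i=",i0,": ",i0," elements, ",i0," bytes")') &
            i, n_elem, 8_int64*n_elem
    end do
    write(*,'("32-bit address space: ",i0," bytes")') 2_int64**32
    write(*,'("largest integer*4:    ",i0)') huge(1_int32)
end program doubling_sizes
```

At i=28 the request is 2 GiB, and at i=29 it is 4 GiB, which equals the whole 32-bit address space and cannot be represented as a 32-bit byte count. That is consistent with the "overflow on array size calculation" message appearing at i=29 in the -qmkl run. Why the run without -qmkl already fails at i=28 is not established here. Note also that 2**i itself no longer fits in an integer*4 once i reaches 31.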

jimdempseyatthecove · 04-09-2023

>>I'm not much of a Fortran programmer.

Become language agnostic. Use whatever works best.

>>When I swapped the matrix allocations to ensure the larger matrix was allocated first, the code was able to complete successfully.

When memory is tight, you must pay close attention to the order of allocations so that you do not fragment the heap, leaving later, larger allocations unable to find a free node of sufficient size. This applies to all languages.

>> Multidimensional Arrays link

These are fine to use... but as with a single-dimension array, under tight memory constraints you must be careful not to fragment the heap.

Of particular interest for you from this link is the last example where they use a pointer (of different rank in this case) to point to an array. In the example, they pointed to the entire array. In your case, consider pointing to a (contiguous) subsection of an array. IOW you can make your initial allocation to the largest expected requirements for that named array, but under a different name, then use a pointer to point to the slice of the size you want. This in effect becomes a single node heap.

module blobs
    double precision, allocatable, target  :: normal_sq_blob(:)
    ! ... other blobs here
    contains
subroutine init_blobs
    integer size_normal_sq
        
    size_normal_sq = 1000*1000 ! call somewhere_to_get(size_normal_sq)
    allocate(normal_sq_blob(size_normal_sq))
    ! ... other get & allocate blobs here
end subroutine init_blobs
end module blobs
    
program Console16
    use blobs
    implicit none
    call init_blobs()
    call doWork()
end program Console16

subroutine doWork
    use blobs
    implicit none
    double precision, contiguous, pointer  :: normal_sq(:,:)
    integer :: dim1Size, dim2Size
    
    dim1Size = 123; dim2Size = 456  ! The sizes you need
    ! replace allocate(normal_sq(dim1Size, dim2Size)) with
    if(dim1Size * dim2Size > size(normal_sq_blob)) STOP "allocation error"
    normal_sq(1:dim1Size, 1:dim2Size) => normal_sq_blob
    ! ...
end subroutine doWork

Jim Dempsey
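
As a follow-on to the sketch above, the same blob can back differently sized views in turn, which is the "keep them and re-use them" idea in a self-contained form. This is only an illustration: the names blob and view and the 1000 x 1000 maximum are assumptions, not taken from the operational code.

```fortran
program reuse_blob
    implicit none
    double precision, allocatable, target :: blob(:)
    double precision, pointer, contiguous :: view(:,:)
    integer :: stat

    ! Allocate once, at the largest size any later step needs
    ! (hypothetical maximum of 1000 x 1000 elements).
    allocate(blob(1000*1000), stat=stat)
    if (stat /= 0) stop "could not allocate blob"

    ! First use: a 123 x 456 view onto the start of the blob.
    view(1:123, 1:456) => blob
    view = 1.0d0
    print *, "view shape:", shape(view)

    ! Later use: remap the same storage as 1000 x 1000.
    ! No new allocation happens, so the heap is left untouched.
    view(1:1000, 1:1000) => blob
    view = 0.0d0
    print *, "view shape:", shape(view)

    nullify(view)
    deallocate(blob)
end program reuse_blob
```

Because the pointer is only re-associated, nothing is returned to or requested from the heap between the two uses.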

jimdempseyatthecove · 04-09-2023

The problem with your code is that you assume that whatever heap manager is linked into your program consolidates adjacent free nodes. This is not always the case.

IOW your code presumes (this is my presumption) that:

a) the heap has a single free node of maximum remaining size of memory.

b) first allocation will extract a node for current allocation, leaving a free node of the remainder

c) corresponding deallocation returns the memory resulting in a single free node of maximum remaining size of memory (original max)

Case c) is not necessarily what happens. Often, the deallocation results in two nodes in the heap. You are not assured that the heap manager will consolidate the nodes. Most systems provide ways to control how deallocation is handled. In Windows, it's called the Low-Fragmentation Heap. I haven't looked at the Linux manual in a while; that is something you can do to determine the default behavior and how you might override it with the behavior you seek. Try this and see what you get.

program allocate_test

implicit none
!Simple test to test allocation maximum

! Variables 
double precision, allocatable :: a(:)
integer*4 i, iStat

do i=30,1, -1   ! start at 30: 2**i overflows integer*4 for i >= 31
   allocate(a(2**i), STAT=iStat)
   if(iStat == 0) then
     print *,"first largest allocation size = ", size(a), i
     a(size(a)) = 1
     deallocate(a)
     exit
   endif
 end do
end program allocate_test

Jim Dempsey
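
A related probe, closer to the original n x n test, is to bisect on the matrix dimension instead of stepping through powers of two. The sketch below makes several assumptions: the upper bound of 200000 is a hypothetical value expected to fail, allocation success is assumed to be monotonic in n, and an oversized request is assumed to be reported through STAT= instead of aborting the program.

```fortran
program probe_max_square
    use, intrinsic :: iso_fortran_env, only: int64
    implicit none
    double precision, allocatable :: a(:,:)
    integer(int64) :: lo, hi, mid
    integer :: stat

    lo = 1_int64          ! assumed to succeed
    hi = 200000_int64     ! hypothetical bound, assumed to fail
    do while (hi - lo > 1)
        mid = lo + (hi - lo) / 2
        ! STAT= turns a failed allocation into a status code.
        allocate(a(mid, mid), stat=stat)
        if (stat == 0) then
            deallocate(a)
            lo = mid      ! mid x mid allocated: search higher
        else
            hi = mid      ! mid x mid failed: search lower
        end if
    end do
    print *, "largest n x n that allocated: n =", lo
end program probe_max_square
```

The array is never touched, so a success only shows that the address range was granted. As discussed above, the result can also depend on the allocation history within the process, so treat the number as indicative.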

Ron_Green · 04-11-2023

If this is running on the same server, also make sure there is enough space left on the disk for swap:

swapon -s

and, as a sanity check, make sure the root disk is not out of space:

df -k

The swap partition should be 8 GB or thereabouts on a normal server.

Question: in this code, are the arrays allocatable OR are they in COMMON? Or statically declared at a fixed size in the main program?