Intel® Fortran Compiler

Why does reshape give me an ‘insufficient virtual memory’ error?

CRquantum

I have an extremely simple Fortran program:

program main
integer, parameter :: i4=selected_int_kind(9)
integer, parameter :: i8=selected_int_kind(15)
integer, parameter :: r8=selected_real_kind(15,9)
integer(kind=i8) :: np
np = 2
call test03(np)
stop
end
subroutine test03(n)
integer, parameter :: i4=selected_int_kind(9)
integer, parameter :: i8=selected_int_kind(15)
integer, parameter :: r8=selected_real_kind(15,9)
integer ( kind = i8 ), intent(in) :: n
real ( kind = r8 ) :: warray(n,4),normal(n,4)
warray = 1.0
write(6,*) 'reshape', reshape(warray,shape(warray))
return
end subroutine test03

I use Intel oneAPI with Visual Studio 2015/2017.
The problem is that if I compile with Intel Fortran in 'release' mode with the options /O3 /QxHost /traceback and then run the program, it stops with the 'insufficient virtual memory' error shown below:

CRquantum_1-1631767789305.png

The same error occurs in 'debug' mode.

Does anyone know why?

However, in 'release' mode, if I enable the options below, which set the default integer and real kinds to 8, it works fine again.

CRquantum_0-1631767776370.png

Thank you very much in advance!

Arjen_Markus

I have tried to reproduce the problem, but on my system (using the command-line) the program simply does the intended job. I have tried with different options, just like in your message, but none produces the out-of-memory error.

Could you tell us which version of Intel Fortran oneAPI you are using and given the screenshot of Visual Studio, also the full command-line with all the options for the case that presents these problems?

CRquantum

Sure, thank you!

In fact, with this simple code, both debug and release modes have the same problem.

I am using Intel oneAPI 2021.3:

CRquantum_0-1631775014775.png

with Visual Studio 2017:

CRquantum_0-1631774024938.png

The options for debug mode are below. These are the default debug settings; I only added the heap arrays option:

CRquantum_7-1631774790499.png

Further options for debug mode are below:

CRquantum_1-1631774095783.png
CRquantum_2-1631774131270.png

The options for release mode are below, but they do not seem to matter: both 'release' and 'debug' modes give this error.

CRquantum_4-1631774218495.png
CRquantum_5-1631774235410.png

The laptop is a Lenovo ThinkPad P72 with a Xeon 2186, 64 GB ECC RAM, and an Nvidia Quadro P5200.

The Windows 10 version is 1909.

CRquantum_6-1631774291249.png

However, if I declare the array 'warray' as allocatable, it works. I just do not know why.

See the code below. The test04 subroutine, which uses an allocatable array, works. The test03 subroutine gives the error I described.

 

program main
implicit none
integer, parameter :: i4=selected_int_kind(9)
integer, parameter :: i8=selected_int_kind(15)
integer, parameter :: r8=selected_real_kind(15,9)
integer(kind=i8) :: np
np = 2
call test04(np)
!call test03(np)
stop
end
subroutine test03(n)
implicit none
integer, parameter :: i4=selected_int_kind(9)
integer, parameter :: i8=selected_int_kind(15)
integer, parameter :: r8=selected_real_kind(15,9)
integer ( kind = i8 ), intent(in) :: n
real ( kind = r8 ) :: warray(n,4)
warray = 1.0
write(6,*) 'reshape03', reshape(warray,shape(warray))
return
end subroutine test03
subroutine test04(n)
implicit none
integer, parameter :: i4=selected_int_kind(9)
integer, parameter :: i8=selected_int_kind(15)
integer, parameter :: r8=selected_real_kind(15,9)
integer ( kind = i8 ), intent(in) :: n
real ( kind = r8 ), allocatable :: warray(:,:)
allocate(warray(n,4))
warray = 1.0
write(6,*) 'reshape04', reshape(warray,shape(warray))
deallocate(warray)
return
end subroutine test04

 

In fact, using Intel Parallel Studio XE 2018 Cluster Edition Update 4 with Visual Studio 2015 gives me the same error.

 

 

Igor_V_Intel
Employee

I don't see any issue with this code on my machine when it is compiled with the same compiler (ifort 2021.3) and the /O3 /QxHost /traceback options. In Visual Studio, both the debug and release configurations run fine with the reshape03 and reshape04 subroutines:

>ifort /O3 /QxHost /traceback Console7.f90
...
>Console7.exe
 reshape03  1.00000000000000    1.00000000000000
  1.00000000000000    1.00000000000000    1.00000000000000
  1.00000000000000    1.00000000000000    1.00000000000000

My guess is that something is wrong with your machine. It is odd that you don't observe the crash with the allocatable array, because in that case you use even more memory on the heap than on the stack.


CRquantum

Thank you very much!

The problem on my laptop is related to the 'heap arrays' option; please see below:

CRquantum_0-1631823490259.png

If I set heap arrays to 0, I get the error.

Could you add the following compiler flag and try again?

/heap-arrays0

If I leave the heap arrays option at its default, meaning no value is set, it works fine.

However, I always need heap arrays because my arrays are usually big. Without heap arrays the program cannot run properly.

Perhaps I did not notice the issue before because I had always used allocatable arrays.

CRquantum_1-1631822966974.png

CRquantum

Thank you very much Igor. 

Could you please see Dr. Fortran's comments and perhaps report this possible compiler bug to Intel so that it can be fixed?

Steve_Lionel

I can't reproduce the problem either, using the heap-arrays option. For an array this small, there's no reason reshape should fail.

CRquantum

Thank you, Steve (Dr. Fortran) Lionel!

Do you have any suggestions, such as reinstalling Intel oneAPI or Visual Studio?

Or is it possible that this is related to the version of the Windows Software Development Kit?

CRquantum_0-1631824737892.png

The behavior on my laptop is strange.

If I leave the heap arrays option as is,

CRquantum_1-1631824882399.png

it works fine.

CRquantum_2-1631824922276.png

If I set heap arrays to 0, it gives me the error:

CRquantum_3-1631825013245.png

Steve_Lionel

Please zip the whole project (do a Build > Clean first) and attach the ZIP to a reply here. I see you have /O3 enabled, but that doesn't help me reproduce the problem. Maybe there's something else at work here.

CRquantum

Thank you very much Dr. Fortran! 

I have attached it here. Double-clicking smalltest.sln will open Visual Studio and load the project.

Please advise. Thank you very much indeed!

 

Steve_Lionel

Thanks. I CAN reproduce this, but only when building for x64. Your project has a bunch of other options set, such as /Qmkl:cluster and /QxHost, but they don't matter. Only /heap-arrays matters, and I can reproduce it in a debug configuration as well (so it isn't optimization related). 

I don't know the argument list for for_allocate (the run-time library routine that does dynamic allocation), but it looks to me as if a second for_allocate call in that WRITE has a size of 262144 (piddling) and gets the error. The first one in that statement has the same size and no error.

Some more observations. If I move the reshape out of the WRITE and execute it on its own statement, assigning to a second array, I still get the error. But if I replace "shape(warray)" with [2,4], which should be the same, it works. So there is something rotten going on with shape(warray).
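
The two variants described above can be sketched as a small test program. This is only an illustration assembled from the description in this post, not the code that was actually run: the program name reshape_variants, the internal subroutine variants, and the second array copy are hypothetical names, and test03 in the thread is an external subroutine rather than an internal one.

```fortran
program reshape_variants
  implicit none
  integer, parameter :: i8 = selected_int_kind(15)
  integer, parameter :: r8 = selected_real_kind(15,9)
  integer(kind=i8) :: np

  np = 2
  call variants(np)

contains

  subroutine variants(n)
    integer(kind=i8), intent(in) :: n
    real(kind=r8) :: warray(n,4)   ! automatic array, as in test03
    real(kind=r8) :: copy(n,4)     ! hypothetical second array for the result

    warray = 1.0_r8

    ! Variant reported in this thread to fail on x64 with /heap-arrays
    ! (ifort 2021.3): the shape comes from shape(warray).
    copy = reshape(warray, shape(warray))

    ! Variant reported to work: the shape is a literal constructor.
    ! [2,4] matches shape(warray) only because n = 2 here.
    copy = reshape(warray, [2, 4])

    write(6,*) 'copy', copy
  end subroutine variants

end program reshape_variants
```

With an affected compiler and /heap-arrays, the first assignment is the one that would be expected to stop with the run-time error, so it may need to be commented out to see the second variant run.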

I must be misinterpreting the argument to for_allocate, as the array size should be 64, not 262144....

A workaround is to have a local two-element integer array into which you assign shape(warray) and then use that as the argument to reshape. For example:

    subroutine test03(n)
    implicit none
    integer, parameter :: i4=selected_int_kind(9)
    integer, parameter :: i8=selected_int_kind(15)
    integer, parameter :: r8=selected_real_kind(15,9)
    integer ( kind = i8 ), intent(in) :: n
    real ( kind = r8 ) :: warray(n,4)
    integer :: wshape(2)
    warray = 1.0
    wshape = shape(warray)
    write(6,*) 'reshape03', reshape(warray,wshape)
    return
    end subroutine test03

In any case, this looks like a compiler bug. If you have support, please open a ticket at the Intel Online Service Center. Otherwise let's hope an Intel support person picks it up from here.

FortranFan

Steve, are you able to reproduce it outside of Visual Studio IDE?

Steve_Lionel

Yes, I can reproduce it from the command line.

D:\Projects\smalltest\smalltest>type test2.f90
    program main
    implicit none
    integer, parameter :: i4=selected_int_kind(9)
    integer, parameter :: i8=selected_int_kind(15)
    integer, parameter :: r8=selected_real_kind(15,9)
    integer(kind=i8) :: np
    np = 2
    !call test04(np)
    call test03(np)
    stop
    end
    subroutine test03(n)
    implicit none
    integer, parameter :: i4=selected_int_kind(9)
    integer, parameter :: i8=selected_int_kind(15)
    integer, parameter :: r8=selected_real_kind(15,9)
    integer ( kind = i8 ), intent(in) :: n
    real ( kind = r8 ) :: warray(n,4)
    warray = 1.0
    write(6,*) 'reshape03', reshape(warray,shape(warray))
    return
    end subroutine test03
    subroutine test04(n)
    implicit none
    integer, parameter :: i4=selected_int_kind(9)
    integer, parameter :: i8=selected_int_kind(15)
    integer, parameter :: r8=selected_real_kind(15,9)
    integer ( kind = i8 ), intent(in) :: n
    real ( kind = r8 ), allocatable :: warray(:,:)
    allocate(warray(n,4))
    warray = 1.0
    write(6,*) 'reshape04', reshape(warray,shape(warray))
    deallocate(warray)
    return
    end subroutine test04


D:\Projects\smalltest\smalltest>ifort /logo /heap-arrays test2.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.3.0 Build 20210609_000000
Copyright (C) 1985-2021 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.28.29915.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:test2.exe
-subsystem:console
test2.obj

D:\Projects\smalltest\smalltest>test2.exe
forrtl: severe (41): insufficient virtual memory
Image              PC                Routine            Line        Source
test2.exe          00007FF740322ACF  Unknown               Unknown  Unknown
test2.exe          00007FF740321122  Unknown               Unknown  Unknown
test2.exe          00007FF740373ABE  Unknown               Unknown  Unknown
test2.exe          00007FF740373E40  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFEE4067034  Unknown               Unknown  Unknown
ntdll.dll          00007FFEE4A82651  Unknown               Unknown  Unknown

D:\Projects\smalltest\smalltest>
CRquantum

Thank you very much, Dr. Fortran! I have learned a lot from your posts, replies, and threads over the years.

I guess I do not have support, since I did not buy the support service.

However, I hope Intel can fix it soon.

Thank you for your explanation of

"for_allocate (the run-time library routine that does dynamic allocation)"

In VTune I have sometimes seen for_allocate take the majority of my code's run time, but I did not know which operations trigger it. Your explanation makes it a little clearer.

Thank you for your workaround example.

Yes, initially I just wanted to do things like

    warray = reshape( some array, shape(some thing)  )

but I got an 'access violation' error. I then tried writing out the results and found that the error came from the reshape. Then I made this small example and found that it is related to the heap-arrays flag.

Dr. Fortran,

1. Do you recommend the 'heap-arrays' option? In what situations might 'heap-arrays' significantly decrease performance?

2. One more thing: do you have any suggestions or links on how to write high-performance modern Fortran code, and on how to use Intel VTune/Advisor to improve performance?

(Especially for Intel Fortran, because Fortran is the first real programming language I used, starting in 2008. Initially it was called Compaq Visual Fortran 6.6, then it became Intel Visual Fortran, then Intel Parallel Studio, then Intel Parallel Studio XE Cluster Edition, and now it is Intel oneAPI. Until now I have used Intel Fortran almost exclusively.)

I can basically only make my code as fast as possible at the algorithm level, by avoiding computing the same thing more than once and by putting as few operations as possible inside loops.

But at a more fundamental Fortran level, I am still not very clear about how vectorization works, or why 'heap-arrays' can occasionally make my code suddenly 3 times slower.

Overall, if people ask me whether my Fortran code has reached its fastest speed, I still cannot say yes. I can only say that there is perhaps still room to improve.

I always dream of getting a little closer to your Fortran level. I do not expect to reach 100% of it, but even 10% would already be very powerful.

Highest regards, and thank you for all your services and contributions at Intel and to the whole Fortran community!

Steve_Lionel

I was able to submit a ticket at the Intel OSC as issue 05214074.

Steve_Lionel

for_allocate is the routine compiled code calls to allocate dynamic memory. It can be from an ALLOCATE statement or can be for a temporary the compiler wants to generate when /heap-arrays is specified. Stack allocation is almost free in terms of CPU cycles, but it draws from a limited pool of address space and becomes impractical when large arrays need to be allocated. Whether you'll see a performance hit from calls to for_allocate depends on how many times it is called. If your code is frequently creating temporaries, which also means copying the data (also expensive), that can be noticeable. Ideally you should try to avoid temporary creation as much as possible.

I recognize that the example program you provided was just to show the error, and that's great, but its use of RESHAPE made no sense to me since you weren't actually reshaping the array. How is this used in your actual program? RESHAPE will always generate a temporary for its result - sometimes this is unavoidable but other times you can accomplish things another way (pointer bounds remapping is one).

CRquantum

Thank you very much Dr. Fortran! 

I learned a lot from your words.

 

In the code, reshape is used to turn a big 1D random number array into, say, a 4D array, and I then use this 4D array in the loop because it is more convenient. This avoids calling the random number generator repeatedly inside the loop, which saves some time. The code looks like this:

...
real (kind = r8) :: normal(nd,4,n,np)
...
normald = nd*4*n*np
normal = reshape(gaussian(normald),shape(normal))
...
do j = 1, np    
call rk4_ti_vec(normal(:,:,:,j))
enddo

In the above code, gaussian(n) is a function that generates a 1D array of n Gaussian random numbers.

The actual problem is described here:

https://fortran-lang.discourse.group/t/why-a-function-returns-an-array-is-much-slower-than-a-subroutine-returns-an-array/1893

I found that in my code, using a function that returns an array is much slower than using a subroutine.

Dr. Fortran, when you have some spare time, would you take a quick look? Thank you very much indeed!
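
For comparison, the subroutine style mentioned above might look like the sketch below, which fills the 4D array in place instead of going through an array-valued function result and RESHAPE. Everything here is an assumption made for illustration: the module gaussian_sketch, the subroutine fill_gaussian, the Box-Muller transform used as the generator, and the small values of nd, n, and np are not taken from the actual program.

```fortran
module gaussian_sketch
  implicit none
  integer, parameter :: r8 = selected_real_kind(15,9)
contains

  ! Hypothetical stand-in for the thread's gaussian(n) function, written as
  ! a subroutine so that the caller's storage is filled directly.
  subroutine fill_gaussian(x, n)
    integer, intent(in) :: n
    real(kind=r8), intent(out) :: x(n)   ! explicit-shape dummy argument
    real(kind=r8) :: u2(n)               ! one scratch array of uniforms
    real(kind=r8), parameter :: twopi = 6.283185307179586_r8

    call random_number(x)
    call random_number(u2)
    ! Box-Muller transform (assumed generator). random_number returns values
    ! in [0,1), so 1 - x lies in (0,1] and log() stays finite.
    x = sqrt(-2.0_r8*log(1.0_r8 - x)) * cos(twopi*u2)
  end subroutine fill_gaussian

end module gaussian_sketch

program fill_in_place
  use gaussian_sketch
  implicit none
  integer, parameter :: nd = 3, n = 5, np = 2   ! small illustrative sizes
  real(kind=r8) :: normal(nd,4,n,np)

  ! Sequence association: the rank-4 actual argument is seen by the rank-1
  ! explicit-shape dummy as size(normal) contiguous elements.
  call fill_gaussian(normal, size(normal))

  print *, shape(normal)
end program fill_in_place
```

The scratch array u2 is still an automatic array, so this sketch reduces the number of temporaries and copies rather than eliminating every allocation.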

mecej4

CRquantum, you have shown several code segments in this thread, but this line in your post dated 9-19-2021 caught my eye:

 

normal = reshape(gaussian(normald),shape(normal))

 

 

If that is actually the line of code in your complete program, that would, it appears to me, be the cause of access violations, because it appears to be an attempt to map and reshape the one dimensional array section 

     gaussian( normal1d : (2*normal1d-1) )

to the four-dimensional array

     normal(1:nd, 1:4, 1:n, 1:np)

In other words, only the first element of the first argument array to RESHAPE is valid, and all the rest of the elements access memory past the end of the array. The syntax is not correct, either, since the SOURCE argument to RESHAPE cannot be a scalar such as the array element gaussian(normal1d).

Do you see my point of doubt? Perhaps, what you want to do can be expressed as

normal = reshape(gaussian,shape(normal))

or

normal = reshape(gaussian(1:normald),shape(normal))

CRquantum

Thank you very much.

In my code, gaussian is a function (not the name of an array).

gaussian(n) returns an n-element 1D array indexed from 1 to n.

In other words, gaussian(n) does not mean the nth element of an array named gaussian.

Therefore, gaussian(normal1d) generates a 1D array whose index runs from 1 to normal1d (that is, from 1 to nd*4*n*np).

So I am actually reshaping a 1D array with normal1d elements into the 4D array normal, which has the same total number of elements. I think this is exactly what you suggested. The number of elements matches, so this part does not cause the access violation.

However, I get your point. You are saying that if gaussian really were an array, then by writing

    normal = reshape(gaussian(normald),shape(normal))

I would probably be reshaping gaussian(normald: 'the last element') into the array normal, which could cause a memory violation.
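
To illustrate the distinction, here is a minimal sketch in which gaussian is an array-valued function, so gaussian(normald) is a call that returns normald elements and not a reference to one array element. The function body is a placeholder assumed for this example (it fills the result with uniform numbers, not Gaussian ones), and the module and program names and the small sizes are hypothetical.

```fortran
module gaussian_function_sketch
  implicit none
  integer, parameter :: r8 = selected_real_kind(15,9)
contains

  ! Array-valued function: the result g has n elements, indexed 1 to n.
  function gaussian(n) result(g)
    integer, intent(in) :: n
    real(kind=r8) :: g(n)
    call random_number(g)   ! placeholder values, uniform rather than Gaussian
  end function gaussian

end module gaussian_function_sketch

program function_not_array
  use gaussian_function_sketch
  implicit none
  integer, parameter :: nd = 3, n = 5, np = 2   ! small illustrative sizes
  real(kind=r8) :: normal(nd,4,n,np)
  integer :: normald

  normald = nd*4*n*np

  ! gaussian(normald) is a function reference returning a rank-1 array,
  ! so RESHAPE receives exactly size(normal) source elements.
  normal = reshape(gaussian(normald), shape(normal))

  ! Check that the element counts agree.
  print *, size(gaussian(normald)) == size(normal)
end program function_not_array
```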

mecej4

Ah, now I see the code from your point of view. You did state that gaussian is a function, but in text situated outside the code box, and I had only looked at the code.

Steve_Lionel

Here's an example of doing what you're looking for using bounds remapping. No temporaries or copying required.

integer, pointer :: rank1_array (:)
integer, pointer :: rank4_array(:,:,:,:)
allocate (rank1_array(100))
rank1_array = [(i,i=1,100)]
print *, shape(rank1_array)
rank4_array(1:2,1:5,1:5,1:2) => rank1_array
print *, shape(rank4_array)
print *, rank4_array(2,1,5,1)
end
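
The same technique could be applied to the real-valued 4D case discussed earlier in the thread. The following is a sketch under assumptions, not code from the actual program: the flat array flat, the program name, and the small sizes are hypothetical, and random_number stands in for whatever generator fills the data.

```fortran
program remap_normal
  implicit none
  integer, parameter :: r8 = selected_real_kind(15,9)
  integer, parameter :: nd = 3, n = 5, np = 2   ! small illustrative sizes
  real(kind=r8), target  :: flat(nd*4*n*np)     ! rank-1 storage
  real(kind=r8), pointer :: normal(:,:,:,:)     ! rank-4 view of the same storage

  call random_number(flat)   ! placeholder for the real generator

  ! Bounds remapping: no temporary and no copy, normal aliases flat.
  normal(1:nd, 1:4, 1:n, 1:np) => flat

  print *, shape(normal)
  ! Column-major order: the second element of the view is flat(2).
  print *, normal(2,1,1,1) == flat(2)
end program remap_normal
```

Because normal is only a view, any later change to flat is visible through normal and vice versa, which is the price of avoiding the copy that RESHAPE makes.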