Allocate arrays

Zhanghong_T_ · ‎01-19-2009

Dear all,

Has anyone had this experience? On a 32-bit Windows XP system (2 GB memory) we build a DLL project from Fortran code (VS2008 + IVF 11.061), and the DLL is called by a C++ main program. Before entering the Fortran subroutine, the C++ main program has already occupied about 1.1 GB of memory. In the Fortran subroutine we then allocate very large arrays (more than 10 million elements, double and integer), and at this point error code "41" is returned when trying to allocate these arrays. However, in debug mode, if we divide each large array into several smaller ones, the allocation succeeds, i.e., if we replace the code

REAL*8,ALLOCATABLE::MAT(:)
INTEGER,ALLOCATABLE::IMAT(:)
INTEGER::NZ,ERR
NZ=10000000
ALLOCATE(MAT(NZ),IMAT(NZ),STAT=ERR)

by

REAL*8,ALLOCATABLE::MAT1(:),MAT2(:),MAT3(:),MAT4(:),MAT5(:)
INTEGER,ALLOCATABLE::IMAT1(:),IMAT2(:),IMAT3(:),IMAT4(:),IMAT5(:)
INTEGER::NZ,ERR
NZ=2000000
ALLOCATE(MAT1(NZ),IMAT1(NZ),STAT=ERR)
ALLOCATE(MAT2(NZ),IMAT2(NZ),STAT=ERR)
ALLOCATE(MAT3(NZ),IMAT3(NZ),STAT=ERR)
ALLOCATE(MAT4(NZ),IMAT4(NZ),STAT=ERR)
ALLOCATE(MAT5(NZ),IMAT5(NZ),STAT=ERR)

the code runs without an error message. But if the code is compiled in release mode, it doesn't work. Could anyone please tell me the reason?

1) In my experience, allocating a double precision array with 10 million elements needs about 80 MB, so why is an error code returned when the array is allocated directly?
2) Why do the debug and release versions give different results? Can I avoid this difference?
3) Would it help to set a larger virtual memory size?

Thanks,
Zhanghong Tang
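
As a quick check of the figure in question 1, the array sizes can be worked out directly. The sketch below is in Python rather than Fortran and assumes 8-byte REAL*8 and 4-byte default INTEGER elements; the helper name array_bytes is made up for illustration.

```python
# Back-of-the-envelope sizes for the arrays in the post above.
# Assumption: REAL*8 is 8 bytes and default INTEGER is 4 bytes.
REAL8_BYTES = 8
INTEGER_BYTES = 4
MB = 1000 * 1000  # decimal megabytes, as used in the thread


def array_bytes(n_elements, bytes_per_element):
    """Contiguous bytes needed for one array of n_elements."""
    return n_elements * bytes_per_element


# Single large allocation: MAT(NZ), IMAT(NZ) with NZ = 10,000,000
nz = 10_000_000
mat_mb = array_bytes(nz, REAL8_BYTES) / MB
imat_mb = array_bytes(nz, INTEGER_BYTES) / MB

# Split allocation: five pairs with NZ = 2,000,000 each
nz_piece = 2_000_000
mat_piece_mb = array_bytes(nz_piece, REAL8_BYTES) / MB
imat_piece_mb = array_bytes(nz_piece, INTEGER_BYTES) / MB

print(f"MAT:   {mat_mb:.0f} MB in one contiguous block")
print(f"IMAT:  {imat_mb:.0f} MB in one contiguous block")
print(f"MATn:  {mat_piece_mb:.0f} MB per piece")
print(f"IMATn: {imat_piece_mb:.0f} MB per piece")
```

The total is the same in both cases (120 MB), but the single ALLOCATE needs one contiguous 80 MB block and one contiguous 40 MB block, while the split version only needs blocks of 16 MB and 8 MB.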

jimdempseyatthecove · ‎01-20-2009

Tang,

In your C++ program, just after main, make a call to a Fortran allocation routine to allocate the 80MB array(s) first. Note, if your C++ is making its allocations in the ctors of static objects, you might need to create a "me first" static object whose ctor calls the Fortran subroutine to perform its allocation.

Jim Dempsey.

Zhanghong_T_ · ‎01-20-2009

Dear Jim,

Thank you very much for your kind reply. But... the MAIN program doesn't know how large the array is until the sparse matrix is generated.

Could you please tell me what the limits are when arrays are allocated by subroutines in DLLs? I have tested that if I build that part of the Fortran code into an EXE file, it runs successfully. Before running the EXE file, I launched the C++ main program and let it pause somewhere, at which point the used memory was more than 1.5 GB.

Thanks,
Zhanghong Tang

jimdempseyatthecove · ‎01-20-2009

Tang,

If you intend to run only one instance of a C++ program using your DLL I would suggest you consider making the DLL a static library and link to that.

The problem you may have with DLL and allocatable memory is:
Does the allocatable memory belong in the DLL (available to all concurrent programs using the DLL)?
or
Does the allocatable memory belong with the application using the DLL?

When the array descriptor "lives" inside the DLL the data it points to should also lie within the DLL address space and this data is global for concurrent applications using the DLL (it is also restricted in size). From your description (building sparse matrix) I do not believe that this is what you intend.

Suggestion for work around.

Create a small Fortran routine for interfacing to your DLL; this small Fortran routine will be linked into your C++ application. The small Fortran routine contains a module containing the unallocated array descriptors.

To allocate the 80MB arrays call the small Fortran routine from the C++ code once you have determined the array sizes. Then in places where you currently call the DLL from C++ change it to call a hook routine inside the small Fortran routine. This hook routine performs the call to the DLL and passes the array(s) plus other arguments into the DLL for processing.

Jim Dempsey

Zhanghong_T_ · ‎01-20-2009

Dear Jim,

Thank you very much for such a quick reply.

1) Do you mean that the static library could have better performance? I have tried building the Fortran files into a static library and linking it to the main program; when I run the program, the same error appears.

2) In my program different arrays have different "live ranges": some are local to one subroutine and some are global arrays defined in modules.

3) I have also tried allocating the large arrays as soon as the sizes are determined; the same error code is returned.

4) Could you please give me more detailed information about the "hook" and also some example about it?

Thanks,
Zhanghong Tang

jimdempseyatthecove · ‎01-21-2009

RE: 1) Performance was not the point. The point was the placement of the array descriptor, the use of the array descriptor by the DLL and the placement of the memory allocated to the descriptor in the DLL. A DLL is designed to be used by several programs at the same time. The memory contained within the DLL, including allocatable data, is generally construed to be available to all concurrent programs using the DLL (unless you program a selection method). Also, a DLL will be loaded on first use (or can be preloaded) however, a DLL may persist beyond the life of the program that called the DLL first and may persist into a time where all programs using the DLL exit. When a program using the DLL exits and if the DLL contains pointers into the former user memory space you will have problems if the pointer is subsequently used. The DLL may crash or the DLL may corrupt memory of the next application using the DLL. This is not to say that you cannot use array descriptors in a DLL to point to memory allocated in the address space of the calling program, rather it is to say you must be particularly careful in doing so.

RE: 2) If you code using a static library and then still observe problems allocating a large array then you are using too much memory (or too fragmented at the time of the 80MB allocation). Going to a DLL will not solve this problem, it will only make it worse.

Try this as an easy experiment to see if your program is simply too big

Add a variable to hold the size of the allocation and initialize it to 0
At the front of the program test the variable, if 0 do nothing, if nnnn allocate the large Fortran array to nnnn
At the point later in your code after deciding on the size required but before calling the allocation, print out the size you need. If the value is 0, stop, as you now have the size; if it is non-zero, avoid calling the allocation routine.
Now load up the value of the size into the variable used at the beginning of the program and run a second time.

If the allocations fail, then your program plus data may be too big for running on a 32-bit platform.

Also, check to see if you can reduce the stack size to reclaim some space.

Jim Dempsey

Zhanghong_T_ · ‎01-21-2009

Dear Jim,

Thank you again for your kind help. I have tried your method before, and got two different results:
1) Under the C++ main program, the size of the allocatable array is very limited (less than 80 MB);
2) If I run some other programs so that the memory in use is the same as, or a little more than, when the C++ main program runs, and then CREATE a new program and try to allocate the array, the allocatable size is more than ten times that of 1).

Later I found that on a Win32 system the maximum available to ONE process is 2 GB, but my questions are:
1) In the first case, even with the 80 MB array allocated, the total memory used by the C++ main program is less than 1.6 GB, so why doesn't it work?
2) Can I create a new process in the DLL and let the array allocation and calculation be done by the new process? How can this be done?

Thanks,
Zhanghong Tang

jimdempseyatthecove · ‎01-22-2009

Regarding 2) in your first list: is the size you request for allocation the same (e.g. 80MB), with the footprint of the application as reported by Windows Task Manager being 10x?

Also, check your Virtual Memory settings:

Start | Control Panel | System | Advanced | Performance:Settings | Advanced | Virtual memory:Change

Select Custom size with Initial size (MB) sufficient for your application (plus other applications)

Regarding 1) in your second list: your application has 400MB available (or so you think), but it may be fragmented due to prior use. Since your allocation in pieces seems to work when the allocation of 80MB in one piece fails, this is a good indication of memory fragmentation. Note, each process has its own virtual address space, which cannot be fragmented by a different process. However, the sum total of all processes' virtual address space has to fit within the available page file size (set by the Virtual Memory settings described above).
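
To make the fragmentation argument concrete, here is a toy model in Python. The list of free block sizes is invented for illustration (it is not measured data from the program in question), and can_allocate is a hypothetical first-fit helper, not how the Windows or Fortran run-time allocator actually works.

```python
# Toy model of a fragmented address space; all sizes are in MB.
# The free-block layout below is made up purely for illustration.

def can_allocate(free_blocks, requests):
    """First-fit check: place each request, in order, into the first
    contiguous free block that is large enough.

    Returns True if every request fits. The input list is not modified.
    """
    blocks = list(free_blocks)
    for size in requests:
        for i, block in enumerate(blocks):
            if block >= size:
                blocks[i] = block - size  # carve the request out of this hole
                break
        else:
            return False  # no single hole is big enough for this request
    return True


# 400 MB free in total, but no hole is larger than 70 MB.
free_blocks = [60, 50, 70, 45, 65, 55, 55]

one_piece = can_allocate(free_blocks, [80, 40])     # MAT and IMAT in one go
in_pieces = can_allocate(free_blocks, [16, 8] * 5)  # five MATn/IMATn pairs

print("total free:", sum(free_blocks), "MB")
print("largest hole:", max(free_blocks), "MB")
print("80 MB + 40 MB in single blocks:", one_piece)
print("five pairs of 16 MB + 8 MB:", in_pieces)
```

In this model the single 80 MB request fails even though 400 MB is free, while the same total split into smaller requests succeeds, which matches the symptom described in the thread.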

Regarding 2) in your second list: the code and data within a DLL are mapped into the virtual address space of each process using the DLL. You will not be able to have the additional 80MB reside in another process's virtual address space.

Is it safe to assume that your application code + data is as large as it will ever be?
If not (it will grow larger), then consider requiring use on an x64 platform
If so, try
compiling for minimum size
compiling for Requires SSE3
disable unrolling loops (as project default, unroll only those required)
disable inlining (as project default, inline only those required)

The above will reduce your code size.

If that doesn't work
can you break your program in two?
part 1: compute your sparse matrix, write it to a file (binary format), exit
part 2: read sparse matrix, allocate large array, process data
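
The two-part split can be sketched as follows. This is a Python illustration of the idea only, not the Fortran/C++ code from this thread: the file name matrix.bin, the function run_part1_then_part2 and the "processing" step (a simple sum) are all assumptions made for the example.

```python
import array
import os
import subprocess
import sys
import tempfile

# Source of "part 2": a separate process that reads the binary file,
# holds the data in its own address space and processes it.
PART2 = """
import array, sys
values = array.array('d')
with open(sys.argv[1], 'rb') as f:
    values.frombytes(f.read())
print(sum(values))
"""


def run_part1_then_part2(values):
    """Part 1 writes the values to a binary file; part 2 runs as a
    separate process, reads the file and prints the result."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "matrix.bin")
        # Part 1: dump the data as raw 8-byte floats.
        with open(path, "wb") as f:
            array.array("d", values).tofile(f)
        # Part 2: launch the second process and wait for it to finish.
        result = subprocess.run(
            [sys.executable, "-c", PART2, path],
            capture_output=True, text=True, check=True,
        )
    return float(result.stdout)


total = run_part1_then_part2([1.0, 2.5, 3.5])
print(total)
```

Because part 2 runs in its own process, its large arrays live in a fresh virtual address space rather than competing with whatever part 1 has already allocated.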

Jim Dempsey

Zhanghong_T_ · ‎01-23-2009

Hi Jim,

Thank you very much for your kind reply. The problem is now solved by your last method: I built another executable file to do the work of "part 2". The code to call the executable file is as follows:

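subroutine Shell(filename,command)
! Run the executable "filename", located in the same directory as the
! calling program, with arguments "command", and wait until it exits.
use ifwinty
use ifwin
implicit none
character(*),intent(in)::filename,command
integer(BOOL)::res
integer(DWORD)::ret
character(256)::szFullPath
character(512)::szFile,szParams
type(T_SHELLEXECUTEINFO)::ShExecInfo
integer::iPosOfLastBackslash
! Directory of the running executable = full path up to the last backslash
ret = GetModuleFileName (NULL, szFullPath,len(szFullPath))
iPosOfLastBackslash = Index( szFullPath , "\" , BACK = .TRUE. )
! The Win32 API expects NUL-terminated strings, so build them in
! local variables instead of passing the address of a temporary
szFile = szFullPath(1:iPosOfLastBackslash)//trim(filename)//char(0)
szParams = trim(command)//char(0)
ShExecInfo.cbSize = sizeof(ShExecInfo)
ShExecInfo.fMask = SEE_MASK_NOCLOSEPROCESS
ShExecInfo.hwnd = NULL
ShExecInfo.lpVerb = NULL
ShExecInfo.lpFile = loc(szFile)
ShExecInfo.lpParameters = loc(szParams)
ShExecInfo.lpDirectory = NULL
ShExecInfo.nShow = SW_HIDE
ShExecInfo.hInstApp = NULL
res = ShellExecuteEx(ShExecInfo)
! Wait for the child process only if it was actually started
if (res /= 0) ret = WaitForSingleObject(ShExecInfo.hProcess,INFINITE)
end subroutine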

It is strange that the total memory used by the C++ main program (part 1) and the part 2 program is still less than 2GB (according to Windows Task Manager). It seems that using more processes is better for a program with large memory needs.

Does anyone have some better suggestions?

Thanks,
Zhanghong Tang

jimdempseyatthecove · ‎01-23-2009

Tang,

Although the technique is not the most desirable, at least you can use it for now while you finish up any remaining details of your application, such as adding or removing features. Once that part of your project is complete you can come back and try to work out the memory size problem.

You might find that placing trace printouts in your code at the allocation and deallocation statements (write to file if need be) points to the problem area. One common source of problems is not returning allocated memory when you are done with it. Examples are letting program exit do the clean-up, or performing the deallocation several call levels further out than it needs to be. In your trace routines include the hex address of the item allocated or deallocated, the size in bytes, and a sequence number. By formatting this right you can import it into an Excel spreadsheet and make some sense out of the data.

Put these edits into the split-up programs, since they run. The amount you allocate should equal the amount you deallocate; if they don't, then you have a memory leak. This should be easy enough to find by running a program on the output file: remove lines where allocation address == deallocation address, and what remains will be the leaked memory. The sequence number of the allocation can be used in your printout routine on the next run to trigger a break.
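
A program to run on the trace output could look like the Python sketch below. The trace line format (sequence number, ALLOC or FREE, hex address, size in bytes) is an assumption made for this example, and find_leaks is a hypothetical helper; adapt both to whatever your trace routines actually write.

```python
def find_leaks(trace_lines):
    """Return allocations that were never deallocated.

    Assumed line format: '<seq> <ALLOC|FREE> <hex address> <size in bytes>'.
    The result is a list of (seq, hex address, size) tuples sorted by
    sequence number.
    """
    live = {}  # address -> (seq, size) of the allocation currently live there
    for line in trace_lines:
        if not line.strip():
            continue  # skip blank lines
        seq, op, addr, size = line.split()
        address = int(addr, 16)
        if op == "ALLOC":
            live[address] = (int(seq), int(size))
        elif op == "FREE":
            # Matching deallocation: the block is no longer live.
            live.pop(address, None)
    return sorted((seq, hex(address), size)
                  for address, (seq, size) in live.items())


# Made-up trace: the block from step 1 is freed, steps 2 and 4 are not.
trace = [
    "1 ALLOC 0x10000000 80000000",
    "2 ALLOC 0x20000000 40000000",
    "3 FREE 0x10000000 80000000",
    "4 ALLOC 0x10000000 16000000",
]

leaks = find_leaks(trace)
for seq, address, size in leaks:
    print(f"leak: seq={seq} address={address} size={size}")
```

The sequence numbers reported for the leaked blocks are the ones to break on during the next run.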

Also, if your program is an assemblage of old F77 files, you may have a bunch of different COMMON buffers and arrays that are used in sequence as you pass through the program. By changing these to module allocatable arrays you can allocate them when needed and deallocate them when they are not. Also, if they tend to be the same size but with different names, you might be able to fix that code such that both routines use the same buffer (cutting down on allocation/deallocation).

Another area for reclamation is subroutine/function local temporary arrays (not SAVE): add the AUTOMATIC attribute or use the /Qauto option (assuming /Qauto is safe for your application).

Jim Dempsey

Zhanghong_T_ · ‎01-23-2009

Hi Jim,

Thank you very much for your kind reply again. I am also worried about memory leaks. Your method of checking for them is very smart. I use a similar method: find all "allocatable" arrays declared in the program and deallocate all of them at the end of the program with code like:

if (allocated(A))deallocate(A)

In addition, the same check is done before each array is allocated.

My program is written in F90 and all shared arrays are declared in a module, so there is no "COMMON" problem. However, I will test your last suggestion about "AUTOMATIC" or "/Qauto".

Thanks,
Zhanghong Tang

jimdempseyatthecove · ‎01-23-2009

Tang,

The problem you should check for is

program YourProgram
call DoWork
if (allocated(A))deallocate(A)
if (allocated(B))deallocate(B)
if (allocated(C))deallocate(C)
...
end program YourProgram

These deallocations are too late, and may be the cause of lack of available memory.
You need to place those tests deeper in the code where they belong (after the return from the routine that allocated the memory). These tests can be placed in conditional code that compiles in the Debug configuration.

Good luck

Jim

Zhanghong_T_ · ‎01-23-2009

Dear Jim,

Thank you very much for such a quick reply. I think you mean that I should deallocate these arrays once they are out of their "scope" or their values are no longer needed. I have tried to do so, even for those declared in modules. I will check it more carefully.

But as I said before, the strange problem is that the program runs successfully in debug mode (release version of C++ main program & debug version of Fortran DLL) but crashes in release mode (release version of C++ main program & release version of Fortran DLL). What on earth leads to such a problem? I traced the code by writing information to a file as you suggested and found two strange things:
1) the debug version can't allocate the large array, but can allocate several smaller arrays that have the same total number of elements;
2) the exact difference between the debug and release versions is that the debug version can allocate the smaller arrays but the release version can't.

Thanks for your help again,
Zhanghong Tang

jimdempseyatthecove · ‎01-24-2009

>> But as I said before, the strange problem is that the program runs successfully in debug mode (release version of C++ main program & debug version of Fortran DLL) but crashes in release mode (release version of C++ main program & release version of Fortran DLL). What on earth leads to such a problem? I traced the code by writing information to a file as you suggested and found two strange things:
1) the debug version can't allocate the large array, but can allocate several smaller arrays that have the same total number of elements;
2) the exact difference between the debug and release versions is that the debug version can allocate the smaller arrays but the release version can't.
<<

The typical cause for programs running OK in Debug but failing in Release is the use of uninitialized variables. Fortran does not initialize variables to 0.

A less typical cause is a problem in the code that behaves differently under the two configurations. In Debug the problem is still there, but its symptoms are not apparent.

Jim