Beginner
24 Views

large memory usage with large file

Hello. I have been debugging a curious issue: an application consumes a large amount of memory when writing or reading a large file.

The program below writes out about 1.5GB of data:

Task Manager shows the memory usage slowly creeping up, but the process itself does not appear to be using the memory directly.

Instead, the memory is consumed by a memory-mapped file tied to the data file. (I used Microsoft's RamMap to find it.)

   program Console2

    implicit none
    integer :: i
    ! Note: with ifort's default word-based RECL units (i.e. without
    ! /assume:byterecl), RECL=4 means 16 bytes per record, so
    ! 100,000,000 records is roughly 1.6GB on disk.
    open(unit=7, status='NEW', access='DIRECT', recl=4, &
         form='UNFORMATTED', file='MYDAT')
    do i = 1, 100000000
        write(7, rec=i) 'DATA'
    end do
    close(7)
    end program Console2

In the real program the files are much larger and thus consume even more memory. This causes the program to crash with insufficient memory most, but not all, of the time.

It affects both Win32 and x64 builds, though I am predominantly using the latter.

Thanks for any suggestions or help you can provide.

-Matt

11 Replies
New Contributor I

I think the problem is related to I/O buffering (/assume:buffered_io). Can you try enabling it and see what happens?
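For reference, the flag goes on the compile line. A hypothetical invocation (assuming the Windows ifort driver and a source file named Console2.f90 — on Linux the spelling is -assume buffered_io):

```shell
REM Enable library-level I/O buffering for the whole program
ifort /assume:buffered_io Console2.f90
```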

Beginner

That seems to improve things. With that option, the test program used only about 500MB of memory, rather than 1.5GB.

I then doubled the file size to 3GB, and with /assume:buffered_io the memory use stayed at the same limit of around 500MB.

(Previously doubling the file size doubled the amount of memory consumed.)

Also, without the option, all of the file details listed in RamMap are marked as 'Active'.

With the option set, most of the file details are listed as 'Standby', with a few marked 'Modified', so the 500MB is probably used only because it is available.

I will try it with the original program and data file sizes.

Thanks!

-Matt


You might also want to experiment with STREAM writes. DIRECT may imply that you also intend to (immediately) re-read the file.

In a 32-bit environment, 500MB may be too much to deal with.
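A minimal sketch of a stream-access version of the test program (STATUS='REPLACE' substituted here so reruns don't fail; unformatted stream access writes a contiguous byte sequence with no record structure):

```fortran
    program StreamWrite
    implicit none
    integer :: i
    ! Unformatted stream access: no fixed records, just a byte stream.
    open(unit=7, status='REPLACE', access='STREAM', &
         form='UNFORMATTED', file='MYDAT')
    do i = 1, 100000000
        write(7) 'DATA'          ! appends 4 bytes per iteration
    end do
    close(7)
    end program StreamWrite
```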

Jim Dempsey

New Contributor I

Matt, I am glad that it improved things for you. However, having read the Fortran documentation more carefully, I don't understand why adding /assume:buffered_io made a difference. In your code you open the file with ACCESS='DIRECT'. From the documentation:

https://software.intel.com/en-us/node/579519

If a file is opened for direct access, I/O buffering is ignored.

Beginner

I was previously unaware of the /assume:buffered_io option, but I had tried various combinations of BUFFERED, BLOCKSIZE, and BUFFERCOUNT. They did not seem to affect the real program.

In the real program, as with the test program, without the /assume option all of the details in the memory-mapped file are marked 'Active'. With the option, some are marked 'Active', some 'Modified', and some 'Standby', as expected.

Also, with the real program and the /assume option, it will use whatever memory is available on this machine (8GB) in x64, but I am able to run many, many additional programs during the run without crashing, again as expected.

I will also take a look at the 'STREAM' option, as I was unaware of it. The data files are, however, re-read several times, and sometimes immediately.


>>The data files are however re-read, several times, and sometimes immediately

If they are re-read sequentially, then you do not need DIRECT...

... but if they are re-read randomly, then you do need DIRECT. In that case, memory-mapped files may be useful, provided that the file can be partitioned or partially mapped based on available resources or specific tuning parameters.

Jim Dempsey

Black Belt

Stream access lets you do random access too.  With unformatted stream you can reposition to any arbitrary file storage unit (byte) in the file.  With formatted stream you can reposition to any position that you previously remembered.
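For example, with unformatted stream access the POS= specifier repositions to an arbitrary (1-based) byte offset. A small sketch with illustrative data:

```fortran
    program StreamSeek
    implicit none
    character(len=4) :: buf
    open(unit=7, status='REPLACE', access='STREAM', &
         form='UNFORMATTED', file='TESTDAT')
    write(7) 'AAAABBBBCCCC'       ! 12 bytes total
    read(7, pos=5) buf            ! seek to byte 5, read 4 bytes -> 'BBBB'
    close(7, status='DELETE')
    end program StreamSeek
```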


It looks to me as if the memory use is by Windows and not under the control of the Fortran I/O system.

Beginner

Yes, the memory appears to be used by Windows. But the example program doesn't memory-map the data file -- it just writes a lot of data to an ordinary Fortran data file.

Yet, depending on the number of bytes written, the program/data file can consume all available memory and won't give any back until the program terminates!

The /assume:buffered_io option makes a huge difference for both my test and real programs, so I don't think it's just a Windows issue.

BTW, I am using Windows 7 and Intel Composer 2013.


These links may be a good place to start digging:

http://www.ghacks.net/2010/07/08/increase-the-filesystem-memory-cache-size-in-windows-7/
(follow the instructions to decrease the cache size or turn it off)

http://www.thewindowsclub.com/enable-disable-disk-write-caching-windows-7-8

https://winntfs.com/2012/12/01/windows-write-caching-part-3-an-overview-for-system-administrators/

If you are lucky, you can find the tuning parameter that sets the size.

Please report back your solution.

Jim Dempsey

Beginner

Thanks, Jim. I'll take a look. The solution that seems to work, though, is /assume:buffered_io.

According to the docs, it shouldn't. But it makes a major difference.
