Fortran execute_command_line runtime error, depends on memory consumption

Alexander_S_2 · 03-12-2019

I am getting runtime errors when trying to create a directory using the execute_command_line intrinsic in Fortran. The error occurs with both ifort (18.0.3 20180410) and gfortran (4.8.5).
Here is a minimal example that fails regardless of the compile flags I use:

    PROGRAM directory_test

        IMPLICIT NONE

        INTEGER :: cstat, estat, i, j
        CHARACTER(LEN=100) :: cmsg

        REAL, DIMENSION(:,:), ALLOCATABLE :: field
        ! With 4-byte default REALs, 100000**2 elements take about 37 GiB
        INTEGER, PARAMETER :: fieldsize = 100000

        ! Allocate the array and write every element so the memory is really in use
        allocate(field(fieldsize,fieldsize))
        do j=1, fieldsize
            do i=1, fieldsize
                field(i,j) = real(i+j)
            end do
        end do

        ! Create the directory through the shell and report the status values
        call execute_command_line('mkdir -p newdir', WAIT=.true., EXITSTAT=estat, CMDSTAT=cstat, CMDMSG=cmsg)

        write(*,*) 'estat: ', estat
        write(*,*) 'cstat: ', cstat
        write(*,*) 'cmsg:  ', cmsg

        deallocate(field)

    END PROGRAM directory_test

Output ifort:

    estat:      4196936
    cstat:          124
    cmsg:
    Invalid command supplied to EXECUTE_COMMAND_LINE

Output gfortran:

    estat: -1565892912
    cstat: 1
    cmsg: Termination status of the command-language interpreter cannot be obtained
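In both outputs, estat holds a meaningless value. Under the Fortran 2008 rules for EXECUTE_COMMAND_LINE, EXITSTAT is assigned only when the command is actually executed synchronously, and CMDMSG only when an error condition occurs; otherwise both keep their previous (here uninitialized) values. A minimal sketch of checking CMDSTAT before trusting EXITSTAT follows; the sentinel value -999 and the messages are illustrative choices, not something either compiler requires.

```fortran
program checked_command
    implicit none
    integer :: estat, cstat
    character(len=256) :: cmsg

    ! EXITSTAT and CMDMSG are left unchanged when the command cannot be run,
    ! so give them known values instead of printing uninitialized memory.
    estat = -999          ! illustrative sentinel: "no exit status obtained"
    cmsg  = ''

    call execute_command_line('mkdir -p newdir', wait=.true., &
                              exitstat=estat, cmdstat=cstat, cmdmsg=cmsg)

    if (cstat /= 0) then
        ! The command could not be executed at all (e.g. no child process).
        write(*,'(a,i0)') 'command not executed, cmdstat = ', cstat
        write(*,'(a)')    'cmdmsg: '//trim(cmsg)
    else if (estat /= 0) then
        ! The shell ran, but mkdir reported failure through its exit status.
        write(*,'(a,i0)') 'command failed, exitstat = ', estat
    else
        write(*,'(a)') 'directory created (or already present)'
    end if
end program checked_command
```

With this structure, a failure like the ones above is reported through cstat and cmsg, and estat is only interpreted when the command really ran.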

Here is the catch: the program runs fine as long as the array is small enough. For me, the threshold is at about half of the physical memory (adjust the value of "fieldsize" if you want to try the code). If the array is larger than that, the error occurs; if it is smaller, the code runs without errors and the directory is created.
All the machines I tested this on have 2 physical CPUs and 48 GB to 256 GB of RAM. On some of them the failure threshold is higher, but they all fail at some point.
What am I doing wrong?

OS: Linux, Opensuse 42.3 (and older versions)
shell: bash
file system: Ext4

Additional observation: using "call system()" instead of "execute_command_line()" shows similar behavior. The new directory is not created in the cases where execute_command_line would have failed, but no runtime error is reported.
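Both execute_command_line and call system() hand the command to a shell running in a new child process, so it is plausible that they fail for the same underlying reason. For the specific task of creating a directory, one workaround not discussed in this thread is to call the POSIX mkdir function directly through ISO_C_BINDING, which creates the directory without starting any new process. This is a sketch assuming Linux with glibc, where mode_t is a 32-bit unsigned int (passed here as c_int); the module, interface and function names (posix_mkdir, c_mkdir, make_dir) are hypothetical. Unlike mkdir -p, it creates only one directory level and returns -1 if the directory already exists or on any other error.

```fortran
module posix_mkdir
    use, intrinsic :: iso_c_binding, only: c_char, c_int, c_null_char
    implicit none
    interface
        ! C prototype: int mkdir(const char *pathname, mode_t mode);
        ! Assumes mode_t is a 32-bit unsigned int, as on Linux with glibc.
        function c_mkdir(path, mode) bind(c, name='mkdir') result(rc)
            import :: c_char, c_int
            character(kind=c_char), dimension(*), intent(in) :: path
            integer(c_int), value :: mode
            integer(c_int) :: rc
        end function c_mkdir
    end interface
contains
    ! Create a single directory level; returns 0 on success, -1 on failure.
    ! Permissions rwxr-xr-x (octal 755), further reduced by the process umask.
    function make_dir(path) result(rc)
        character(len=*), intent(in) :: path
        integer :: rc
        rc = int(c_mkdir(trim(path)//c_null_char, int(o'755', c_int)))
    end function make_dir
end module posix_mkdir

program mkdir_without_shell
    use posix_mkdir, only: make_dir
    implicit none
    integer :: rc
    ! No shell and no child process: this is a single system call.
    rc = make_dir('newdir')
    write(*,'(a,i0)') 'mkdir returned ', rc
end program mkdir_without_shell
```

This only covers directory creation; any other shell command still needs a process of its own.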

Juergen_R_R · 03-12-2019

I'd say you are exceeding the available memory before execute_command_line can do its work. The nagfor compiler says so explicitly:

    cmsg: EXECUTE_COMMAND_LINE('mkdir -p newdir'): Cannot allocate memory

Alexander_S_2 · 03-12-2019

Interesting, thanks for trying. But this raises two questions for me:

1) Why does the command fail even though there is still plenty of physical memory left on the system?

2) How can I circumvent it? The system still has free physical memory when the error occurs.

Steve_Lionel · 03-12-2019

Physical memory is not the issue. It has to do with virtual memory limits configured in the kernel and may be constrained by the size of your swapfile.
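On Linux, the kernel settings in question can be inspected under /proc: /proc/sys/vm/overcommit_memory holds the overcommit policy (0 = heuristic, the default; 1 = always overcommit; 2 = strict accounting), and /proc/meminfo reports CommitLimit and Committed_AS, where CommitLimit is enforced as a hard limit only in mode 2. A minimal Fortran sketch that prints these values, assuming a Linux /proc file system:

```fortran
program show_overcommit
    implicit none
    integer :: u, ios, mode
    character(len=256) :: line

    ! Overcommit policy: 0 = heuristic, 1 = always overcommit, 2 = strict accounting
    open(newunit=u, file='/proc/sys/vm/overcommit_memory', status='old', &
         action='read', iostat=ios)
    if (ios == 0) then
        read(u, *, iostat=ios) mode
        if (ios == 0) write(*,'(a,i0)') 'vm.overcommit_memory = ', mode
        close(u)
    else
        write(*,'(a)') 'could not read /proc/sys/vm/overcommit_memory (not Linux?)'
    end if

    ! RAM, swap and the kernel's commit accounting, all reported in kB
    open(newunit=u, file='/proc/meminfo', status='old', action='read', iostat=ios)
    if (ios /= 0) stop 'could not read /proc/meminfo'
    do
        read(u, '(a)', iostat=ios) line
        if (ios /= 0) exit
        if (index(line, 'MemTotal:') == 1 .or. index(line, 'SwapTotal:') == 1 .or. &
            index(line, 'CommitLimit:') == 1 .or. index(line, 'Committed_AS:') == 1) then
            write(*,'(a)') trim(line)
        end if
    end do
    close(u)
end program show_overcommit
```

The same information is available from a shell with cat and grep; the Fortran version just keeps everything in one language.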

Alexander_S_2 · 03-13-2019

Back at the machine: there was indeed a process that occupied all of the virtual memory on the system, around 250g. That seems odd to me, because the system only has 128 GB of RAM and 12 GB of swap.

I guess I don't know enough about this area, because my first thought was: how can my Fortran program even allocate the field if the whole virtual memory is already eaten up by another process?

Anyway, I disabled this process (Baloo file indexing). Now the virtual memory is 99% free, and the threshold for the program is very close to the point where half of the virtual memory is occupied. In numbers:

A fieldsize of 134000 uses 66.9g of virtual memory: execute_command_line succeeds.

A fieldsize of 135000 uses 67.9g of virtual memory: execute_command_line fails.

Does that mean that while forking the process in order to execute_command_line, the system temporarily needs twice the amount of virtual memory that the original process required?
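As a rough check of these numbers, assuming 4-byte default REALs and that the array dominates the process size, the reported sizes match what the array alone occupies. Doubling the failing size gives about 136 GiB, in the same range as the 128 GB of RAM plus 12 GB of swap on that machine, which is at least consistent with the fork being accounted as a second full copy of the process:

```latex
\begin{aligned}
134000^2 \times 4\ \text{bytes} &= 7.1824 \times 10^{10}\ \text{bytes} \approx 66.9\ \text{GiB} \\
135000^2 \times 4\ \text{bytes} &= 7.2900 \times 10^{10}\ \text{bytes} \approx 67.9\ \text{GiB} \\
2 \times 67.9\ \text{GiB} &\approx 135.8\ \text{GiB}
\end{aligned}
```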

jimdempseyatthecove · 03-13-2019

>>if the whole virtual memory is already eaten up by another process.

Not possible. Each process has its own virtual memory. Think of it as a working address space (virtual addresses). These addresses need not be resident in physical memory (IOW, they may be paged out), and a virtual address of a process generally does not correspond to the same physical address (in some cases it can). The total of all processes' used virtual addresses must fit within the page file, and the page file size can, and generally does, exceed the physical RAM size (though you can make the page file small enough that the sum of all processes must fit within physical RAM; this is your choice).
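On Linux, the difference between a process's virtual size and the part of it that is actually resident in RAM can be watched in /proc/self/status (the VmSize and VmRSS fields). A minimal sketch, assuming Linux: it allocates about 100 MB and prints both fields before the allocation, after it, and after writing the array. Typically VmSize grows at the ALLOCATE while VmRSS grows only once the pages are written; the exact numbers depend on the compiler and C library, so none are shown here.

```fortran
program vm_vs_rss
    implicit none
    integer, parameter :: n = 25000000   ! 25 million default REALs, about 100 MB
    real, allocatable :: a(:)

    call show('before allocate')
    allocate(a(n))
    call show('after allocate')    ! address space reserved, pages not touched yet
    a = 1.0
    call show('after writing')     ! written pages are now resident
    write(*,'(a,f0.1)') 'last element: ', a(n)   ! keeps the writes observable
    deallocate(a)
contains
    ! Print the VmSize (virtual) and VmRSS (resident) lines of /proc/self/status.
    subroutine show(label)
        character(len=*), intent(in) :: label
        character(len=256) :: line
        integer :: u, ios
        open(newunit=u, file='/proc/self/status', status='old', action='read', iostat=ios)
        if (ios /= 0) return
        do
            read(u, '(a)', iostat=ios) line
            if (ios /= 0) exit
            if (index(line, 'VmSize:') == 1 .or. index(line, 'VmRSS:') == 1) &
                write(*,'(a,": ",a)') label, trim(line)
        end do
        close(u)
    end subroutine show
end program vm_vs_rss
```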

>>Does that mean that while forking the process in order to execute_command_line, the system temporarily needs twice the amount of virtual memory that the original process required?

On Linux, this appears to be the case. What you might consider doing, starting at the beginning of the program (before any allocations), is:

1) Delete any intercommunication files left over from a prior run.
2) Issue EXECUTE_COMMAND_LINE("your post processing program", .false., YourExitStat, YourCMDstat, YourCMDmsg)
3) Follow it with: IF(YourCMDstat == -2) STOP("Asynchronous mode not supported")
4) Build your output file(s).
5) Signal "ready" by creating a file with an agreed-upon name that your post-processing program watches for.
6) Optionally wait for the post-processing program, or continue creating additional output.

The post-processing program waits for the intercommunication file to be created; that file contains the instructions on what to do.
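A minimal Fortran sketch of steps 1 to 5, under assumptions that are not in the post: the post-processing program is replaced by a hypothetical inline shell loop that polls for a file named ready.flag and then runs mkdir -p newdir, and fieldsize is kept small so the sketch runs anywhere. The point is that the only process creation happens before the large allocation, while the "ready" signal afterwards is plain file I/O. Note that with a polling helper like this, a processor that falls back to synchronous execution (CMDSTAT = -2) would block inside the call before ever reaching the check.

```fortran
program async_helper_sketch
    implicit none
    ! Hypothetical flag-file name agreed on by this program and the helper.
    character(len=*), parameter :: flag = 'ready.flag'
    ! Hypothetical helper command (run by /bin/sh): poll once per second for the
    ! flag, then do the work that needs a new process, then remove the flag.
    character(len=*), parameter :: helper = &
        'while [ ! -f ready.flag ]; do sleep 1; done; mkdir -p newdir; rm -f ready.flag'
    integer, parameter :: fieldsize = 1000   ! small for the sketch; large in practice
    integer :: u, ios, cstat
    character(len=256) :: cmsg
    real, allocatable :: field(:,:)

    ! 1) Delete a flag file left over from a prior run, if there is one.
    open(newunit=u, file=flag, status='old', iostat=ios)
    if (ios == 0) close(u, status='delete')

    ! 2) Launch the helper asynchronously while this process is still small.
    cmsg = ''
    call execute_command_line(helper, wait=.false., cmdstat=cstat, cmdmsg=cmsg)
    ! 3) -2 means the processor could not run the command asynchronously.
    if (cstat == -2) stop 'Asynchronous mode not supported'
    if (cstat /= 0) then
        write(*,'(a)') 'could not launch helper: '//trim(cmsg)
        stop 1
    end if

    ! 4) Memory-hungry work; no new processes are created from here on.
    allocate(field(fieldsize, fieldsize))
    field = 1.0
    deallocate(field)

    ! 5) Signal "ready" with ordinary file I/O, which needs no new process.
    open(newunit=u, file=flag, status='replace', action='write')
    close(u)
end program async_helper_sketch
```

Step 6 is left out here: the program simply exits, and the helper finishes on its own shortly afterwards.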

Note that the above can also be done with a shell script. Most shells let you launch a process asynchronously:

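    # Remove a messaging file left over from a prior run (-f: no error if it is absent)
    rm -f ./MessagingFile
    # Start the post-processing program in the background; it waits for ./MessagingFile
    ./YourPostProcessingProgram &
    # Run the main (memory-hungry) program, which creates ./MessagingFile when ready
    ./YourProcessingProgram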

Jim Dempsey