
IMPI Creates Large Files on Startup

Mike_D_6
Beginner

Operating system and version: CentOS Linux release 7.5.1804
Intel MPI version: 2019.5.281
Compiler and version: 19.0.5.281
Fabric: Mellanox Technologies MT27500
Libfabric version: 1.7.2

I have a quick question regarding Intel MPI and large (> 1 GB) files created by MPI at runtime. We maintain part of a large code, and the standard test suite for this code imposes a limit on file size to keep the tests small. The file size is restricted using "limit filesize 1024m" (csh), but I also tested with "ulimit -f 1024000" (bash) with the same results. Whenever I start mpirun for any code requesting more than a single core, I apparently exceed this file size limit and the code crashes. A minimal example:

   program hello
   implicit none
   include 'mpif.h'
   integer rank, size, ierror

   call MPI_INIT(ierror)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
   print *, 'node', rank, ': Hello world'
   call MPI_FINALIZE(ierror)
   end program hello
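
For completeness, the example can be compiled with the Intel MPI Fortran wrapper (assuming the source is saved as hello.f):

mpiifort -o mpi-test hello.f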

This runs fine when called as follows:

#!/bin/bash
ulimit -f 1024000   # limit file size to ~1 GB (bash counts in 1024-byte blocks)
nproc=1
mpirun -n $nproc -ppn $nproc ./mpi-test

It also runs fine with nproc increased when the ulimit is removed, but with the ulimit in place and nproc >= 2 it crashes. The failing run differs from the script above only in the process count:
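
#!/bin/bash
ulimit -f 1024000   # same ~1 GB limit as above
nproc=2             # two or more ranks triggers the crash
mpirun -n $nproc -ppn $nproc ./mpi-test

This fails with the following error: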

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 78358 RUNNING AT pc-beethoven.cluster
=   KILLED BY SIGNAL: 25 (File size limit exceeded)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 78359 RUNNING AT pc-beethoven.cluster
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

As this minimal example doesn't read or write any files itself, I guess the files must belong to mpirun. Is this "normal" behavior with IMPI (i.e., something I need to contact the main developers of our code about), or does it indicate a problem with our IMPI installation?

Thanks in advance for any help!

AbhishekD_Intel
Moderator

Hello Mike,

I have reproduced the same issue after setting the limits: with a single core it works fine, but with two or more cores I get the same error. I don't think there is an installation problem.

We will get back to you after debugging it.

-Abhishek

Mike_D_6
Beginner

Hello Abhishek,

Thanks, that's good to know. In case it helps for comparison, the minimum "limit filesize" value at which this example still runs, as a function of the number of cores, is:

1 core: limit filesize 2
2 cores: limit filesize 1661460
4 cores: limit filesize 1745940
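
For reference, thresholds like these can be found by bisecting the limit value; a rough sketch of the approach (assuming the same ./mpi-test binary as above):

#!/bin/bash
# Bisect the smallest "ulimit -f" value (1024-byte blocks) at which
# the job still runs for a given rank count.
nproc=2
lo=1           # known-failing limit
hi=2000000     # known-working limit
while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    # Apply the limit in a subshell so it only affects this attempt.
    if ( ulimit -f $mid; mpirun -n $nproc -ppn $nproc ./mpi-test ) > /dev/null 2>&1; then
        hi=$mid    # run succeeded: try a smaller limit
    else
        lo=$mid    # run crashed: need a larger limit
    fi
done
echo "minimum working ulimit -f for $nproc ranks: $hi"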

Mike

Michael_Intel
Moderator

Hi Mike,

The file size you are observing comes from our shared memory fabric / user memory implementation, which allocates a shared memory heap from /dev/shm. It therefore actually works as intended.
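
You can see this for yourself by listing /dev/shm while a job runs (the file names vary between versions and hosts, so this is just illustrative):

# While an Intel MPI job is running, the shared memory heap shows up
# as large files under /dev/shm:
ls -lh /dev/shm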

You could theoretically disable it by switching off the shared memory implementation and also disabling the shared memory heap for user allocations:

I_MPI_SHM=off

I_MPI_SHM_HEAP_VSIZE=0

However, you would also lose shared memory communication performance.
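
Applied to your minimal example, that would look something like the following (a sketch; adjust the rank count as needed):

#!/bin/bash
ulimit -f 1024000
export I_MPI_SHM=off            # switch off the shared memory fabric
export I_MPI_SHM_HEAP_VSIZE=0   # no shared memory heap for user allocations
mpirun -n 2 -ppn 2 ./mpi-test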

Best regards,

Michael

Mike_D_6
Beginner

Hi Michael,

That makes sense, thanks a lot for looking into it! I'll inform the main developers of our code so that we can implement a workaround.

Best regards,
Mike
