Intel® MPI Library

Intel MPI + HDF5 + Lustre: failure when closing file

hakostra1
New Contributor II
2,925 Views

I have an MPI application in Fortran that uses the HDF5 library. I am currently using Intel MPI version 2019.9.304 on RHEL 7.7.

On Lustre file systems I set the environment variable I_MPI_EXTRA_FILESYSTEM=1 so that MPI enables its Lustre file system support and optimizations. This usually works fine.

In one particular case I hit a problem deep inside the HDF5 library, and further down in the MPI library, when calling h5fclose_f. At that point the application has opened a file, created two groups, and written a few datasets, and is closing the file again. The application uses the HDF5 library in collective I/O mode.

On one rank (rank 0) I get:

Request pending due to failure, error stack:
PMPI_Waitall(346): MPI_Waitall(count=2, req_array=0x5581dc40, status_array=0x1) failed
PMPI_Waitall(322): The supplied request in array element 0 was invalid (kind=4)

On two other ranks I get:

Request pending due to failure, error stack:
PMPI_Waitall(346): MPI_Waitall(count=1, req_array=0x529fe830, status_array=0x1) failed
PMPI_Waitall(322): The supplied request in array element 0 was invalid (kind=0)

The other ranks are fine and report no errors. Note the different "kind=" values in the error messages.
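To compare the failing ranks side by side, the two error stacks quoted above can be parsed mechanically. The following is only a small sketch: the regular expressions and the `summarize` helper are illustrative names, not part of Intel MPI or HDF5, and the numeric `kind` value is treated as an opaque, implementation-internal code whose meaning is not interpreted here.

```python
import re

# Matches the "failed" line of the MPI_Waitall error stack quoted in the post.
WAITALL_RE = re.compile(r"MPI_Waitall\(count=(\d+), req_array=(0x[0-9a-fA-F]+)")
# Matches the "invalid request" line; `kind` is kept as an opaque integer.
INVALID_RE = re.compile(r"array element (\d+) was invalid \(kind=(\d+)\)")


def summarize(error_stack):
    """Extract count, request array address, element index and kind from one error stack."""
    summary = {}
    match = WAITALL_RE.search(error_stack)
    if match:
        summary["count"] = int(match.group(1))
        summary["req_array"] = match.group(2)
    match = INVALID_RE.search(error_stack)
    if match:
        summary["element"] = int(match.group(1))
        summary["kind"] = int(match.group(2))
    return summary


# Error stacks copied verbatim from the post.
rank0 = """Request pending due to failure, error stack:
PMPI_Waitall(346): MPI_Waitall(count=2, req_array=0x5581dc40, status_array=0x1) failed
PMPI_Waitall(322): The supplied request in array element 0 was invalid (kind=4)"""

other = """Request pending due to failure, error stack:
PMPI_Waitall(346): MPI_Waitall(count=1, req_array=0x529fe830, status_array=0x1) failed
PMPI_Waitall(322): The supplied request in array element 0 was invalid (kind=0)"""

for label, text in (("rank 0", rank0), ("two other ranks", other)):
    print(label, summarize(text))
```

Run over the per-rank logs of a larger job, a helper like this makes it easy to see that the failing ranks differ both in the number of outstanding requests (`count`) and in the reported `kind`.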

When I turn off I_MPI_EXTRA_FILESYSTEM, everything works fine.

I can create a full backtrace from a custom MPI error handler. The backtrace is the same on all three failing ranks:

#2  0x7f86b92d39a9 in MPIR_Err_return_comm
	at ../../src/mpi/errhan/errutil.c:321
#3  0x7f86b9a3bf3a in PMPI_Waitall
	at ../../src/mpi/request/waitall.c:351
#4  0x7f86b901ec4a in ADIOI_LUSTRE_W_Exchange_data
	at ../../../../../src/mpi/romio/adio/ad_lustre/ad_lustre_wrcoll.c:952
#5  0x7f86b901d997 in ADIOI_LUSTRE_Exch_and_write
	at ../../../../../src/mpi/romio/adio/ad_lustre/ad_lustre_wrcoll.c:642
#6  0x7f86b901c52f in ADIOI_LUSTRE_WriteStridedColl
	at ../../../../../src/mpi/romio/adio/ad_lustre/ad_lustre_wrcoll.c:322
#7  0x7f86ba1f0bd7 in MPIOI_File_write_all
	at ../../../../../src/mpi/romio/mpi-io/write_all.c:114
#8  0x7f86ba1f0cbb in PMPI_File_write_at_all
	at ../../../../../src/mpi/romio/mpi-io/write_atall.c:58
#9  0x7f86bbd224ad in H5FD_mpio_write
	at /opt/hdf5/1.10.7/source/src/H5FDmpio.c:1636
#10  0x7f86bba7288f in H5FD_write
	at /opt/hdf5/1.10.7/source/src/H5FDint.c:248
#11  0x7f86bba3c094 in H5F__accum_write
	at /opt/hdf5/1.10.7/source/src/H5Faccum.c:823
#12  0x7f86bbbd89df in H5PB_write
	at /opt/hdf5/1.10.7/source/src/H5PB.c:1031
#13  0x7f86bba4ac6d in H5F_block_write
	at /opt/hdf5/1.10.7/source/src/H5Fio.c:160
#14  0x7f86bbd11781 in H5C__collective_write
	at /opt/hdf5/1.10.7/source/src/H5Cmpio.c:1109
#15  0x7f86bbd13223 in H5C_apply_candidate_list
	at /opt/hdf5/1.10.7/source/src/H5Cmpio.c:402
#16  0x7f86bbd0e960 in H5AC__rsp__dist_md_write__flush
	at /opt/hdf5/1.10.7/source/src/H5ACmpio.c:1707
#17  0x7f86bbd10651 in H5AC__run_sync_point
	at /opt/hdf5/1.10.7/source/src/H5ACmpio.c:2181
#18  0x7f86bbd10739 in H5AC__flush_entries
	at /opt/hdf5/1.10.7/source/src/H5ACmpio.c:2324
#19  0x7f86bb93a2e3 in H5AC_flush
	at /opt/hdf5/1.10.7/source/src/H5AC.c:740
#20  0x7f86bba406fe in H5F__flush_phase2
	at /opt/hdf5/1.10.7/source/src/H5Fint.c:1988
#21  0x7f86bba4344c in H5F__dest
	at /opt/hdf5/1.10.7/source/src/H5Fint.c:1255
#22  0x7f86bba44266 in H5F_try_close
	at /opt/hdf5/1.10.7/source/src/H5Fint.c:2345
#23  0x7f86bba44727 in H5F__close_cb
	at /opt/hdf5/1.10.7/source/src/H5Fint.c:2172
#24  0x7f86bbb04868 in H5I_dec_ref
	at /opt/hdf5/1.10.7/source/src/H5I.c:1261
#25  0x7f86bbb04956 in H5I_dec_app_ref
	at /opt/hdf5/1.10.7/source/src/H5I.c:1306
#26  0x7f86bba43e9b in H5F__close
	at /opt/hdf5/1.10.7/source/src/H5Fint.c:2112
#27  0x7f86bba33147 in H5Fclose
	at /opt/hdf5/1.10.7/source/src/H5F.c:594
#28  0x9bcb00 in h5fclose_c
	at /opt/hdf5/1.10.7/source/fortran/src/H5Ff.c:476
#29  0x99c42a in __h5f_MOD_h5fclose_f
	at /opt/hdf5/1.10.7/source/fortran/src/H5Fff.F90:575

Does anyone have an idea what causes this error? I have a vague feeling that this could be a bug in the MPI implementation, but it is only a feeling. The obvious workaround is to unset I_MPI_EXTRA_FILESYSTEM, but then there is no parallel I/O any more...

6 Replies
PrasanthD_intel
Moderator
2,900 Views

Hi Haakon,


Could you please provide a sample reproducer of your program, so that we can reproduce the issue on our Lustre file system and confirm whether or not it is a bug?


Regards

Prasanth


hakostra1
New Contributor II
2,879 Views

I'll try, but I can't guarantee that I'll manage. The failing code takes a geometry defined by triangles and intersects it with a Cartesian mesh (140 M cells) while the geometry is rotated step by step. Information on the intersections is stored in the HDF5 file at each step by writing new datasets into the file. Datasets are never deleted. Each time a step has finished computing, the HDF5 file is opened for writing/appending and then closed completely, so that the user can stop the process without compromising data.

The problem does not arise at the first step, but after a while. At that stage the HDF5 file is ~6 GB or so. I can start the process at an arbitrary step, but when I start it at the failing step, or a step or two before, there is no problem. The problem thus only seems to arise if the existing HDF5 file that is opened already has certain data structures, shape, or size.

I'll see if I'm able to reproduce it, but I cannot promise anything.

PrasanthD_intel
Moderator
2,853 Views

Hi Haakon,


Thanks for understanding.

It would help us a lot if you could provide a simple sample reproducer that contains at least one of the data structures/shapes and the HDF5 file commands, so that we can understand more about the error.


Regards

Prasanth


PrasanthD_intel
Moderator
2,839 Views

Hi Haakon,


I have learned that there is no need to enable I_MPI_EXTRA_FILESYSTEM in the latest MPI versions. The latest versions support Lustre file systems natively, as mentioned in the release notes (Intel® MPI Library Release Notes for Linux* OS):

"Parallel file systems (GPFS, Lustre, Panfs) are supported natively, removed bindings libraries (removed I_MPI_EXTRA_FILESYSTEM*, I_MPI_LUSTRE* variables)."

You said that disabling I_MPI_EXTRA_FILESYSTEM disables parallel I/O. As long as you use MPI-IO, there should be no problem. If you still see a performance drop when I_MPI_EXTRA_FILESYSTEM is not enabled, please let us know.


Regards

Prasanth
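One way to check for such a performance drop is to run the same job twice, once with I_MPI_EXTRA_FILESYSTEM set and once with it removed from the environment, and compare exit codes and wall-clock times. The sketch below assumes nothing about the application: `run_with_setting` is an illustrative helper, and the command used here is a stand-in that only prints the variable, to be replaced by the real launch line (for example an `mpirun` invocation of the application, which is not shown in this thread).

```python
import os
import subprocess
import sys
import time


def run_with_setting(cmd, value):
    """Run cmd with I_MPI_EXTRA_FILESYSTEM set to value, or unset if value is None.

    Returns (exit code, elapsed seconds, captured stdout).
    """
    env = dict(os.environ)
    # Start from a clean state so an inherited setting cannot leak into the run.
    env.pop("I_MPI_EXTRA_FILESYSTEM", None)
    if value is not None:
        env["I_MPI_EXTRA_FILESYSTEM"] = value
    start = time.perf_counter()
    result = subprocess.run(cmd, env=env, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return result.returncode, elapsed, result.stdout


# Stand-in command (an assumption for illustration): it only reports what the
# child process sees. Replace it with the actual MPI launch command.
cmd = [
    sys.executable,
    "-c",
    "import os; print(os.environ.get('I_MPI_EXTRA_FILESYSTEM', 'unset'))",
]

for value in (None, "1"):
    code, seconds, out = run_with_setting(cmd, value)
    label = "unset" if value is None else value
    print(f"I_MPI_EXTRA_FILESYSTEM={label}: exit={code}, "
          f"time={seconds:.3f}s, child saw {out.strip()}")
```

A single pair of runs is noisy on a shared parallel file system, so repeating each configuration several times gives a more reliable comparison.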


PrasanthD_intel
Moderator
2,797 Views

Hi Haakon,


We haven't heard back from you.

Please let us know whether you observed any performance gap when I_MPI_EXTRA_FILESYSTEM was not set.


Regards

Prasanth


PrasanthD_intel
Moderator
2,777 Views

Hi Haakon,


We are closing this thread on the assumption that your issue has been resolved, and we will no longer respond to it. If you require additional assistance from Intel, please start a new thread.

Any further interaction in this thread will be considered community only.


Regards

Prasanth

