I have an MPI application in Fortran that uses the HDF5 library. I am currently using version 2019.9.304 of Intel MPI (on RHEL 7.7).
On Lustre filesystems I set the environment variable I_MPI_EXTRA_FILESYSTEM=1 so that MPI enables its Lustre filesystem support/optimizations. This usually works fine.
In one particular case I encounter a problem deep inside the HDF5 library, and further down inside the MPI library, when calling h5fclose_f, after having opened a file, created two groups, written a few datasets and then closed the file again. The application uses the HDF5 library in collective I/O mode.
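To illustrate the usage pattern, the sequence is roughly the following. This is only a minimal sketch, not the actual application; the file name, group names, dataset name and sizes are placeholders:

program h5_collective_sketch
  use mpi
  use hdf5
  implicit none

  integer :: mpierr, hdferr
  integer(hid_t) :: fapl_id, dxpl_id, file_id, grp_a, grp_b, space_id, dset_id
  integer(hsize_t) :: dims(1) = (/ 100_hsize_t /)
  real(8) :: buf(100)

  call MPI_Init(mpierr)
  call h5open_f(hdferr)
  buf = 1.0d0

  ! File access property list: use the MPI-IO (parallel) driver
  call h5pcreate_f(H5P_FILE_ACCESS_F, fapl_id, hdferr)
  call h5pset_fapl_mpio_f(fapl_id, MPI_COMM_WORLD, MPI_INFO_NULL, hdferr)
  call h5fcreate_f('sketch.h5', H5F_ACC_TRUNC_F, file_id, hdferr, access_prp=fapl_id)

  ! Two groups, as in the failing case
  call h5gcreate_f(file_id, 'grp_a', grp_a, hdferr)
  call h5gcreate_f(file_id, 'grp_b', grp_b, hdferr)

  ! One dataset, written with a collective transfer property list
  call h5screate_simple_f(1, dims, space_id, hdferr)
  call h5dcreate_f(grp_a, 'dset', H5T_NATIVE_DOUBLE, space_id, dset_id, hdferr)
  call h5pcreate_f(H5P_DATASET_XFER_F, dxpl_id, hdferr)
  call h5pset_dxpl_mpio_f(dxpl_id, H5FD_MPIO_COLLECTIVE_F, hdferr)
  call h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, buf, dims, hdferr, xfer_prp=dxpl_id)

  ! Tear everything down; in my case the failure is reported inside h5fclose_f
  call h5pclose_f(dxpl_id, hdferr)
  call h5dclose_f(dset_id, hdferr)
  call h5sclose_f(space_id, hdferr)
  call h5gclose_f(grp_b, hdferr)
  call h5gclose_f(grp_a, hdferr)
  call h5pclose_f(fapl_id, hdferr)
  call h5fclose_f(file_id, hdferr)
  call h5close_f(hdferr)
  call MPI_Finalize(mpierr)
end program h5_collective_sketch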
On one rank (rank 0) I get:
Request pending due to failure, error stack:
PMPI_Waitall(346): MPI_Waitall(count=2, req_array=0x5581dc40, status_array=0x1) failed
PMPI_Waitall(322): The supplied request in array element 0 was invalid (kind=4)
On two other ranks I get:
Request pending due to failure, error stack:
PMPI_Waitall(346): MPI_Waitall(count=1, req_array=0x529fe830, status_array=0x1) failed
PMPI_Waitall(322): The supplied request in array element 0 was invalid (kind=0)
Other ranks are fine, no errors. Notice the different "kind=" values in the error messages.
When I turn off I_MPI_EXTRA_FILESYSTEM, everything works fine.
I can create a full backtrace from a custom MPI error handler (a sketch of such a handler follows after the backtrace); the backtrace is the same on all three failing ranks:
#2 0x7f86b92d39a9 in MPIR_Err_return_comm
at ../../src/mpi/errhan/errutil.c:321
#3 0x7f86b9a3bf3a in PMPI_Waitall
at ../../src/mpi/request/waitall.c:351
#4 0x7f86b901ec4a in ADIOI_LUSTRE_W_Exchange_data
at ../../../../../src/mpi/romio/adio/ad_lustre/ad_lustre_wrcoll.c:952
#5 0x7f86b901d997 in ADIOI_LUSTRE_Exch_and_write
at ../../../../../src/mpi/romio/adio/ad_lustre/ad_lustre_wrcoll.c:642
#6 0x7f86b901c52f in ADIOI_LUSTRE_WriteStridedColl
at ../../../../../src/mpi/romio/adio/ad_lustre/ad_lustre_wrcoll.c:322
#7 0x7f86ba1f0bd7 in MPIOI_File_write_all
at ../../../../../src/mpi/romio/mpi-io/write_all.c:114
#8 0x7f86ba1f0cbb in PMPI_File_write_at_all
at ../../../../../src/mpi/romio/mpi-io/write_atall.c:58
#9 0x7f86bbd224ad in H5FD_mpio_write
at /opt/hdf5/1.10.7/source/src/H5FDmpio.c:1636
#10 0x7f86bba7288f in H5FD_write
at /opt/hdf5/1.10.7/source/src/H5FDint.c:248
#11 0x7f86bba3c094 in H5F__accum_write
at /opt/hdf5/1.10.7/source/src/H5Faccum.c:823
#12 0x7f86bbbd89df in H5PB_write
at /opt/hdf5/1.10.7/source/src/H5PB.c:1031
#13 0x7f86bba4ac6d in H5F_block_write
at /opt/hdf5/1.10.7/source/src/H5Fio.c:160
#14 0x7f86bbd11781 in H5C__collective_write
at /opt/hdf5/1.10.7/source/src/H5Cmpio.c:1109
#15 0x7f86bbd13223 in H5C_apply_candidate_list
at /opt/hdf5/1.10.7/source/src/H5Cmpio.c:402
#16 0x7f86bbd0e960 in H5AC__rsp__dist_md_write__flush
at /opt/hdf5/1.10.7/source/src/H5ACmpio.c:1707
#17 0x7f86bbd10651 in H5AC__run_sync_point
at /opt/hdf5/1.10.7/source/src/H5ACmpio.c:2181
#18 0x7f86bbd10739 in H5AC__flush_entries
at /opt/hdf5/1.10.7/source/src/H5ACmpio.c:2324
#19 0x7f86bb93a2e3 in H5AC_flush
at /opt/hdf5/1.10.7/source/src/H5AC.c:740
#20 0x7f86bba406fe in H5F__flush_phase2
at /opt/hdf5/1.10.7/source/src/H5Fint.c:1988
#21 0x7f86bba4344c in H5F__dest
at /opt/hdf5/1.10.7/source/src/H5Fint.c:1255
#22 0x7f86bba44266 in H5F_try_close
at /opt/hdf5/1.10.7/source/src/H5Fint.c:2345
#23 0x7f86bba44727 in H5F__close_cb
at /opt/hdf5/1.10.7/source/src/H5Fint.c:2172
#24 0x7f86bbb04868 in H5I_dec_ref
at /opt/hdf5/1.10.7/source/src/H5I.c:1261
#25 0x7f86bbb04956 in H5I_dec_app_ref
at /opt/hdf5/1.10.7/source/src/H5I.c:1306
#26 0x7f86bba43e9b in H5F__close
at /opt/hdf5/1.10.7/source/src/H5Fint.c:2112
#27 0x7f86bba33147 in H5Fclose
at /opt/hdf5/1.10.7/source/src/H5F.c:594
#28 0x9bcb00 in h5fclose_c
at /opt/hdf5/1.10.7/source/fortran/src/H5Ff.c:476
#29 0x99c42a in __h5f_MOD_h5fclose_f
at /opt/hdf5/1.10.7/source/fortran/src/H5Fff.F90:575
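For reference, the error handler is conceptually along these lines. This is only a minimal sketch, assuming gfortran (the BACKTRACE call is a GNU extension); the handler in the real application differs:

module mpi_errhandler_sketch
  use mpi
  implicit none
contains
  subroutine report_mpi_error(comm, error_code)
    integer :: comm, error_code
    character(len=MPI_MAX_ERROR_STRING) :: msg
    integer :: msg_len, ierr
    call MPI_Error_string(error_code, msg, msg_len, ierr)
    print *, 'MPI error on comm ', comm, ': ', msg(1:msg_len)
    call backtrace()   ! GNU extension: print the current call stack
  end subroutine report_mpi_error
end module mpi_errhandler_sketch

program install_errhandler
  use mpi
  use mpi_errhandler_sketch
  implicit none
  integer :: errh, ierr
  call MPI_Init(ierr)
  ! Attach the handler so failing MPI calls return here instead of aborting
  call MPI_Comm_create_errhandler(report_mpi_error, errh, ierr)
  call MPI_Comm_set_errhandler(MPI_COMM_WORLD, errh, ierr)
  ! ... application / HDF5 calls ...
  call MPI_Finalize(ierr)
end program install_errhandler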
Do any of you have any ideas about the source of this error? I have a vague feeling that this could be a bug in the MPI implementation, but it is just a feeling. The obvious workaround is to unset I_MPI_EXTRA_FILESYSTEM, but then there is no parallel I/O any more...
Hi Haakon,
Could you please provide a sample reproducer of your program, so that we can reproduce the issue on our Lustre file system and confirm whether or not it is a bug?
Regards
Prasanth
I'll try, but I cannot guarantee that I'll manage to make one. The failing code takes a geometry defined by triangles and intersects it with a Cartesian mesh (140 M cells) while the geometry is rotated step by step. Information on the intersections is then stored in the HDF5 file at each step, writing new datasets into the file. Datasets are never deleted. Each time a step has finished computing, the HDF5 file is opened for writing/appending and then closed completely (sketched below), to allow the user to stop the process without compromising data.
The problem does not arise at the first step, but after a while; at that stage the HDF5 file is ~6 GB or so. I can restart the process at an arbitrary step, and when I start it at the failing step, or a step or two before, there is no problem. The problem thus only seems to arise if the existing HDF5 file that is opened already has certain data structures/shapes/sizes.
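To make the per-step cycle concrete, it is roughly the following. This is a minimal sketch, not the real code; 'results.h5', the dataset naming and the sizes are placeholders, and the collective transfer property list is omitted for brevity:

subroutine append_step(step, comm)
  use mpi
  use hdf5
  implicit none
  integer, intent(in) :: step, comm
  integer :: hdferr
  integer(hid_t) :: fapl_id, file_id, space_id, dset_id
  integer(hsize_t) :: dims(1) = (/ 10_hsize_t /)
  real(8) :: buf(10)
  character(len=32) :: name

  buf = real(step, 8)
  write(name, '(A,I0)') 'step_', step

  ! Reopen the existing file for appending through the MPI-IO driver
  call h5pcreate_f(H5P_FILE_ACCESS_F, fapl_id, hdferr)
  call h5pset_fapl_mpio_f(fapl_id, comm, MPI_INFO_NULL, hdferr)
  call h5fopen_f('results.h5', H5F_ACC_RDWR_F, file_id, hdferr, access_prp=fapl_id)

  ! Add one new dataset for this step; existing datasets are never touched
  call h5screate_simple_f(1, dims, space_id, hdferr)
  call h5dcreate_f(file_id, trim(name), H5T_NATIVE_DOUBLE, space_id, dset_id, hdferr)
  call h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, buf, dims, hdferr)

  ! Close completely so the run can be stopped safely between steps
  call h5dclose_f(dset_id, hdferr)
  call h5sclose_f(space_id, hdferr)
  call h5pclose_f(fapl_id, hdferr)
  call h5fclose_f(file_id, hdferr)
end subroutine append_step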
I'll see if I'm able to reproduce it, but I cannot promise anything...
Hi Haakon,
Thanks for understanding.
It would help us a lot if you could provide a simple sample reproducer that contains at least one of the data structures/shapes and the HDF5 file commands, so that we can understand more about the error.
Regards
Prasanth
Hi Haakon,
I have learned that there is no need to enable I_MPI_EXTRA_FILESYSTEM in the latest Intel MPI versions; the latest versions support Lustre file systems natively, as mentioned in the release notes (Intel® MPI Library Release Notes for Linux* OS):
"Parallel file systems (GPFS, Lustre, Panfs) are supported natively, removed bindings libraries (removed I_MPI_EXTRA_FILESYSTEM*, I_MPI_LUSTRE* variables)."
Regarding your point that disabling I_MPI_EXTRA_FILESYSTEM disables parallel I/O: as long as you use MPI-IO, there should be no problem. If you still see a performance drop when I_MPI_EXTRA_FILESYSTEM is not enabled, please let us know.
Regards
Prasanth
Hi Haakon,
We haven't heard back from you.
Please let us know whether you observed any performance gap when I_MPI_EXTRA_FILESYSTEM was not set.
Regards
Prasanth
Hi Haakon,
We are closing this thread on the assumption that your issue has been resolved, and we will no longer respond to it. If you require additional assistance from Intel, please start a new thread.
Any further interaction in this thread will be considered community-only.
Regards
Prasanth