Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

MPI_File_read_all MPI_File_write_all local size limit

Mochalskyy__Serhiy

Dear Intel support team,

I have a problem with the MPI_File_read_all and MPI_File_write_all subroutines. I have a Fortran code that must read a large binary file (~2 TB). The file contains a few 2D matrices; the largest matrix is ~0.5 TB. I read this file with MPI-IO subroutines, roughly like this:

          call MPI_Type_create_subarray(2, dim, loc_sizes, loc_starts, MPI_ORDER_FORTRAN, &
                                        MPI_DOUBLE_PRECISION, my_subarray, ierr)
          call MPI_Type_commit(my_subarray, ierr)

          ! disp is declared as INTEGER(KIND=MPI_OFFSET_KIND)
          call MPI_File_set_view(filehandle, disp, MPI_DOUBLE_PRECISION, my_subarray, &
                                 "native", MPI_INFO_NULL, ierr)

          call MPI_File_read_all(filehandle, float2d, loc_sizes(1)*loc_sizes(2), &
                                 MPI_DOUBLE_PRECISION, status, ierr)

The problem occurs in the MPI_File_read_all call. The number of elements in each local submatrix, loc_sizes(1)*loc_sizes(2), multiplied by the element size (8 bytes for double precision) cannot be larger than the largest allowed integer, 2147483647 (~2 GB). In my case each submatrix is more than 10-20 GB. I tried to use integer*8 instead of integer*4, but it did not help, as the MPI subroutine, I think, converts it back to integer*4. Is there any solution to this problem, as was done for example in MPI_File_set_view, where the displacement type was changed from INTEGER to INTEGER(KIND=MPI_OFFSET_KIND), INTENT(IN) :: disp? The program works fine if the submatrix size is smaller than 2147483647 bytes.
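For reference, here is a small check (the variable names are illustrative, not from my actual code) that shows on which ranks the local block crosses that limit:

      ! Sketch only: flag the ranks whose local block exceeds the 32-bit byte count
      integer(kind=8), parameter :: max_bytes = 2147483647_8
      integer(kind=8) :: local_bytes

      local_bytes = int(loc_sizes(1), 8) * int(loc_sizes(2), 8) * 8_8   ! 8 bytes per double
      if (local_bytes > max_bytes) then
         print *, 'local block of', local_bytes, 'bytes exceeds the 2**31-1 limit'
      end if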

Here is the error message that I got:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
libifcore.so.5     00002ADA8C450876  for__signal_handl     Unknown  Unknown
libc-2.17.so       00002ADA928C8670  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA91AAEB06  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA91AAF780  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA91AA3039  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA91AA49E4  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA91727370  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA919A1C00  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA91971B90  Unknown               Unknown  Unknown
libmpi.so.12       00002ADA9193EFF8  MPI_Isend             Unknown  Unknown
libmpi.so.12.0     00002ADA91695A61  Unknown               Unknown  Unknown
libmpi.so.12       00002ADA916943B8  ADIOI_GEN_ReadStr     Unknown  Unknown
libmpi.so.12       00002ADA91A6DDF5  PMPI_File_read_al     Unknown  Unknown
libmpifort.so.12.  00002ADA912AB4CB  mpi_file_read_all     Unknown  Unknown
jorek_model199     000000000044E747  vacuum_response_m         519  vacuum_response.f90
jorek_model199     000000000044B770  vacuum_response_m         986  vacuum_response.f90
jorek_model199     000000000044A6F4  vacuum_response_m          90  vacuum_response.f90
jorek_model199     000000000041134E  MAIN__                    486  jorek2_main.f90
jorek_model199     000000000040C95E  Unknown               Unknown  Unknown
libc-2.17.so       00002ADA928B4B15  __libc_start_main     Unknown  Unknown

 

Thank you in advance,

Mochalskyy Serhiy

 

4 Replies
Gregg_S_Intel
Employee

This sounds like a request for a change to the MPI standard itself, perhaps more appropriate for mpi-forum.org.

Have you already looked into using the ILP64 library?  http://software.intel.com/en-us/node/528842

Mochalskyy__Serhiy

First of all, I would like to ask whether what I discovered is correct. Can Intel MPI, by means of the MPI_File_read_all subroutine, read a subarray larger than 2 GB? For example, can 2 MPI tasks read a 5 GB unformatted file, each holding a 2.5 GB distributed subarray? Can anyone confirm this issue?

If what I wrote above is true, I think Intel could change the implementation of the MPI_File_read_all subroutine, since the restriction is inside the subroutine itself. The limitation, as I wrote above, is that count_of_subarray_elements * array_type_size < 2147483647 bytes, and this product is formed inside the subroutine. Without changing the standard, Intel could at least modify the implementation so that the count is not multiplied by the type size internally; that alone would raise the maximum subarray size for double precision to 16 GB.
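One workaround that is sometimes suggested (shown here only as a sketch, reusing the variables from my first post; whether it avoids the internal 32-bit byte count depends on the implementation) is to commit a larger derived datatype, for example one row of the local block, so that the count argument passed to MPI_File_read_all stays small:

      ! Sketch only: one "row" of the local block = loc_sizes(1) doubles
      integer :: row_type, ierr

      call MPI_Type_contiguous(loc_sizes(1), MPI_DOUBLE_PRECISION, row_type, ierr)
      call MPI_Type_commit(row_type, ierr)

      ! the count is now only the number of local rows, far below 2**31-1;
      ! the library may still compute the total byte count internally
      call MPI_File_read_all(filehandle, float2d, loc_sizes(2), row_type, status, ierr)

      call MPI_Type_free(row_type, ierr)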

And again, if what I wrote above is true, will there be any future attempts to overcome this restriction? With modern big-data workloads, 2 GB per MPI task is really not much and not enough for many applications.

Thank you, Gregg S., for your suggestion to use the ILP64 library. I will take a look at it.

Mochalskyy__Serhiy

Gregg S. (Intel) wrote:

Have you already looked into using the ILP64 library?  http://software.intel.com/en-us/node/528842

I tried to use this library; however, my code has to be compiled with 4-byte integers (the -i4 compilation option). Therefore this library, which requires compiling the code with -i8, cannot solve my problem.

Gregg_S_Intel
Employee

It is indeed a sledgehammer solution, but this is what commercial software vendors are doing to work around 2 GB limits in MPI.

Alternatively, break the I/O into chunks smaller than 2 GB.
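A minimal sketch of that approach (variable names are illustrative; filehandle, status, loc_sizes and the subarray file view are assumed to be set up as in the original post, and float1d stands for the local buffer viewed as a contiguous 1-D array). Since the reads are collective, every rank must issue the same number of them, so the chunk count is agreed on first:

      integer(kind=8) :: total_elems, chunk_elems, done, pos
      integer :: nchunks, my_chunks, i, cnt, ierr

      total_elems = int(loc_sizes(1), 8) * int(loc_sizes(2), 8)
      chunk_elems = 250000000_8             ! 250M doubles = 2e9 bytes, below 2**31-1
      my_chunks   = int((total_elems + chunk_elems - 1_8) / chunk_elems)

      ! all ranks must make the same number of collective calls
      call MPI_Allreduce(my_chunks, nchunks, 1, MPI_INTEGER, MPI_MAX, MPI_COMM_WORLD, ierr)

      done = 0_8
      do i = 1, nchunks
         cnt = int(min(chunk_elems, max(total_elems - done, 0_8)))   ! may be 0 on some ranks
         pos = min(done, max(total_elems - 1_8, 0_8)) + 1_8          ! safe buffer index
         call MPI_File_read_all(filehandle, float1d(pos), cnt, &
                                MPI_DOUBLE_PRECISION, status, ierr)
         done = done + int(cnt, 8)
      end do

Zero-count collective reads are valid, so ranks that finish early simply keep participating until every rank has issued its last chunk.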
