Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

MPI_File_read_all MPI_File_write_all local size limit

Mochalskyy__Serhiy

Dear Intel support team,

I have a problem with the MPI_File_read_all and MPI_File_write_all subroutines. I have a Fortran code that must read a large binary file (~2 TB). The file contains a few 2D matrices; the largest matrix is about 0.5 TB. I read this file with MPI I/O subroutines, roughly like this:

          call MPI_TYPE_CREATE_SUBARRAY(2, dim, loc_sizes, loc_starts, MPI_ORDER_FORTRAN, &
                                        MPI_DOUBLE_PRECISION, my_subarray, ierr)
          call MPI_Type_commit(my_subarray, ierr)
          call MPI_File_set_view(filehandle, disp, MPI_DOUBLE_PRECISION, my_subarray, &
                                 "native", MPI_INFO_NULL, ierr)

          call MPI_File_read_all(filehandle, float2d, loc_sizes(1)*loc_sizes(2), &
                                 MPI_DOUBLE_PRECISION, status, ierr)

The problem occurs in the MPI_File_read_all call. The number of elements in each local submatrix, loc_sizes(1)*loc_sizes(2), multiplied by the element size (8 bytes for double precision) cannot exceed the largest default integer value, 2147483647 (~2 GB). In my case each local submatrix is 10-20 GB. I tried using integer*8 instead of integer*4 for the count, but it did not help; the MPI subroutine apparently converts it back to integer*4. Is there a solution to this problem, as was done for example in MPI_File_set_view, where the displacement argument was changed from a plain integer to INTEGER(KIND=MPI_OFFSET_KIND), INTENT(IN) :: disp? The program works fine as long as the local submatrix is smaller than 2147483647 bytes.
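Here is a small sketch of the arithmetic, just for illustration (the sizes are made-up example values, not my real dimensions): computing the per-rank request in a wide integer shows how far it exceeds what a 4-byte count times the type size can express.

          ! Sketch only: hypothetical local sizes, to show the overflow arithmetic.
          program count_overflow_check
             use mpi
             implicit none
             integer :: loc_sizes(2)
             integer(kind=MPI_OFFSET_KIND) :: nbytes

             loc_sizes = (/ 50000, 50000 /)   ! ~2.5e9 doubles per rank (illustrative)
             nbytes = int(loc_sizes(1), MPI_OFFSET_KIND) &
                    * int(loc_sizes(2), MPI_OFFSET_KIND) * 8_MPI_OFFSET_KIND

             print *, 'per-rank request in bytes :', nbytes    ! 20 000 000 000
             print *, 'largest default integer   :', huge(1)   ! 2 147 483 647
             if (nbytes > huge(1)) print *, 'request exceeds what count*type_size can express in 4-byte integers'
          end program count_overflow_check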

Here is the error message that I got:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
libifcore.so.5     00002ADA8C450876  for__signal_handl     Unknown  Unknown
libc-2.17.so       00002ADA928C8670  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA91AAEB06  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA91AAF780  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA91AA3039  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA91AA49E4  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA91727370  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA919A1C00  Unknown               Unknown  Unknown
libmpi.so.12.0     00002ADA91971B90  Unknown               Unknown  Unknown
libmpi.so.12       00002ADA9193EFF8  MPI_Isend             Unknown  Unknown
libmpi.so.12.0     00002ADA91695A61  Unknown               Unknown  Unknown
libmpi.so.12       00002ADA916943B8  ADIOI_GEN_ReadStr     Unknown  Unknown
libmpi.so.12       00002ADA91A6DDF5  PMPI_File_read_al     Unknown  Unknown
libmpifort.so.12.  00002ADA912AB4CB  mpi_file_read_all     Unknown  Unknown
jorek_model199     000000000044E747  vacuum_response_m         519  vacuum_response.f90
jorek_model199     000000000044B770  vacuum_response_m         986  vacuum_response.f90
jorek_model199     000000000044A6F4  vacuum_response_m          90  vacuum_response.f90
jorek_model199     000000000041134E  MAIN__                    486  jorek2_main.f90
jorek_model199     000000000040C95E  Unknown               Unknown  Unknown
libc-2.17.so       00002ADA928B4B15  __libc_start_main     Unknown  Unknown

 

Thank you in advance,

Mochalskyy Serhiy

 

Gregg_S_Intel
Employee

This sounds like a request for a change to the MPI standard itself, which is perhaps more appropriate for mpi-forum.org.

Have you already looked into using the ILP64 library?  http://software.intel.com/en-us/node/528842

Mochalskyy__Serhiy

First of all, I would like to ask whether what I discovered is correct: can Intel MPI's MPI_File_read_all subroutine read a subarray larger than 2 GB? For example, can 2 MPI tasks read a 5 GB unformatted file as two distributed 2.5 GB subarrays? Can anyone confirm this issue?

If what I wrote above is true, I think Intel could change the implementation of the MPI_File_read_all subroutine, since the restriction sits inside the subroutine itself. The limitation, as I wrote above, is that COUNT_of_subarray_elements * array_type_size < 2147483647 bytes, and this product is formed inside the subroutine. Even without changing the standard, Intel could modify the implementation so that the count is not multiplied by the type size; that would raise the subarray limit for double precision to about 16 GB.
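The usual user-side workaround, as far as I understand it, is to fold the element size into a derived datatype so that the count argument itself stays small. Something like the sketch below (names are illustrative only, and I am not sure this avoids the crash, since the library may still total up the bytes in a 32-bit integer internally, as the traceback suggests):

          ! Illustrative sketch only: group each local column (loc_sizes(1)
          ! contiguous doubles) into one derived type so the count passed to
          ! MPI_File_read_all is just loc_sizes(2).
          subroutine read_with_column_type(filehandle, float2d, loc_sizes, ierr)
             use mpi
             implicit none
             integer, intent(in)  :: filehandle
             integer, intent(in)  :: loc_sizes(2)
             double precision, intent(out) :: float2d(loc_sizes(1), loc_sizes(2))
             integer, intent(out) :: ierr

             integer :: col_type
             integer :: status(MPI_STATUS_SIZE)

             call MPI_Type_contiguous(loc_sizes(1), MPI_DOUBLE_PRECISION, col_type, ierr)
             call MPI_Type_commit(col_type, ierr)

             ! Count is now loc_sizes(2) instead of loc_sizes(1)*loc_sizes(2).
             call MPI_File_read_all(filehandle, float2d, loc_sizes(2), col_type, status, ierr)

             call MPI_Type_free(col_type, ierr)
          end subroutine read_with_column_type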

And again, if what I wrote above is true, will there be any attempt in the future to overcome this restriction? In modern big-data applications, 2 GB per MPI task is really not much and is not enough for many codes.

Thank you, Gregg S., for your suggestion to use the ILP64 library. I will take a look at it.

Mochalskyy__Serhiy

Gregg S. (Intel) wrote:

Have you looked already into using ILP64 library?  http://software.intel.com/en-us/node/528842

I tried to use this library; however, my code must be compiled with 4-byte integers (the -i4 compilation option). Therefore this library, which requires compiling the code with -i8, cannot solve my problem.

Gregg_S_Intel
Employee

It is indeed a sledgehammer solution, but this is what commercial software vendors are doing to work around 2 GB limits in MPI.

Alternatively, break the I/O into chunks smaller than 2 GB.
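Roughly along these lines (a sketch with illustrative names, not production code; it assumes the subarray file view from the first post is already set, so successive collective reads simply continue through the local block, and every rank must make the same number of collective calls):

          ! Rough sketch: read the local block in slices so no single
          ! MPI_File_read_all call requests 2 GB or more per rank.
          subroutine read_in_chunks(filehandle, buf, loc_sizes, comm, ierr)
             use mpi
             implicit none
             integer, intent(in)  :: filehandle, comm
             integer, intent(in)  :: loc_sizes(2)
             double precision     :: buf(*)           ! local block, indexed linearly
             integer, intent(out) :: ierr

             integer(kind=MPI_OFFSET_KIND) :: total, done, remaining
             integer :: chunk, nchunks, max_nchunks, i
             integer :: status(MPI_STATUS_SIZE)
             integer, parameter :: max_chunk = 2**27  ! 2**27 doubles = 1 GiB per call

             total   = int(loc_sizes(1), MPI_OFFSET_KIND) * int(loc_sizes(2), MPI_OFFSET_KIND)
             nchunks = int((total + max_chunk - 1) / max_chunk)

             ! MPI_File_read_all is collective: every rank must call it the same
             ! number of times, so agree on the largest chunk count.
             call MPI_Allreduce(nchunks, max_nchunks, 1, MPI_INTEGER, MPI_MAX, comm, ierr)

             done = 0
             do i = 1, max_nchunks
                remaining = total - done
                chunk = int(min(remaining, int(max_chunk, MPI_OFFSET_KIND)))
                if (chunk > 0) then
                   call MPI_File_read_all(filehandle, buf(done + 1), chunk, &
                                          MPI_DOUBLE_PRECISION, status, ierr)
                else
                   ! Keep participating in the collective with a zero-size read.
                   call MPI_File_read_all(filehandle, buf(1), 0, &
                                          MPI_DOUBLE_PRECISION, status, ierr)
                end if
                done = done + chunk
             end do
          end subroutine read_in_chunks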
