Intel® MPI Library

MPI_FILE_SET_VIEW produces a seg fault in Windows 10

Kevin_McGrattan

I have a large CFD code that uses a parallel MPI write routine. The code compiles and runs on our Windows 7 machines (Intel Fortran 16 and Intel MPI 5.1.2), but it fails under Windows 10. The failure always occurs in the call to MPI_FILE_SET_VIEW. I wrote a short program to demonstrate the problem. This program runs on a Windows 7 machine and fails under Windows 10, regardless of which platform we compile on.

program test_mpi
use mpi
implicit none

integer i, size, rank, ierr, N
real :: T_USED(14)
character(80) :: FN_CPU
INTEGER, PARAMETER :: LINE_LENGTH=159
CHARACTER, PARAMETER :: LF=ACHAR(10)
CHARACTER(LEN=LINE_LENGTH+1) :: LINE,HEAD
INTEGER :: RECORD,FH,STATUS(MPI_STATUS_SIZE)

call MPI_INIT (ierr)
call MPI_COMM_SIZE (MPI_COMM_WORLD, size, ierr)
call MPI_COMM_RANK (MPI_COMM_WORLD, rank, ierr)

FN_CPU = 'test_mpi_write.csv'
T_USED = real(rank+1)

! One RECORD is LINE_LENGTH characters plus a trailing line feed
CALL MPI_TYPE_CONTIGUOUS(LINE_LENGTH+1,MPI_CHARACTER,RECORD,ierr)
CALL MPI_TYPE_COMMIT(RECORD,ierr)
CALL MPI_FILE_OPEN(MPI_COMM_WORLD,FN_CPU,MPI_MODE_WRONLY+MPI_MODE_CREATE,MPI_INFO_NULL,FH,ierr)
! RECORD is both etype and filetype, so file offsets count whole records
CALL MPI_FILE_SET_VIEW(FH,0_MPI_OFFSET_KIND,RECORD,RECORD,'NATIVE',MPI_INFO_NULL,ierr)

! Each rank formats its own line and writes it at record offset rank+1
! (record 0 is not written in this test)
DO N=0,size-1
   IF (rank/=N) CYCLE
   WRITE(0,*) 'rank ',rank,' writes a line'
   WRITE(LINE,'(ES10.3,13(",",ES10.3))') T_USED(1:14)
   LINE(LINE_LENGTH+1:LINE_LENGTH+1) = LF
   CALL MPI_FILE_WRITE_AT(FH,INT(N+1,MPI_OFFSET_KIND),LINE,1,RECORD,STATUS,ierr)
ENDDO

CALL MPI_FILE_CLOSE(FH,ierr)
CALL MPI_TYPE_FREE(RECORD,ierr)
call MPI_FINALIZE (ierr)

end program

Gergana_S_Intel
Employee

Thanks for getting in touch.  Let me try to reproduce this and I'll get back to you soon.

~Gergana

Gergana_S_Intel
Employee

After getting some info from the Intel MPI developers, it turns out this is a known issue that was discovered in a recent update.  It will be fixed in our upcoming Beta release targeted for early April.  That will be our next major release.

Is this acceptable?  Do you need a fix in a production-level version of Intel MPI (e.g. Intel MPI 5.1.x)?

Thanks and best regards,
~Gergana

Kevin_McGrattan

Frankly, no. This code is an important part of our research program, and we distribute it to several thousand users. I cannot say that it will not work under Windows 10 until April. If there is a simple workaround, I will implement it, so long as I do not have to remove all my parallel writes. As you can see in the test case, all I want to do is have each MPI process write a single character string, plus a line feed, to a text file. For large numbers of processes, I cannot do this as a serial write without a significant slowdown.

Gergana_S_Intel
Employee

Hi,

I'm looking into the possibility of providing you with a patch prior to April.  Will a patch be acceptable, or do you need a production-level release?  The latter will be much harder to do prior to our release in April.

I'll try to update again in the next few days.

Thanks,
~Gergana

Kevin_McGrattan

I'm not sure what you mean by a "patch". Do you mean something less than a full download and update of the compiler? If so, that would be fine. Also, if there is some workaround for the line that is causing the problem, we can use that until April. I just don't want to have to remove and replace all these MPI_FILE_WRITEs. Is there a way to set the file view differently in this situation?

Thanks
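
For readers wondering what "setting the file view differently" could look like, here is a minimal sketch that skips MPI_FILE_SET_VIEW and relies on the default view, where the etype is MPI_BYTE and offsets passed to MPI_FILE_WRITE_AT are counted in bytes. It is an illustration only: it has not been run against the affected Intel MPI build, and since the reply that follows attributes the defect to the MPI_FILE_WRITE* implementation, it may well hit the same failure. The program name and the variables nprocs and offset are assumptions introduced for this sketch; the file name, line format, and record length come from the test program above.

```fortran
program test_mpi_byte_offsets
use mpi
implicit none

integer, parameter :: LINE_LENGTH=159
integer :: nprocs, rank, ierr, fh
integer :: status(MPI_STATUS_SIZE)
integer(kind=MPI_OFFSET_KIND) :: offset   ! hypothetical name: byte offset of this rank's line
real :: t_used(14)
character(len=LINE_LENGTH+1) :: line

call MPI_INIT(ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

! Same line layout as the original test: 14 values plus a trailing line feed
t_used = real(rank+1)
write(line,'(ES10.3,13(",",ES10.3))') t_used(1:14)
line(LINE_LENGTH+1:LINE_LENGTH+1) = achar(10)

call MPI_FILE_OPEN(MPI_COMM_WORLD, 'test_mpi_write.csv', &
                   MPI_MODE_WRONLY+MPI_MODE_CREATE, MPI_INFO_NULL, fh, ierr)

! No MPI_FILE_SET_VIEW: with the default view the offset is in bytes,
! so the record index (rank+1, as in the original) is scaled by the record length
offset = int(rank+1, MPI_OFFSET_KIND) * int(LINE_LENGTH+1, MPI_OFFSET_KIND)
call MPI_FILE_WRITE_AT(fh, offset, line, LINE_LENGTH+1, MPI_CHARACTER, status, ierr)

call MPI_FILE_CLOSE(fh, ierr)
call MPI_FINALIZE(ierr)

end program
```

Each rank computes its own offset, so the DO loop with CYCLE from the original test is not needed here. The file layout is intended to match the original: one 160-byte record per rank, starting at record 1.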

Gergana_S_Intel
Employee

Hi,

Yes, it won't be the full Intel MPI product; we'll provide you with specific files that you'll have to replace in your local installation (usually things like libmpi.so, etc.).  Speaking of which, the Intel MPI team has already created a package for you that has this issue fixed.  You'll have to apply this patched version to your current Intel MPI install.  I'll provide you with directions when I send you the package.  Since we need an additional day to do some legal checks, expect the new "patch" around the middle of this week.

I'll send you a direct message with the files attached.

The issue is directly related to our implementation of the MPI_FILE_WRITE* calls, so I don't know of any easy workaround that doesn't involve removing those calls.  It might be easier to wait for the patch.

Let me know if that sounds reasonable.

Regards,
~Gergana

Kevin_McGrattan

Yes, that's fine. Thanks.
 

Gergana_S_Intel
Employee

Hi,

I've provided you the patch via email.  Please update me and the rest of the community here on whether it works for you.

For everyone else's reference, this should be fixed in our next major version of Intel MPI.  If you need a fix sooner than that, let us know and we'll provide you with the patch.

Regards,
~Gergana

Kevin_McGrattan

The patch worked. Thank you very much.

Gergana_S_Intel
Employee

Thanks for the update!  I'll let the team know as well.

All the best,
~Gergana

Kevin_McGrattan

We recently installed the latest Intel Fortran, C/C++, and MPI packages (17 update 1). Before doing this, we uninstalled the patched version 16 (see above). We cannot get the new version of the software to run an MPI job. However, when we restore the "patch" mpiexec, DLLs, etc., things work. The error with the new version starts like this:

[unset]: Error reading initack on 556
Error on readline:: No error
[unset]: write_line error; fd=556 buf=:cmd=init pmi_version=1 pmi_subversion=1
:
system msg for write_line failure : No error
[unset]: Unable to write to PMI_fd
[unset]: Error reading initack on 564
Error on readline:: No error
[unset]: write_line error; fd=564 buf=:cmd=init pmi_version=1 pmi_subversion=1
:
system msg for write_line failure : No error
[unset]: Unable to write to PMI_fd
[unset]: write_line error; fd=556 buf=:cmd=barrier_in
:
system msg for write_line failure : No error
[unset]: write_line error; fd=556 buf=:cmd=get_ranks2hosts
