Intel® MPI Library
Seg Fault when using US NFS install of MPI 5.1.0.038 from site in Russia

Rashawn_K_Intel1
Employee

Hello,

One of my team members in Russia is accessing an NFS installation of MPI 5.1.0.038 located at a US site. When she runs the simple ring application test.c with four processes, one process per node, she encounters a segmentation fault. This does not happen for team members based at US sites. The seg fault also does not occur when the application is executed on a single node, the login node.

Each team member compiled test.c as follows (in a user-specific scratch space in the US NFS allocation):

	mpiicc -g -o testc-intelMPI test.c

To run the executable, we use:

	mpirun -n 4 -perhost 1 -env I_MPI_FABRICS tcp -hostfile /nfs/<pathTo>/machines.LINUX ./testc-intelMPI

For the U.S.-based team members, the output is as follows:

	Hello world: rank 0 of 4 running on <hostname1>
	Hello world: rank 1 of 4 running on <hostname2>
	Hello world: rank 2 of 4 running on <hostname3>
	Hello world: rank 3 of 4 running on <hostname4>

When my team member in Russia runs the same command, she gets this segmentation fault message:

	/nfs/<pathTo>/intel-5.1.0.038/compilers_and_libraries_2016.0.079/linux/mpi/intel64/bin/mpirun: line 241:  7902 Segmentation fault      (core dumped) mpiexec.hydra "$@" 0<&0

Running it under gdb shows the following:

	Program received signal SIGSEGV, Segmentation fault.
	mfile_fn (arg=0x0, argv=0x49cdc8) at ../../ui/mpich/utils.c:448


We do not have the source files with this installation and are unable to inspect utils.c.

Conversely, when she runs on just the login node with:

	mpirun -n 4 -perhost 1 ./testc-intelMPI

no segmentation fault occurs:

	Hello world: rank 0 of 4 running on <loginHostname>
	Hello world: rank 1 of 4 running on <loginHostname>
	Hello world: rank 2 of 4 running on <loginHostname>
	Hello world: rank 3 of 4 running on <loginHostname>

Please let me know if you have any suggestions for how I can change the environment so that my team member in Russia can run this code correctly.

Thank you,

Rashawn Knapp

2 Replies
Artem_R_Intel1
Employee

Hello Rashawn,

Could you please try to reproduce the failure with mpirun's '-v' option and provide the output?
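
One way to capture that output for posting is sketched below. This is only an illustration: the `run_logged` helper and the log file name are hypothetical, and the mpirun arguments in the usage comment are the ones from the original post with '-v' added.

```shell
#!/bin/sh
# Hypothetical helper (not part of Intel MPI): run a command, save its
# stdout and stderr to a log file, print the log, and return the
# command's own exit status.
run_logged() {
    log_file=$1
    shift
    "$@" >"$log_file" 2>&1
    status=$?
    cat "$log_file"
    return "$status"
}

# Example usage (arguments from the original post, with -v added):
# run_logged mpirun-verbose.log mpirun -v -n 4 -perhost 1 \
#     -env I_MPI_FABRICS tcp \
#     -hostfile /nfs/<pathTo>/machines.LINUX ./testc-intelMPI
```

Keeping the exit status lets a calling script tell a failed launch from a successful one while the full verbose output stays in the log file.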

Rashawn_K_Intel1
Employee

Hello Artem and others,

Thank you for your suggestion. We resolved the issue earlier today. The original command run by the team member contained a typo; when she repeated it today with the '-v' option and the correct mpirun parameters, it ran as expected.
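
For anyone who hits something similar: this thread does not record what the typo was, but a small pre-flight check before calling mpirun may catch some mistyped launch parameters early. The following is a minimal sketch, not anything shipped with Intel MPI. The `check_hostfile` function is hypothetical, and treating blank lines and lines starting with '#' as non-host entries is an assumption about the hostfile format.

```shell
#!/bin/sh
# Hypothetical pre-flight check: confirm that a hostfile path was given,
# that the file is readable, and that it lists at least the expected
# number of hosts.
check_hostfile() {
    hostfile=$1
    min_hosts=${2:-1}
    if [ -z "$hostfile" ]; then
        echo "error: no hostfile path given" >&2
        return 1
    fi
    if [ ! -r "$hostfile" ]; then
        echo "error: cannot read hostfile: $hostfile" >&2
        return 1
    fi
    # Count lines that are neither blank nor comments (assumed '#' prefix).
    host_count=$(grep -c -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$hostfile" || true)
    if [ "$host_count" -lt "$min_hosts" ]; then
        echo "error: $hostfile lists $host_count host(s), need $min_hosts" >&2
        return 1
    fi
    return 0
}

# Example usage (path and arguments as in the original post):
# check_hostfile /nfs/<pathTo>/machines.LINUX 4 && \
#     mpirun -n 4 -perhost 1 -env I_MPI_FABRICS tcp \
#         -hostfile /nfs/<pathTo>/machines.LINUX ./testc-intelMPI
```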

Regards,

Rashawn
