Intel® MPI Library

Issue while spawning processes across multiple nodes (EXIT CODE: 9)

psing51

Hi all,
I am using Intel Parallel Studio 2015 on an Intel(R) Xeon(R) CPU E5-2680 v3 (RHEL 6.5) and am currently facing issues with an MPI-based application (NAS Parallel Benchmarks - BT). Though the issue seems application-specific, I would like your opinions on a methodology for debugging/fixing issues like these.

I was successful in testing the MPI setup:

[puneets@host01 bin]$ cat hosts.txt 
host02
host03

[puneets@host01 bin]$ mpirun -np 4 -ppn 2 -hostfile hosts.txt ./hello 
host02
host02
host03
host03
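
As an additional sanity check (a sketch on my part, assuming passwordless ssh and a shared filesystem, both of which the mpirun launch above already relies on), each node can confirm it resolves the same mpirun and sees the benchmark binary:

# Hypothetical consistency check: same MPI install and binary on both hosts
# ($PWD expands locally, so this assumes the bin directory is shared)
for node in host02 host03; do
    ssh "$node" "hostname; which mpirun; ls -lh $PWD/bt.E.4.mpi_io_full"
done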

But when I try to run the application, I end up with:

[puneets@host01 bin]$ mpirun -np 4 -ppn 2 -hostfile hosts.txt ./bt.E.4.mpi_io_full 

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 25799 RUNNING AT host03
=   EXIT CODE: 9
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)

I have attached the verbose log of the error (VERBOSE.txt), generated with:

[puneets@host01 bin]$ mpirun -genv I_MPI_HYDRA_DEBUG=1 -hostfile hosts.txt -genv I_MPI_DEBUG=5 -np 4 -ppn 2 ./bt.E.4.mpi_io_full
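
The exit string "Killed (signal 9)" means the rank was terminated by SIGKILL from outside the application; on Linux the usual sender is the kernel OOM killer. A quick check (a sketch; run against the node named in the BAD TERMINATION banner, host03 here):

# Look for OOM-killer traces in the kernel log on the failing node
ssh host03 'dmesg | grep -i -E "out of memory|oom|killed process"'
# Also check available memory and per-process limits there
ssh host03 'free -g; ulimit -a'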

On a single node, however, I am able to run the application:

[puneets@host01 bin]$ mpirun -np 4  ./bt.E.4.mpi_io_full 


 NAS Parallel Benchmarks 3.3 -- BT Benchmark 

 No input file inputbt.data. Using compiled defaults
 Size: 1020x1020x1020
 Iterations:  250    dt:   0.0000040
 Number of active processes:     4

 BTIO -- FULL MPI-IO write interval:   5



I am attaching the make.def and compilation log for your reference.
Any help or hints would be very useful. Eagerly awaiting your replies.

1 Reply

psing51

Hi all,
I tried running this benchmark on compute nodes via PBS, and I again ended up with a similar error.
Here is my job submission script:

#!/bin/bash
#PBS -N NPB_N4_TPP24
#PBS -l select=2:ncpus=24:mpiprocs=2
#PBS -q test
#PBS -o output1.txt
#PBS -e error1.txt
#PBS -P cc
# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR

# 2 MPI ranks per node x 12 OpenMP threads = 24 cores per node
export OMP_NUM_THREADS=12
module load suite/intel/parallelStudio
mpirun -np 4 -hostfile $PBS_NODEFILE -genv I_MPI_HYDRA_DEBUG=1 -genv OMP_NUM_THREADS=12 -genv I_MPI_DEBUG=5 -ppn 2 ./bt.E.4.mpi_io_full
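
If memory pressure is the suspect, a variant of this script (a sketch; it reuses the passwordless ssh that mpirun itself needs) can sample free memory on each allocated node in the background while the solver runs:

# Sample memory on every allocated node every 10 s, then launch the run
for node in $(sort -u "$PBS_NODEFILE"); do
    ssh "$node" 'while sleep 10; do date +%T; free -m; done' > "mem_${node}.log" 2>&1 &
done
mpirun -np 4 -hostfile $PBS_NODEFILE -ppn 2 ./bt.E.4.mpi_io_full
kill $(jobs -p) 2>/dev/null   # stop the background samplers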


This seems to be an issue with NPB's class E problems.
I recompiled NPB for class D, and I was able to run the benchmark on multiple nodes.

Do let me know if you are able to identify the bug with the class E problem (each compute node in my setup has 64 GB of RAM).
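
For what it's worth, a back-of-the-envelope estimate points at memory exhaustion rather than a bug. Counting only BT's three main 5-component double-precision arrays (an assumption on my part; the real footprint is larger once solver workspace and MPI-IO buffers are added):

# Back-of-the-envelope lower bound for BT class E (1020^3 grid),
# counting only the three main 5-component double-precision arrays
points=$((1020 * 1020 * 1020))   # class E grid points
bytes=$((points * 3 * 5 * 8))    # 3 arrays x 5 doubles x 8 bytes
echo "total:    $((bytes / 1024**3)) GiB across all ranks"        # ~118 GiB
echo "per node: $((bytes / 1024**3 / 2)) GiB with 2 of 4 ranks"   # ~59 GiB

With 2 ranks per 64 GB node, even this lower bound leaves almost no headroom, so class E being killed while class D (408^3, roughly 1/16 the points) runs fine is consistent with the OOM killer stepping in.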
