JY_L_
Beginner
279 Views

MPI "Bad Termination" problem

Hi,

I'm having trouble running the test program included in the Intel MPI package.

Specifically, I've installed MPI in my home directory (which is NFS-mounted on both the host1 and host2 machines) and then compiled test.c using mpicc:

mpicc -show -o test test.c
gcc -o 'test' 'test.c' -I/home/jyli/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/include -L/home/jyli/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/release_mt -L/home/jyli/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker /home/jyli/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/release_mt -Xlinker -rpath -Xlinker /home/jyli/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib -Xlinker -rpath -Xlinker /opt/intel/mpi-rt/2107.0.0/intel64/lib/release_mt -Xlinker -rpath -Xlinker /opt/intel/mpi-rt/2017.0.0/intel64/lib -lmpifort -lmpi -lmpigi -ldl -lrt -lpthread

I checked that the machines can reach each other through mpirun:

mpirun -ppn 1 -n 2 -hosts host1,host2 hostname
host1
host2

However, when I run the test program, I get the following errors:

mpirun -ppn 1 -n 2 -hosts host1,host2 ./test

[0] MPI startup(): Intel(R) MPI Library, Version 2017 Update 2  Build 20170125 (id: 16752)
[0] MPI startup(): Copyright (C) 2003-2017 Intel Corporation.  All rights reserved.
[0] MPI startup(): Multi-threaded optimized library

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 5582 RUNNING AT host2
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 5582 RUNNING AT host2
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
   Intel(R) MPI Library troubleshooting guide:
      https://software.intel.com/node/561764
===================================================================================

I then loaded the generated core file into gdb, and here's the backtrace:

gdb ./test core
(gdb) bt
#0  __GI_____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<optimized out>, loc=0x7faf932e2060 <_nl_global_locale>)
    at ../stdlib/strtol_l.c:298
#1  0x00007faf9368e11a in atoi (__nptr=<optimized out>) at /usr/include/stdlib.h:286
#2  i_mpi_numa_nodes_compare (a=0x0, b=0x0) at ../../src/mpid/ch3/src/mpid_init.c:62
#3  0x00007faf92f5b419 in msort_with_tmp (p=0x7fff5c532aa0, b=0xac7b30, n=2) at msort.c:83
#4  0x00007faf92f5b6cc in msort_with_tmp (n=2, b=0xac7b30, p=0x7fff5c532aa0) at msort.c:45
#5  __GI_qsort_r (b=0xac7b30, n=2, s=8, cmp=0x7faf9368e100 <i_mpi_numa_nodes_compare>, arg=<optimized out>) at msort.c:297
#6  0x00007faf936911af in MPID_nem_impi_create_numa_nodes_map () at ../../src/mpid/ch3/src/mpid_init.c:1305
#7  0x00007faf93692284 in MPID_Init (argc=0x0, argv=0x0, requested=10, provided=0x0, has_args=0x7faf932e2060 <_nl_global_locale>, has_env=0xac7db1)
    at ../../src/mpid/ch3/src/mpid_init.c:1732
#8  0x00007faf9362872b in MPIR_Init_thread (argc=0x0, argv=0x0, required=10, provided=0x0) at ../../src/mpi/init/initthread.c:717
#9  0x00007faf93615e2b in PMPI_Init (argc=0x0, argv=0x0) at ../../src/mpi/init/init.c:253
#10 0x0000000000400a6e in main ()
(gdb) 

Does anyone have some clue as to what is going on? Thank you very much in advance!

Jenny

6 Replies
gaston-hillar
Black Belt
279 Views

Hi Jy L.,

Have you read the following article, which describes examples of communication problems with the Intel MPI Library? I think it can help with your problem.

JY_L_
Beginner
279 Views

Thanks for the suggestion. I've seen this doc before, but am not sure how it helps with my problem.

BTW, do you know where the system log for Intel MPI is kept?

Also, my latest suspicion is that this might be because host1 and host2 have AMD processors. I can run the test program fine on Intel processor-based machines.

gaston-hillar
Black Belt
279 Views

Hi Jy L.,

The error messages were similar to those generated when communication problems happen. I guess I misunderstood what you were doing.

Is test.c the name for your own code file or is it one of the sample code files? If you provide more detailed info on what you're doing, I'm sure somebody will provide useful help.

If it is not sample code, can you share the code you are running?

gaston-hillar
Black Belt
279 Views

Hi Jy L.,

Based on previous discussions I've seen about MPI, I think the best place to post your question is the Intel Clusters and HPC Technology forum, where all things related to MPI are discussed. In this forum, we discuss the Intel MIC Architecture.

JY_L_
Beginner
279 Views

I'm using the sample code linux/mpi/test/test.c 

Thanks for the suggestion that I should post to the HPC technology forum!

 

JY

gaston-hillar
Black Belt
279 Views

Hi Jy L.,

Got it. Unfortunately, I haven't worked with the sample code. I think the forum I mentioned will be the best place, because Intel engineers there know the sample code.
