I am having trouble running the test program included in the Intel MPI package.
Specifically, I installed MPI in my home directory (which is NFS-mounted on both the host1 and host2 machines) and then compiled test.c using mpicc:
mpicc -show -o test test.c
gcc -o 'test' 'test.c' -I/home/jyli/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/include -L/home/jyli/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/release_mt -L/home/jyli/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker /home/jyli/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/release_mt -Xlinker -rpath -Xlinker /home/jyli/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib -Xlinker -rpath -Xlinker /opt/intel/mpi-rt/2107.0.0/intel64/lib/release_mt -Xlinker -rpath -Xlinker /opt/intel/mpi-rt/2017.0.0/intel64/lib -lmpifort -lmpi -lmpigi -ldl -lrt -lpthread
I checked that the machines can reach each other through mpirun:
mpirun -ppn 1 -n 2 -hosts host1,host2 hostname
host1
host2
However, when I run the test program, I get the following errors:
mpirun -ppn 1 -n 2 -hosts host1,host2 ./test
MPI startup(): Intel(R) MPI Library, Version 2017 Update 2 Build 20170125 (id: 16752)
MPI startup(): Copyright (C) 2003-2017 Intel Corporation. All rights reserved.
MPI startup(): Multi-threaded optimized library
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 5582 RUNNING AT host2
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 5582 RUNNING AT host2
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Intel(R) MPI Library troubleshooting guide:
https://software.intel.com/node/561764
===================================================================================
I then loaded the generated core file into gdb, and here is the backtrace:
gdb ./test core
(gdb) bt
#0 __GI_____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<optimized out>, loc=0x7faf932e2060 <_nl_global_locale>) at ../stdlib/strtol_l.c:298
#1 0x00007faf9368e11a in atoi (__nptr=<optimized out>) at /usr/include/stdlib.h:286
#2 i_mpi_numa_nodes_compare (a=0x0, b=0x0) at ../../src/mpid/ch3/src/mpid_init.c:62
#3 0x00007faf92f5b419 in msort_with_tmp (p=0x7fff5c532aa0, b=0xac7b30, n=2) at msort.c:83
#4 0x00007faf92f5b6cc in msort_with_tmp (n=2, b=0xac7b30, p=0x7fff5c532aa0) at msort.c:45
#5 __GI_qsort_r (b=0xac7b30, n=2, s=8, cmp=0x7faf9368e100 <i_mpi_numa_nodes_compare>, arg=<optimized out>) at msort.c:297
#6 0x00007faf936911af in MPID_nem_impi_create_numa_nodes_map () at ../../src/mpid/ch3/src/mpid_init.c:1305
#7 0x00007faf93692284 in MPID_Init (argc=0x0, argv=0x0, requested=10, provided=0x0, has_args=0x7faf932e2060 <_nl_global_locale>, has_env=0xac7db1) at ../../src/mpid/ch3/src/mpid_init.c:1732
#8 0x00007faf9362872b in MPIR_Init_thread (argc=0x0, argv=0x0, required=10, provided=0x0) at ../../src/mpi/init/initthread.c:717
#9 0x00007faf93615e2b in PMPI_Init (argc=0x0, argv=0x0) at ../../src/mpi/init/init.c:253
#10 0x0000000000400a6e in main ()
(gdb)
Does anyone have a clue as to what is going on? Thank you very much in advance!
Thanks for the suggestion. I've seen this doc before, but am not sure how it helps with my problem.
BTW, do you know where the system log for Intel MPI is kept?
Also, my latest suspicion is that this might be because host1 and host2 have AMD processors. I can run the test program fine on Intel processor-based machines.
Hi Jy L.,
The error messages were similar to those generated when communication problems happen. I guess I didn't fully understand what you were doing, and I was confused.
Is test.c the name of your own code file, or is it one of the sample code files? If you provide more detailed info on what you're doing, I'm sure somebody will be able to help.
If it is not sample code, can you share the code you are running?
Hi Jy L.,
Based on previous discussions I've seen about MPI, I think the best place to post your question is the Intel Clusters and HPC Technology forum, where all things related to MPI are discussed. This forum is for discussing the Intel MIC Architecture.