Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

How to select TCP/IP as fabric at runtime with Intel 2019 MPI

Rashawn_K_Intel1
Employee

Hello,

My apologies: I posted this earlier within another thread, but afterwards decided to submit it as a new query.

I have been struggling for a couple of days to figure out a very basic setting: how to correctly set up Intel MPI 2019 for use over sockets/TCP. I am able to source mpivars.sh without any parameters and then export FI_PROVIDER=sockets, which allows me to compile and run the simple hello world code (found all over the place) on a single node with n ranks. However, when I set up my environment in the same way and try to compile PAPI from source, the configure step complains that the C compiler (GCC in this case) is not able to create executables. The config.log reveals that it struggles to find libfabric.so.1. Even if I add the libfabric directory to my LD_LIBRARY_PATH and link against the libfabric library, I am not able to build PAPI from source.

Additionally, I cannot find good documentation for how to use Intel MPI in the most simple and basic way: a single node and several processes. There is a graphic in several presentations, and even on software.intel.com/intel-mpi-library, which indicates I will be able to choose TCP/IP, among other fabric options, at runtime. I would appreciate your comments and assistance in letting me know the correct way to do this.
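For reference, the sequence that does work for the single-node hello world case looks roughly like this (install path abbreviated to match my setup; I am not claiming this is the recommended approach):

# source the Intel MPI 2019 environment with no arguments
source <pathToInstall>/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/mpivars.sh
# request the sockets (TCP/IP) libfabric provider at runtime
export FI_PROVIDER=sockets
# compile and run on a single node with 4 ranks
mpicc hello_mpi.c -o hello_mpi
mpirun -n 4 ./hello_mpi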

Regards,

-Rashawn

Anatoliy_R_Intel
Employee

Hi,

What 2019 update are you using? Can you find libfabric.so.1 in the libfabric directory?

--

Best regards, Anatoliy

Rashawn_K_Intel1
Employee

Hello Anatoliy,

Thank you for your prompt reply. I am using 2019 update 4 from Parallel Studio Cluster Edition; libfabric.so.1 is at <pathToInstall>/compilers_and_libraries_2019.4.243/linux/mpi/intel64/libfabric/lib/libfabric.so.1, as a link to libfabric.so.

Regards,

-Rashawn

Rashawn_K_Intel1
Employee

Hello Anatoliy,

I have not heard back on this.  Do you have a recommendation on what I should do?

Regards,

-Rashawn

Maksim_B_Intel
Employee

Hi, Rashawn.

If you're using gcc anyway, you might prefer to build third-party binaries without sourcing compilervars.sh, and then

source mpivars.sh

This script is available in the intel64/bin directory inside the Parallel Studio installation.

After that, you'll be able to check that the correct libraries are pulled in with ldd <your_binary>.

Unless you're running under some scheduler,

mpiexec.hydra -n <amount_of_processes> <your_binary>

will start the processes up locally by default. If this doesn't happen, please provide the output of a run with the -v flag added.
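Roughly, assuming a default installation under /opt/intel (adjust the path to your system), the check-and-run sequence would be:

source /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/mpivars.sh
# confirm the Intel MPI and libfabric libraries resolve correctly
ldd <your_binary>
# launch several processes locally
mpiexec.hydra -n 4 <your_binary>
# if that fails, rerun with -v and post the output
mpiexec.hydra -v -n 4 <your_binary>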

Rashawn_K_Intel1
Employee

Hello Maksim,

I am completely confused. I want to know the correct arguments to pass to mpivars.sh so that I can execute a simple hello world MPI application over Ethernet, not a fancier fabric. This seems to be extremely difficult to ascertain. Despite experimenting with the input parameters for mpivars.sh, I have not been successful: I have turned -ofi_internal on (1) and off (0), and I have supplied both debug and release as the library kind, yet I have not found the correct incantation.
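For the record, the variations I have been attempting look roughly like the following (this is my best guess at the argument syntax, so I may have it wrong):

# library kind plus the ofi_internal switch, then the provider at runtime
source <pathToInstall>/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/mpivars.sh release -ofi_internal=1
source <pathToInstall>/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/mpivars.sh debug -ofi_internal=0
export FI_PROVIDER=sockets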

As stated in my original query, I am not using compilervars.sh because I am not using the Intel compiler suite. I want to use Intel MPI with a non-Intel compiler, which is perfectly reasonable.

I do know where the environment setup scripts for both mpivars and compilervars are located.

In my professional capacity at Intel, I develop and validate software. For the task I intended to complete last week, I need to verify that it is possible to build a particular tool, PAPI, with GCC and also inform PAPI about the version of MPI I intend to use for its enclosed MPI test applications. I was able to do something very similar with earlier releases of Intel MPI.

I am simply starting from the easiest case: the ability to do this on a single node, meaning without a special fabric, and only requiring several MPI ranks on that node. I really just need to know what the correct arguments are to hand to mpivars and where those arguments are documented for MPI 2019.

I ran what you suggested. In what I post here, I demonstrate that I am unable to execute the hello world MPI application:

#output of ldd <binary>
ldd hello_mpi.ofi-internal-debugmt
	linux-vdso.so.1 (0x00007ffedc1f5000)
	libmpifort.so.12 => /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/lib/libmpifort.so.12 (0x00007f36194d0000)
	libmpi.so.12 => /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/lib/debug_mt/libmpi.so.12 (0x00007f36178e5000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f36176dd000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f36174bf000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f36172bb000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f3616f01000)
	libgcc_s.so.1 => /nfs/site/proj/coralhpctools/builds/compilers/gcc/gcc-9.1.0/skx/lib64/libgcc_s.so.1 (0x00007f3616ce9000)
	libfabric.so.1 => /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/libfabric/lib/libfabric.so.1 (0x00007f3616ab1000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f361988f000)
##
##output of mpiexec.hydra -v:
##
 mpiexec.hydra -v -n 1 ./hello_mpi.ofi-internal-debugmt
[mpiexec@anchpcskx1001] Launch arguments: /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin//hydra_bstrap_proxy --upstream-host anchpcskx1001 --upstream-port 34685 --pgid 0 --launcher ssh --launcher-number 0 --base-path /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/ --tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9 
[proxy:0:0@anchpcskx1001] pmi cmd from fd 6: cmd=init pmi_version=1 pmi_subversion=1
[proxy:0:0@anchpcskx1001] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@anchpcskx1001] pmi cmd from fd 6: cmd=get_maxes
[proxy:0:0@anchpcskx1001] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096
[proxy:0:0@anchpcskx1001] pmi cmd from fd 6: cmd=get_appnum
[proxy:0:0@anchpcskx1001] PMI response: cmd=appnum appnum=0
[proxy:0:0@anchpcskx1001] pmi cmd from fd 6: cmd=get_my_kvsname
[proxy:0:0@anchpcskx1001] PMI response: cmd=my_kvsname kvsname=kvs_67947_0
Abort(1094799) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(666)......: 
MPID_Init(922).............: 
MPIDI_NM_mpi_init_hook(719): OFI addrinfo() failed (ofi_init.h:719:MPIDI_NM_mpi_init_hook:No data available)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 67964 RUNNING AT anchpcskx1001
=   EXIT STATUS: 143
===================================================================================

##
## Output of mpirun -v
##

> mpirun -v -n 1 ./hello_mpi.ofi-internal-debugmt
[mpiexec@anchpcskx1001] Launch arguments: /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin//hydra_bstrap_proxy --upstream-host anchpcskx1001 --upstream-port 35059 --pgid 0 --launcher ssh --launcher-number 0 --base-path /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/ --tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9 
[proxy:0:0@anchpcskx1001] pmi cmd from fd 6: cmd=init pmi_version=1 pmi_subversion=1
[proxy:0:0@anchpcskx1001] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@anchpcskx1001] pmi cmd from fd 6: cmd=get_maxes
[proxy:0:0@anchpcskx1001] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096
[proxy:0:0@anchpcskx1001] pmi cmd from fd 6: cmd=get_appnum
[proxy:0:0@anchpcskx1001] PMI response: cmd=appnum appnum=0
[proxy:0:0@anchpcskx1001] pmi cmd from fd 6: cmd=get_my_kvsname
[proxy:0:0@anchpcskx1001] PMI response: cmd=my_kvsname kvsname=kvs_67978_0
Abort(1094799) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(666)......: 
MPID_Init(922).............: 
MPIDI_NM_mpi_init_hook(719): OFI addrinfo() failed (ofi_init.h:719:MPIDI_NM_mpi_init_hook:No data available)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 67982 RUNNING AT anchpcskx1001
=   EXIT STATUS: 143
===================================================================================

##Contents of hello_mpi.c:
> cat hello_mpi.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d"
           " out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}

Your assistance will be much appreciated.

Best regards,

-Rashawn


Rashawn_K_Intel1
Employee

UPDATE. I have been successful in compiling both the MPI hello world program and PAPI using GCC 9.1.0 as the compiler suite (C, C++, and Fortran) and Intel MPI 2019 update 4.

First, I set up the GCC environment, followed by sourcing the Intel mpivars.sh without arguments. Then I reviewed the environment and noted two variables that were set: 1) LIBRARY_PATH, pointing to the directory containing libfabric.so, and 2) FI_PROVIDER_PATH, pointing to the directory containing the FI providers (sockets, tcp, psmx2, verbs, rxm). LD_LIBRARY_PATH, PATH, and MANPATH had been updated appropriately.

With these settings, one is able to compile an MPI code: mpicc <mpisrc.c> -o <mpiBinary>. But it will not execute on a single node with one or more processes. The complaint is:

> mpirun -n 1 ./hello_mpi_mpivars-noargs
Abort(1094799) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(666)......: 
MPID_Init(922).............: 
MPIDI_NM_mpi_init_hook(719): OFI addrinfo() failed (ofi_init.h:719:MPIDI_NM_mpi_init_hook:No data available)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 7541 RUNNING AT <hostname>
=   EXIT STATUS: 143
===================================================================================

However, when I set FI_PROVIDER=sockets at runtime, I obtain the expected output (it also works with tcp as the provider):

> mpirun -n 4 ./hello_mpi_mpivars-noargs
Hello world from processor <hostname>, rank 2 out of 4 processors
Hello world from processor <hostname>, rank 3 out of 4 processors
Hello world from processor <hostname>, rank 1 out of 4 processors
Hello world from processor <hostname>, rank 0 out of 4 processors

I then tackled the compilation of PAPI using the same environment. It compiled successfully, and I was able to run the PAPI tests I needed to complete.

I definitely had something amiss in my environment last week when I encountered the error during the PAPI configure step stating that the compiler could not create C executables, with the log file indicating that libfabric.so was not found.

I am happy with using the steps above for process communication via the sockets or tcp fabric providers.
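To summarize, the sequence that works for me looks roughly like this (the GCC 9.1.0 environment setup is specific to my site and elided; adjust the Intel install path as needed):

# Intel MPI 2019 update 4 environment, no arguments to mpivars.sh
source /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/mpivars.sh
# compile with GCC through the mpicc wrapper
mpicc hello_mpi.c -o hello_mpi_mpivars-noargs
# choose the sockets (or tcp) provider at runtime
export FI_PROVIDER=sockets
# run several ranks on a single node
mpirun -n 4 ./hello_mpi_mpivars-noargs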

Thank you, Anatoliy and Maksim, for your helpful responses.

Regards,

-Rashawn
