Rashawn_K_Intel1
Employee
757 Views

How to select TCP/IP as fabric at runtime with Intel 2019 MPI

Hello,

My apologies: I posted this earlier within another thread, but afterwards decided to submit it as a new query.

I have been struggling for a couple of days with a very basic setting: how to correctly set up Intel MPI 2019 for use over sockets/TCP. I can source mpivars.sh without any parameters and then export FI_PROVIDER=sockets, which allows me to compile and run the simple hello world code found all over the place on a single node with n ranks. However, when I set up my environment in the same way and try to compile PAPI from source, the configure step complains that the C compiler (GCC in this case) cannot create executables. The config.log reveals that it fails to find libfabric.so.1. Even if I add the libfabrics directory to my LD_LIBRARY_PATH and link against libfabric, I cannot build PAPI from source.

Additionally, I cannot find good documentation on how to use MPI in the simplest way: a single node and several processes. A graphic in several presentations, and even on software.intel.com/intel-mpi-library, indicates that I should be able to choose TCP/IP, among other fabric options, at runtime. I would appreciate your comments and help in finding the correct way to do this.

Regards,

-Rashawn

Anatoliy_R_Intel
Employee
757 Views

Hi,

What 2019 update are you using? Can you find libfabric.so.1 in the libfabric directory?

--

Best regards, Anatoliy

Rashawn_K_Intel1
Employee
756 Views

Hello Anatoliy,

Thank you for your prompt reply.  I am using 2019 update 4 from Parallel Studio Cluster Edition; libfabric.so.1 is in <pathToInstall>/compilers_and_libraries_2019.0.243/linux/mpi/intel64/libfabrics/libfabric.so.1 as a link to libfabric.so.

Regards,

-Rashawn

Rashawn_K_Intel1
Employee
756 Views

Hello Anatoliy,

I have not heard back on this.  Do you have a recommendation on what I should do?

Regards,

-Rashawn

Maksim_B_Intel
Employee
756 Views

Hi, Rashawn.

If you're using gcc anyway, you might prefer to build third-party binaries without sourcing compilervars.sh, and then

source mpivars.sh

This script is in the intel64/bin directory inside the Parallel Studio installation.

After that, you can check that the correct libraries are pulled in with ldd <your_binary>.

Unless you're running under some scheduler,

mpiexec.hydra -n <amount_of_processes> <your_binary>

will start them up locally by default. If this doesn't happen, please provide the output of a run with the -v flag added.
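
The ldd check mentioned above can be sketched as a small shell helper. This is only an illustration, not an Intel-provided tool: the function name missing_libs and the binary name ./hello_mpi are hypothetical, and the filter assumes the usual glibc ldd output format, in which an unresolved library appears as "<name> => not found".

```shell
#!/bin/sh
# missing_libs (hypothetical helper): reads ldd output on stdin and prints
# the name of every shared library that ldd reports as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Hypothetical binary name; substitute your own MPI executable.
binary=./hello_mpi

if [ -x "$binary" ] && command -v ldd >/dev/null 2>&1; then
    unresolved=$(ldd "$binary" | missing_libs)
    if [ -n "$unresolved" ]; then
        echo "Unresolved libraries:"
        echo "$unresolved"
    else
        # Show where the MPI and libfabric libraries were resolved from.
        ldd "$binary" | grep -E 'libmpi|libfabric' || true
    fi
fi
```

If libfabric.so.1 shows up in the unresolved list, the loader cannot see the libfabric directory, which usually means mpivars.sh was not sourced in the shell that runs the binary.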

Rashawn_K_Intel1
Employee
756 Views

Hello Maksim,

I am completely confused. I want to know the correct arguments to pass to mpivars.sh so that I can execute a simple hello world MPI application over Ethernet, not a fancier fabric. This seems to be extremely difficult to find out. I have not succeeded despite experimenting with the input parameters for mpivars.sh: I have turned -ofi_internal on (1) and off (0), and I have supplied debug and release as the kind, yet I have not found the correct incantation.

As stated in my original query, I am not using compilervars.sh because I am not using the Intel compiler suite. I want to use Intel MPI with a non-Intel compiler, which is perfectly reasonable.

I do know where the setup scripts for both mpivars and compilervars are located.

In my professional capacity at Intel, I develop and validate software. For the task I intended to complete last week, I need to verify that it is possible to build a particular tool, PAPI, with GCC and also tell PAPI which version of MPI I intend to use for the enclosed MPI test applications. I was able to do something very similar with earlier releases of Intel MPI. I am simply starting from the simplest case: running several MPI ranks on a single node, without a special fabric. I really just need to know which arguments to pass to mpivars and where these arguments are documented for MPI 2019. I ran what you suggested. The output below shows that I am unable to execute the hello world MPI application:

#output of ldd <binary>
ldd hello_mpi.ofi-internal-debugmt
	linux-vdso.so.1 (0x00007ffedc1f5000)
	libmpifort.so.12 => /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/lib/libmpifort.so.12 (0x00007f36194d0000)
	libmpi.so.12 => /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/lib/debug_mt/libmpi.so.12 (0x00007f36178e5000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f36176dd000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f36174bf000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f36172bb000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f3616f01000)
	libgcc_s.so.1 => /nfs/site/proj/coralhpctools/builds/compilers/gcc/gcc-9.1.0/skx/lib64/libgcc_s.so.1 (0x00007f3616ce9000)
	libfabric.so.1 => /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/libfabric/lib/libfabric.so.1 (0x00007f3616ab1000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f361988f000)
##
##output of mpiexec.hydra -v:
##
 mpiexec.hydra -v -n 1 ./hello_mpi.ofi-internal-debugmt
[mpiexec@anchpcskx1001] Launch arguments: /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin//hydra_bstrap_proxy --upstream-host anchpcskx1001 --upstream-port 34685 --pgid 0 --launcher ssh --launcher-number 0 --base-path /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/ --tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9 
[proxy:0:0@anchpcskx1001] pmi cmd from fd 6: cmd=init pmi_version=1 pmi_subversion=1
[proxy:0:0@anchpcskx1001] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@anchpcskx1001] pmi cmd from fd 6: cmd=get_maxes
[proxy:0:0@anchpcskx1001] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096
[proxy:0:0@anchpcskx1001] pmi cmd from fd 6: cmd=get_appnum
[proxy:0:0@anchpcskx1001] PMI response: cmd=appnum appnum=0
[proxy:0:0@anchpcskx1001] pmi cmd from fd 6: cmd=get_my_kvsname
[proxy:0:0@anchpcskx1001] PMI response: cmd=my_kvsname kvsname=kvs_67947_0
Abort(1094799) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(666)......: 
MPID_Init(922).............: 
MPIDI_NM_mpi_init_hook(719): OFI addrinfo() failed (ofi_init.h:719:MPIDI_NM_mpi_init_hook:No data available)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 67964 RUNNING AT anchpcskx1001
=   EXIT STATUS: 143
===================================================================================

##
## Output of mpirun -v
##

> mpirun -v -n 1 ./hello_mpi.ofi-internal-debugmt
[mpiexec@anchpcskx1001] Launch arguments: /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin//hydra_bstrap_proxy --upstream-host anchpcskx1001 --upstream-port 35059 --pgid 0 --launcher ssh --launcher-number 0 --base-path /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/ --tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9 
[proxy:0:0@anchpcskx1001] pmi cmd from fd 6: cmd=init pmi_version=1 pmi_subversion=1
[proxy:0:0@anchpcskx1001] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@anchpcskx1001] pmi cmd from fd 6: cmd=get_maxes
[proxy:0:0@anchpcskx1001] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096
[proxy:0:0@anchpcskx1001] pmi cmd from fd 6: cmd=get_appnum
[proxy:0:0@anchpcskx1001] PMI response: cmd=appnum appnum=0
[proxy:0:0@anchpcskx1001] pmi cmd from fd 6: cmd=get_my_kvsname
[proxy:0:0@anchpcskx1001] PMI response: cmd=my_kvsname kvsname=kvs_67978_0
Abort(1094799) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(666)......: 
MPID_Init(922).............: 
MPIDI_NM_mpi_init_hook(719): OFI addrinfo() failed (ofi_init.h:719:MPIDI_NM_mpi_init_hook:No data available)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 67982 RUNNING AT anchpcskx1001
=   EXIT STATUS: 143
===================================================================================

##Contents of hello_mpi.c:
> cat hello_mpi.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d"
           " out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}

Your assistance will be much appreciated.

Best regards,

-Rashawn

 

 

Rashawn_K_Intel1
Employee
756 Views

UPDATE. I have succeeded in compiling both the MPI hello world program and PAPI using GCC 9.1.0 as the compiler suite (C, C++, and Fortran) and Intel MPI 2019 update 4. First, I set up the GCC environment and then sourced the Intel mpivars.sh without arguments. Then I reviewed the environment and noted two variables that were set: 1.) LIBRARY_PATH pointing to the directory containing libfabric.so and 2.) FI_PROVIDER_PATH pointing to a directory containing the FI providers (sockets, tcp, psmx2, verbs, rxm). LD_LIBRARY_PATH, PATH, and MANPATH had been updated appropriately. With these settings, one is able to compile an MPI code: mpicc <mpisrc.c> -o <mpiBinary>. But the resulting binary will not execute on a single node with one or more processes. The complaint is:

> mpirun -n 1 ./hello_mpi_mpivars-noargs
Abort(1094799) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(666)......: 
MPID_Init(922).............: 
MPIDI_NM_mpi_init_hook(719): OFI addrinfo() failed (ofi_init.h:719:MPIDI_NM_mpi_init_hook:No data available)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 7541 RUNNING AT <hostname>
=   EXIT STATUS: 143
===================================================================================

However, when I set FI_PROVIDER=sockets at runtime, I obtain the expected output (it also works with tcp as the provider):

> mpirun -n 4 ./hello_mpi_mpivars-noargs
Hello world from processor <hostname>, rank 2 out of 4 processors
Hello world from processor <hostname>, rank 3 out of 4 processors
Hello world from processor <hostname>, rank 1 out of 4 processors
Hello world from processor <hostname>, rank 0 out of 4 processors
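
To compare providers without changing the shell's environment, the variable can be set for a single command only. The following is a minimal sketch under that assumption; the function name run_with_provider is hypothetical, and the loop simply tries the two providers that worked for me.

```shell
#!/bin/sh
# run_with_provider PROVIDER COMMAND [ARGS...]
# (hypothetical helper) Runs COMMAND with FI_PROVIDER set only for that
# command, leaving the calling shell's environment untouched.
run_with_provider() {
    provider=$1
    shift
    env "FI_PROVIDER=$provider" "$@"
}

# Binary name taken from the run above; adjust as needed.
binary=./hello_mpi_mpivars-noargs

if [ -x "$binary" ] && command -v mpirun >/dev/null 2>&1; then
    for p in sockets tcp; do
        echo "== FI_PROVIDER=$p =="
        run_with_provider "$p" mpirun -n 4 "$binary"
    done
fi
```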

I then tackled the compilation of PAPI using the same environment. It compiled successfully, and I was able to run the PAPI tests I needed to complete.

I definitely had something amiss in my environment last week when the PAPI configure step reported that it could not create C executables and the log file indicated that libfabric.so was not found.

I am happy with using the steps above for process communication via sockets or tcp fabric providers.
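
For anyone landing here later, the steps above can be sketched as one script. Treat it as a recap under assumptions, not an official procedure: MPI_DIR is a hypothetical variable whose default is the install prefix from my ldd output, hello_mpi.c is the test program posted earlier, and every step is guarded so the script does nothing harmful where Intel MPI is not installed.

```shell
#!/bin/sh
# MPI_DIR is a hypothetical variable; the default is the prefix from the
# ldd output earlier in this thread. Adjust it to your installation.
MPI_DIR="${MPI_DIR:-/opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64}"

# 1. Source mpivars.sh without arguments (GCC is assumed to be on PATH already).
if [ -f "$MPI_DIR/bin/mpivars.sh" ]; then
    . "$MPI_DIR/bin/mpivars.sh"
fi

# 2. Review the two variables noted above.
printf 'LIBRARY_PATH=%s\n' "${LIBRARY_PATH:-<unset>}"
printf 'FI_PROVIDER_PATH=%s\n' "${FI_PROVIDER_PATH:-<unset>}"
if [ -n "${FI_PROVIDER_PATH:-}" ] && [ -d "$FI_PROVIDER_PATH" ]; then
    ls "$FI_PROVIDER_PATH"    # the available FI providers
fi

# 3. Select the provider at runtime (tcp also worked for me).
export FI_PROVIDER=sockets

# 4. Compile and run on a single node.
if command -v mpicc >/dev/null 2>&1 && [ -f hello_mpi.c ]; then
    mpicc hello_mpi.c -o hello_mpi && mpirun -n 4 ./hello_mpi
fi
```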

Thank you, Anatoliy and Maksim, for your helpful responses.

Regards,

-Rashawn
