Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

problems using PSM2 "Trying to connect to a HFI (subnet id - 0)on a different subnet - 1023 "

silvio_stanzani

Hello,

I installed the Omni-Path driver (IntelOPA-Basic.RHEL74-x86_64.10.6.1.0.2.tgz) on two identical KNL/F servers running CentOS Linux release 7.4.1708 (Core).

I ran the Intel MPI Benchmarks using PSM2:

mpirun -PSM2 -host 10.0.0.5 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv : -host 10.0.0.6 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv
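The same fabric selection can also be expressed through environment variables instead of the -PSM2 shortcut (Intel MPI 2018 syntax); setting I_MPI_DEBUG additionally prints which fabric and provider each rank actually picked, which helps confirm PSM2 is really in use. This is a sketch, not a fix for the subnet error itself:

```shell
# Select the TMI fabric with the psm2 provider explicitly
# (equivalent to the -PSM2 option in Intel MPI 2018).
export I_MPI_FABRICS=shm:tmi
export I_MPI_TMI_PROVIDER=psm2
# Print fabric/provider selection details at startup.
export I_MPI_DEBUG=5

mpirun -host 10.0.0.5 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv : \
       -host 10.0.0.6 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv
```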

The execution returned the following error:

[silvio@phi05 ~]$ mpirun -PSM2 -host 10.0.0.5 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv : -host 10.0.0.6 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv
init_provider_list: using configuration file: /opt/intel/compilers_and_libraries_2018.1.163/linux/mpi/intel64/etc/tmi.conf
init_provider_list: valid configuration line: psm2 1.3 libtmip_psm2.so " "
init_provider_list: using configuration file: /opt/intel/compilers_and_libraries_2018.1.163/linux/mpi/intel64/etc/tmi.conf
init_provider_list: valid configuration line: psm 1.2 libtmip_psm.so " "
init_provider_list: valid configuration line: mx 1.0 libtmip_mx.so " "
init_provider_list: valid configuration line: psm2 1.3 libtmip_psm2.so " "
init_provider_list: valid configuration line: psm 1.2 libtmip_psm.so " "
init_provider_list: valid configuration line: mx 1.0 libtmip_mx.so " "
tmi_psm2_init: tmi_psm2_connect_timeout=180
init_provider_lib: using provider: psm2, version 1.3
tmi_psm2_init: tmi_psm2_connect_timeout=180
init_provider_lib: using provider: psm2, version 1.3
phi05.11971 Trying to connect to a HFI (subnet id - 0)on a different subnet - 1023 


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 11971 RUNNING AT 10.0.0.5
=   EXIT CODE: 134
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 11971 RUNNING AT 10.0.0.5
=   EXIT CODE: 6
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
   Intel(R) MPI Library troubleshooting guide:
      https://software.intel.com/node/561764
===================================================================================

I searched for this message on Google ("Trying to connect to a HFI (subnet id - 0)on a different subnet - 1023") and the only reference I found is the following source code:

https://github.com/01org/opa-psm2/blob/master/ptl_ips/ips_proto_connect.c

How do I put the two HFIs on the same subnet?
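One way to see which subnet each HFI believes it is on is to compare the 64-bit subnet prefix encoded in the port GID, which the hfi1 driver exposes through sysfs (the device name hfi1_0 and the path below are assumptions for a single-HFI node; the GID format itself is the standard InfiniBand/OPA one, with the subnet prefix in the first four 16-bit groups):

```python
# Sketch: compare the subnet prefix of two ports' GIDs.
# On each node the port GID can be read from sysfs, e.g. (path assumed):
#   cat /sys/class/infiniband/hfi1_0/ports/1/gids/0
# A GID looks like "fe80:0000:0000:0000:0011:7501:0109:37b4";
# the first four groups form the 64-bit subnet prefix.

def subnet_prefix(gid: str) -> int:
    """Return the 64-bit subnet prefix encoded in a GID string."""
    groups = gid.strip().split(":")
    if len(groups) != 8:
        raise ValueError(f"unexpected GID format: {gid!r}")
    prefix = 0
    for g in groups[:4]:
        prefix = (prefix << 16) | int(g, 16)
    return prefix

# Hypothetical GIDs from the two nodes; PSM2 can only connect
# the ranks if the prefixes match.
gid_a = "fe80:0000:0000:0000:0011:7501:0109:37b4"
gid_b = "fe80:0000:0000:0000:0011:7501:0109:37b5"
print(subnet_prefix(gid_a) == subnet_prefix(gid_b))  # True: same subnet
```

If the prefixes differ between the two nodes, the subnet assignment (normally set by the fabric manager) is the thing to reconcile.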

When I switch to InfiniBand (-IB) it works:

mpirun -IB -host 10.0.0.5 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv : -host 10.0.0.6 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv

#-----------------------------------------------------------------------------

# Benchmarking Sendrecv 
# #processes = 2 
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000        17.79        17.79        17.79         0.00
            1         1000        18.11        18.11        18.11         0.11
            2         1000        18.05        18.05        18.05         0.22
            4         1000        18.08        18.08        18.08         0.44
            8         1000        18.05        18.05        18.05         0.89
           16         1000        18.06        18.06        18.06         1.77
           32         1000        18.99        18.99        18.99         3.37
           64         1000        19.05        19.07        19.06         6.71
          128         1000        19.20        19.20        19.20        13.33
          256         1000        19.96        19.97        19.97        25.64
          512         1000        20.22        20.22        20.22        50.63
         1024         1000        20.38        20.39        20.39       100.44
         2048         1000        24.70        24.71        24.70       165.78
         4096         1000        25.98        25.98        25.98       315.31
         8192         1000        55.57        55.59        55.58       294.75
        16384         1000        61.89        61.90        61.90       529.33
        32768         1000       112.95       113.01       112.98       579.89
        65536          640       158.22       158.23       158.22       828.37
       131072          320       297.40       297.50       297.45       881.16
       262144          160       599.27       600.30       599.78       873.38
       524288           80     31394.80     31489.45     31442.13        33.30
      1048576           40     28356.10     28414.67     28385.39        73.81
      2097152           20     31387.65     31661.40     31524.53       132.47
      4194304           10     38455.80     40408.99     39432.39       207.59
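As a sanity check on the table above: IMB Sendrecv sends and receives simultaneously, so the reported bandwidth is twice the message size divided by the average time (bytes per microsecond equals MB/s). A small sketch reproducing two rows to within rounding:

```python
# Reproduce the Mbytes/sec column of the IMB Sendrecv table.
# Sendrecv moves data in both directions, hence the factor of 2.

def sendrecv_mbps(nbytes: int, t_avg_usec: float) -> float:
    """Bidirectional bandwidth in MB/s: 2 * bytes / time(usec)."""
    return 2 * nbytes / t_avg_usec

print(round(sendrecv_mbps(4096, 25.98), 2))    # ~315.32, table shows 315.31
print(round(sendrecv_mbps(65536, 158.22), 2))  # ~828.42, table shows 828.37
```

The small discrepancies come from IMB's internal timer precision; the model matches the column to within a fraction of a percent.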
