Michailpg
Novice

Intel MPI_Alltoallw Poor Performance

Hello,

We are developing a project that uses MPI for distributed execution. We need executables for both Windows and Linux. On Windows we were using Microsoft MPI and decided to switch to Intel's implementation. Unfortunately, we saw a performance drop in some specific cases. After some investigation we found that the problem lies in MPI_Alltoallw().

After searching, I found that Intel's MPI_Alltoallw() is a naive Isend/Irecv implementation and has no tuning alternatives like the other collectives.

To demonstrate our results, I have created a C++ demo program using MPI_Alltoallw(). Every processor holds a glob_rows * cols buffer and sends common_rows * cols to each of the others, so at the end every processor has a common_rows * cols * comm_size buffer filled. common_rows is calculated using a block-cyclic distribution.

I compiled and ran with both Intel MPI 2019.7.216 and Microsoft MPI. The execution times on an Intel i5-4460 are:

#pr   msmpi   impi
2     1.69s   4.01s
4     3.27s   7.15s

I know that this specific demo could be rewritten with alltoallv or maybe even alltoall. The problem is that we use alltoallw a lot and its performance is really important to us.

We didn't expect Intel MPI to be slower than MS-MPI. Do you have any tips? Is there any chance the MPI developers could improve Alltoallw()?

Thank you in advance, guys!

For some reason I cannot upload my .cpp file, so here is the code:

#include <iostream>
#include <random>
#include <vector>     // std::vector (missing in the original listing)
#include <memory>
#include <algorithm>
#include <cstdio>     // printf (missing in the original listing)
#include <mpi.h>


int main(int argc, char* argv[]){
  MPI_Init(&argc,&argv);

  int comm_size, comm_rank;
  MPI_Comm_size(MPI_COMM_WORLD,&comm_size);
  MPI_Comm_rank(MPI_COMM_WORLD,&comm_rank);

  // allocate initial buffer and fill it with random doubles
  int rows = 1 << 20;
  int cols = 50;
  size_t size = (size_t)rows * (size_t)cols;
  auto val = std::make_unique<double[]>(size);

  std::uniform_real_distribution<double> unif;
  std::default_random_engine re;
  std::generate(val.get(), val.get()+size, [&](){return unif(re);});

  // calculate the common rows each processor will own (block-cyclic, 64-row blocks)
  int total_blks = rows / 64;
  int cm_blks = total_blks / comm_size;
  int cm_rows = cm_blks * 64;

  // final buffer for each processor: it receives cm_rows * cols from every rank
  int cols_f = cols * comm_size;
  auto b_val = std::make_unique<double[]>((size_t)cm_rows * (size_t)cols_f);

  // Create datatypes: scol selects one rank's 64-row blocks within a column,
  // scol_res resizes its extent to a full column so sblock can span all cols
  MPI_Datatype scol,scol_res,sblock,rblock;
  MPI_Type_vector(cm_blks,64,64*comm_size,MPI_DOUBLE,&scol);
  MPI_Type_create_resized(scol,0,rows*sizeof(double),&scol_res);
  MPI_Type_contiguous(cols,scol_res,&sblock);
  MPI_Type_contiguous(cm_rows*cols,MPI_DOUBLE,&rblock);
  MPI_Type_commit(&sblock);
  MPI_Type_commit(&rblock);

  // MPI_Alltoallw takes displacements in bytes
  std::vector<int> scounts(comm_size,1);
  std::vector<int> rcounts(comm_size,1);
  std::vector<int> sdispls(comm_size);
  std::vector<int> rdispls(comm_size);
  std::vector<MPI_Datatype> stypes(comm_size);
  std::vector<MPI_Datatype> rtypes(comm_size);

  for (int i=0;i<comm_size;i++){
    sdispls[i] = 64*i*sizeof(double);
    rdispls[i] = cm_rows*cols*i*sizeof(double);
    stypes[i] = sblock;
    rtypes[i] = rblock;
  }

  MPI_Barrier(MPI_COMM_WORLD);
  double str = MPI_Wtime();
  for (int i=0;i<10;i++) MPI_Alltoallw(val.get(),scounts.data(),sdispls.data(),stypes.data(),b_val.get(),rcounts.data(),rdispls.data(),rtypes.data(),MPI_COMM_WORLD);
  MPI_Barrier(MPI_COMM_WORLD);
  if (!comm_rank) printf("Time: %lf seconds\n",MPI_Wtime()-str);

  MPI_Type_free(&sblock);
  MPI_Type_free(&rblock);
  MPI_Type_free(&scol_res);
  MPI_Type_free(&scol);

  MPI_Finalize();
  return 0;
}

 

PrasanthD_intel
Moderator

Hi Michail,


Thanks for reaching out to us.

Yes, MPI_Alltoallw currently has only an Isend/Irecv/Waitall implementation in Intel MPI.

We also observed timings similar to those you reported for the given program with IMPI.

We are forwarding your query to the concerned team and will get back to you at the earliest.


Regards

Prasanth


Michailpg
Novice

Thank you!


Hi Michail,


On how many nodes do you observe this behavior?


Best regards,

Amar


Michailpg
Novice

Hi DrAmarpal,

 

We are currently running in SMP mode on a single node. Soon we will move to 4-8 nodes at most.

 

Best regards,

Michail


Hi Michail,


Thanks for confirming. Please bear with us while we work on a solution.


Best regards,

Amar



Hi Michail,


Could you please rerun your experiments with Intel MPI Library 2019 U8, which was recently released? With this version, please set FI_PROVIDER=netdir and report your findings.


Best regards,

Amar


Michailpg
Novice

Hi DrAmarpal,

 

I downloaded Intel MPI Library 2019 U8 and compiled my code with it. Using

I_MPI_FABRICS=ofi
FI_PROVIDER=netdir

I get an error which says:

[0] MPI startup(): Intel(R) MPI Library, Version 2019 Update 8  Build 20200624
[0] MPI startup(): Copyright (C) 2003-2020 Intel Corporation.  All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.10.1a1-impi
Abort(1091215) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(136)........:
MPID_Init(1138)..............:
MPIDI_OFI_mpi_init_hook(1061): OFI addrinfo() failed (netmod\ofi\ofi_init.c:1061:MPIDI_OFI_mpi_init_hook:Unknown error)

 

 

If I use FI_PROVIDER=tcp, it works but the execution time is still high. With FI_PROVIDER=shm the execution time is considerably better, but still not what we want compared to MS-MPI.

I am running on a single node. Were the changes focused on inter-node communication?

Best Regards,

Jason


Hi Jason,


Thanks for reporting your findings. To understand what the problem is, could you please source the debug version of the Intel MPI Library by running

mpivars.bat debug

and set FI_LOG_LEVEL=debug before running your test? Please share the additional output that gets generated during this run.


Best regards,

Amar


Michailpg
Novice

Dear Amar,

I followed your instructions. Running with 2 or 4 processors, the output is pretty short:

Abort(1091215) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(136)........:
MPID_Init(1138)..............:
MPIDI_OFI_mpi_init_hook(1061): OFI addrinfo() failed (netmod\ofi\ofi_init.c:1061:MPIDI_OFI_mpi_init_hook:Unknown error)

Using 1 processor, I get the following:

Abort(1091215) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(136)........:
MPID_Init(1138)..............:
MPIDI_OFI_mpi_init_hook(1061): OFI addrinfo() failed (netmod\ofi\ofi_init.c:1061:MPIDI_OFI_mpi_init_hook:Unknown error)
libfabric:476:core:mr:ofi_default_cache_size():56<info> default cache size=0
libfabric:476:netdir:core:ofi_nd_startup():602<info> ofi_nd_startup: starting initialization
libfabric:476:core:core:ofi_register_provider():418<info> registering provider: netdir (110.10)
libfabric:476:core:core:ofi_register_provider():418<info> registering provider: ofi_rxm (110.10)
libfabric:476:core:core:ofi_register_provider():418<info> registering provider: sockets (110.10)
libfabric:476:core:core:ofi_register_provider():446<info> "sockets" filtered by provider include/exclude list, skipping
libfabric:476:core:core:ofi_register_provider():418<info> registering provider: tcp (110.10)
libfabric:476:core:core:ofi_register_provider():446<info> "tcp" filtered by provider include/exclude list, skipping
libfabric:476:core:core:ofi_register_provider():418<info> registering provider: ofi_hook_perf (110.10)
libfabric:476:core:core:ofi_register_provider():418<info> registering provider: ofi_hook_noop (110.10)
libfabric:476:core:core:fi_getinfo():1066<info> Found provider with the highest priority netdir, must_use_util_prov = 1
libfabric:476:core:core:fi_getinfo():1066<info> Found provider with the highest priority netdir, must_use_util_prov = 1
libfabric:476:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:476:core:core:fi_getinfo():1129<info> Now it is being used by netdir provider
libfabric:476:core:core:fi_getinfo():1066<info> Found provider with the highest priority netdir, must_use_util_prov = 1
libfabric:476:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:476:core:core:fi_getinfo():1129<info> Now it is being used by netdir provider
libfabric:476:core:core:fi_getinfo():1066<info> Found provider with the highest priority netdir, must_use_util_prov = 1
libfabric:476:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:476:core:core:fi_getinfo():1129<info> Now it is being used by netdir provider
libfabric:476:core:core:fi_getinfo():1066<info> Found provider with the highest priority netdir, must_use_util_prov = 1
libfabric:476:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:476:core:core:fi_getinfo():1129<info> Now it is being used by netdir provider
libfabric:476:core:core:fi_getinfo():1129<info> Now it is being used by ofi_rxm provider
libfabric:476:core:core:fi_getinfo():1129<info> Now it is being used by netdir provider

 

Using 4 processors and TCP, I get:

libfabric:7532:core:mr:ofi_default_cache_size():56<info> default cache size=0
libfabric:7532:netdir:core:ofi_nd_startup():602<info> ofi_nd_startup: starting initialization
libfabric:7532:core:core:ofi_register_provider():418<info> registering provider: netdir (110.10)
libfabric:7532:core:core:ofi_register_provider():446<info> "netdir" filtered by provider include/exclude list, skipping
libfabric:7532:core:core:ofi_register_provider():418<info> registering provider: ofi_rxm (110.10)
libfabric:7532:core:core:ofi_register_provider():418<info> registering provider: sockets (110.10)
libfabric:7532:core:core:ofi_register_provider():446<info> "sockets" filtered by provider include/exclude list, skipping
libfabric:7532:core:core:ofi_register_provider():418<info> registering provider: tcp (110.10)
libfabric:7532:core:core:ofi_register_provider():418<info> registering provider: ofi_hook_perf (110.10)
libfabric:7532:core:core:ofi_register_provider():418<info> registering provider: ofi_hook_noop (110.10)
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_libfabric:2492:core:mr:ofi_default_cache_size():56<info> default cache size=0
libfabric:2492:netdir:core:ofi_nd_startup():602<info> ofi_nd_startup: starting initialization
libfabric:2492:core:core:ofi_register_provider():418<info> registering provider: netdir (110.10)
libfabric:2492:core:core:ofi_register_provider():446<info> "netdir" filtered by provider include/exclude list, skipping
libfabric:2492:core:core:ofi_register_provider():418<info> registering provider: ofi_rxm (110.10)
libfabric:2492:core:core:ofi_register_provider():418<info> registering provider: sockets (110.10)
libfabric:2492:core:core:ofi_register_provider():446<info> "sockets" filtered by provider include/exclude list, skipping
libfabric:2492:core:core:ofi_register_provider():418<info> registering provider: tcp (110.10)
libfabric:2492:core:core:ofi_register_provider():418<info> registering provider: ofi_hook_perf (110.10)
libfabric:2492:core:core:ofi_register_provider():418<info> registering provider: ofi_hook_noop (110.10)
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:core:core:fi_getinfo():1051<warn> Can't find provider with the highest priority
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:2492:tcp:core:ofi_check_ep_type():654<info> unsupported endpoint type
libfabric:2492:tcp:core:ofi_check_ep_type():655<info> Supported: FI_EP_MSG
libfabric:2492:tcp:core:ofi_check_ep_type():655<info> Requested: FI_EP_RDM
libfabric:2492:core:core:fi_getinfo():1129<info> Now it is being used by tcp provider
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: :addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:core:core:fi_getinfo():1051<warn> Can't find provider with the highest priority
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:7532:tcp:core:ofi_check_ep_type():654<info> unsupported endpoint type
libfabric:7532:tcp:core:ofi_check_ep_type():655<info> Supported: FI_EP_MSG
libfabric:7532:tcp:core:ofi_check_ep_type():655<info> Requested: FI_EP_RDM
libfabric:7532:core:core:fi_getinfo():1129<info> Now it is being used by tcp provider
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: :Time: 21.775112 seconds
 fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:2492:core:core:fi_fabric():1346<info> Opened fabric: 10.0.2.0/24
libfabric:2492:core:core:fi_fabric():1346<info> Opened fabric: 10.0.2.0/24
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:2492:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:2492:ofi_rxm:av:util_verify_av_attr():474<warn> Shared AV is unsupported
libfabric:2492:ofi_rxm:av:util_av_init():446<info> AV size 1024
libfabric:2492:ofi_rxm:core:ofi_check_fabric_attr():403<info> Requesting provider verbs, skipping tcp;ofi_rxm
libfabric:2492:ofi_rxm:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:2492:ofi_rxm:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:2492:ofi_rxm:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:2492:ofi_rxm:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:2492:ofi_rxm:core:ofi_check_ep_attr():766<info> Tag size exceeds supported size
libfabric:2492:ofi_rxm:core:ofi_check_ep_attr():767<info> Supported: 6148914691236517205
libfabric:2492:ofi_rxm:core:ofi_check_ep_attr():767<info> Requested: -6148914691236517206
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:2492:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:2492:ofi_rxm:core:rxm_ep_settings_init():2440<info> Settings:
                 MR local: MSG - 0, RxM - 0
                 Completions per progress: MSG - 1
                 Buffered min: 0
                 Min multi recv size: 16320
                 FI_EP_MSG provider inject size: 64
                 rxm inject size: 16320
                 Protocol limits: Eager: 16320, SAR: 131072
libfabric:2492:ofi_rxm:core:rxm_ep_setopt():587<info> FI_OPT_MIN_MULTI_RECV set to 16384
libfabric:2492:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:2492:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:2492:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:2492:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:2492:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:2492:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:2492:ofi_rxm:ep_ctrl:rxm_cmap_free():684<info> Closing cmap
libfabric:2492:ofi_rxm:ep_ctrl:rxm_cmap_cm_thread_close():658<info> stopping CM thread
libfabric:2492:tcp:fabric:ofi_wait_del_fd():220<info> Given fd (568) not found in wait list - 00000000000C8210
libfabric:2492:tcp:fabric:ofi_wait_del_fd():220<info> Given fd (560) not found in wait list - 00000000000C8210
libfabric:2492:tcp:fabric:ofi_wait_del_fd():220<info> Given fd (564) not found in wait list - 00000000000C8210
 fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:7532:core:core:fi_fabric():1346<info> Opened fabric: 10.0.2.0/24
libfabric:7532:core:core:fi_fabric():1346<info> Opened fabric: 10.0.2.0/24
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:7532:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:7532:ofi_rxm:av:util_verify_av_attr():474<warn> Shared AV is unsupported
libfabric:7532:ofi_rxm:av:util_av_init():446<info> AV size 1024
libfabric:7532:ofi_rxm:core:ofi_check_fabric_attr():403<info> Requesting provider verbs, skipping tcp;ofi_rxm
libfabric:7532:ofi_rxm:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:7532:ofi_rxm:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:7532:ofi_rxm:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:7532:ofi_rxm:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:7532:ofi_rxm:core:ofi_check_ep_attr():766<info> Tag size exceeds supported size
libfabric:7532:ofi_rxm:core:ofi_check_ep_attr():767<info> Supported: 6148914691236517205
libfabric:7532:ofi_rxm:core:ofi_check_ep_attr():767<info> Requested: -6148914691236517206
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:7532:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:7532:ofi_rxm:core:rxm_ep_settings_init():2440<info> Settings:
                 MR local: MSG - 0, RxM - 0
                 Completions per progress: MSG - 1
                 Buffered min: 0
                 Min multi recv size: 16320
                 FI_EP_MSG provider inject size: 64
                 rxm inject size: 16320
                 Protocol limits: Eager: 16320, SAR: 131072
libfabric:7532:ofi_rxm:core:rxm_ep_setopt():587<info> FI_OPT_MIN_MULTI_RECV set to 16384
libfabric:7532:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:7532:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:7532:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:7532:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:7532:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:7532:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:7532:ofi_rxm:ep_ctrl:rxm_cmap_free():684<info> Closing cmap
libfabric:7532:ofi_rxm:ep_ctrl:rxm_cmap_cm_thread_close():658<info> stopping CM thread
libfabric:7532:tcp:fabric:ofi_wait_del_fd():220<info> Given fd (564) not found in wait list - 00000000001779C0
libfabric:7532:tcp:fabric:ofi_wait_del_fd():220<info> Given fd (580) not found in wait list - 00000000001779C0
libfabric:7532:tcp:fabric:ofi_wait_del_fd():220<info> Given fd (576) not found in wait list - 00000000001779C0

 

Thank you for your help

 

Best Regards,

Jason


Hi Jason,


Thanks for reporting your findings. Which NIC do you have on your system? If you are using IB cards, how is IPoIB configured (IPv4/IPv6/both)?


Many thanks,

Amar


Michailpg
Novice

Dear Amar,

I have the following NIC,

description: Ethernet interface
       product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
       vendor: Realtek Semiconductor Co., Ltd.
       physical id: 0
       bus info: pci@0000:03:00.0
       logical name: enp3s0
       version: 0c
       serial: 1c:1b:0d:7c:44:9e
       size: 1Gbit/s
       capacity: 1Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress msix vpd bus_master cap_list ethernet physical tp mii 10bt 10bt-fd 100bt 100bt-fd 1000bt 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=r8169 driverversion=2.3LK-NAPI duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=10.0.0.6 latency=0 link=yes multicast=yes port=MII speed=1Gbit/s

It has a static IPv4 address.

We do not have an IB card, as we don't run on multiple nodes yet.

Best Regards,
Jason
