Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Intel MPI_Alltoallw Poor Performance

Michailpg
Novice
4,419 Views

Hello,

We are developing a project that uses MPI for distributed execution, and we need executables for both Windows and Linux. On Windows we were using Microsoft MPI and decided to switch to Intel's implementation. Unfortunately, we saw a drop in performance in some specific cases. After some investigation we found that the problem is located in MPI_Alltoallw().

After searching, I found that Intel's MPI_Alltoallw() is a naive Isend/Irecv implementation and, unlike the other collectives, offers no alternative algorithms for tuning.
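
For context, a generic pairwise implementation of this kind looks roughly as follows. This is only a minimal sketch for illustration, not Intel's actual source; the helper name naive_alltoallw is made up here.

#include <mpi.h>
#include <vector>

// Illustrative sketch of a generic Alltoallw built from nonblocking
// point-to-point calls (NOT Intel's actual implementation).
int naive_alltoallw(const void* sendbuf, const int scounts[], const int sdispls[],
                    const MPI_Datatype stypes[], void* recvbuf, const int rcounts[],
                    const int rdispls[], const MPI_Datatype rtypes[], MPI_Comm comm)
{
  int size;
  MPI_Comm_size(comm, &size);
  std::vector<MPI_Request> reqs;
  reqs.reserve(2 * size);

  // Post all receives first (Alltoallw displacements are in bytes).
  for (int i = 0; i < size; ++i) {
    MPI_Request r;
    MPI_Irecv(static_cast<char*>(recvbuf) + rdispls[i], rcounts[i], rtypes[i],
              i, 0, comm, &r);
    reqs.push_back(r);
  }
  // Then post all sends.
  for (int i = 0; i < size; ++i) {
    MPI_Request r;
    MPI_Isend(static_cast<const char*>(sendbuf) + sdispls[i], scounts[i], stypes[i],
              i, 0, comm, &r);
    reqs.push_back(r);
  }
  // Wait for everything; no message scheduling or segmentation is applied.
  return MPI_Waitall(static_cast<int>(reqs.size()), reqs.data(), MPI_STATUSES_IGNORE);
}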

To demonstrate the issue, I created a C++ demo program that uses MPI_Alltoallw(). Every processor holds a glob_rows * cols buffer and sends common_rows * cols of it to each of the others, so at the end every processor has filled a common_rows * cols * comm_size buffer. common_rows is computed from a block-cyclic distribution.

I compiled and ran the demo with both Intel MPI 2019.7.216 and Microsoft MPI. The execution times on an Intel i5-4460 are:

#procs   MSMPI     IMPI
2        1.69 s    4.01 s
4        3.27 s    7.15 s

I know that this specific demo could be solved with MPI_Alltoallv or maybe even MPI_Alltoall (see the sketch after the code below). The problem is that we use MPI_Alltoallw a lot, and its performance is really important to us.

We didn't expect Intel MPI to be slower than MS-MPI. Do you have any tips? Is there any chance the MPI developers could improve MPI_Alltoallw()?

Thank you in advance guys!

For some reason I cannot upload my .cpp file, so here is the code:

#include <iostream>
#include <random>
#include <vector>
#include <cstdio>
#include <mpi.h>
#include <memory>
#include <algorithm>


int main(int argc, char* argv[]){
  MPI_Init(&argc,&argv);

  int comm_size, comm_rank;
  MPI_Comm_size(MPI_COMM_WORLD,&comm_size);
  MPI_Comm_rank(MPI_COMM_WORLD,&comm_rank);

  // allocate the initial buffer and fill it with random doubles
  int rows = 1 << 20;
  int cols = 50;
  size_t size = (size_t)rows * (size_t)cols;
  auto val = std::make_unique<double[]>(size);

  std::uniform_real_distribution<double> unif;
  std::default_random_engine re;
  std::generate(val.get(), val.get()+size, [&](){return unif(re);});

  // block-cyclic distribution with block size 64: each processor owns
  // cm_blks blocks of 64 rows, i.e. cm_rows rows in total
  int total_blks = rows / 64;
  int cm_blks = total_blks / comm_size;
  int cm_rows = cm_blks * 64;

  // final buffer for each processor: cm_rows * cols received from every rank
  int cols_f = cols * comm_size;
  auto b_val = std::make_unique<double[]>((size_t)cm_rows * cols_f);

  // Create datatypes:
  //  scol     - one column's worth of the rows owned by a given rank
  //  scol_res - scol resized to the extent of one full column
  //  sblock   - all cols columns of the rows owned by a given rank (send type)
  //  rblock   - contiguous cm_rows * cols doubles (receive type)
  MPI_Datatype scol,scol_res,sblock,rblock;
  MPI_Type_vector(cm_blks,64,64*comm_size,MPI_DOUBLE,&scol);
  MPI_Type_create_resized(scol,0,rows*sizeof(double),&scol_res);
  MPI_Type_contiguous(cols,scol_res,&sblock);
  MPI_Type_contiguous(cm_rows*cols,MPI_DOUBLE,&rblock);
  MPI_Type_commit(&sblock);
  MPI_Type_commit(&rblock);

  std::vector<int> scounts(comm_size,1);
  std::vector<int> rcounts(comm_size,1);
  std::vector<int> sdispls(comm_size);
  std::vector<int> rdispls(comm_size);
  std::vector<MPI_Datatype> stypes(comm_size);
  std::vector<MPI_Datatype> rtypes(comm_size);

  // MPI_Alltoallw displacements are expressed in bytes
  for (int i=0;i<comm_size;i++){
    sdispls[i] = 64*i*sizeof(double);
    rdispls[i] = cm_rows*cols*i*sizeof(double);
    stypes[i] = sblock;
    rtypes[i] = rblock;
  }

  // time 10 iterations of the exchange
  MPI_Barrier(MPI_COMM_WORLD);
  double str = MPI_Wtime();
  for (int i=0;i<10;i++)
    MPI_Alltoallw(val.get(),scounts.data(),sdispls.data(),stypes.data(),
                  b_val.get(),rcounts.data(),rdispls.data(),rtypes.data(),
                  MPI_COMM_WORLD);
  MPI_Barrier(MPI_COMM_WORLD);
  if (!comm_rank) printf("Time: %lf seconds\n",MPI_Wtime()-str);

  MPI_Finalize();
  return 0;
}
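
As a side note on the MPI_Alltoall remark above: in this particular demo the send type can be given a resized extent of one 64-double block, after which the byte displacements are no longer needed and the exchange collapses to a single MPI_Alltoall call. A minimal, untested sketch that reuses val, b_val, sblock and rblock from the code above (sblock_rs is a name introduced only here) and would replace the MPI_Alltoallw call:

  // Sketch only: give sblock an extent of one 64-double block so that the
  // block destined for rank i starts 64*i doubles into val, matching the
  // sdispls used with MPI_Alltoallw above. rblock is contiguous, so its
  // natural extent already matches the rdispls used above.
  MPI_Datatype sblock_rs;
  MPI_Type_create_resized(sblock, 0, 64 * sizeof(double), &sblock_rs);
  MPI_Type_commit(&sblock_rs);

  MPI_Alltoall(val.get(), 1, sblock_rs,
               b_val.get(), 1, rblock, MPI_COMM_WORLD);

  MPI_Type_free(&sblock_rs);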

 

17 Replies
PrasanthD_intel
Moderator
4,407 Views

Hi Michail,


Thanks for reaching out to us.

Yes, MPI_Alltoallw has only an Isend/Irecv/Waitall implementation in Intel MPI.

We also observed timings similar to those you reported for Intel MPI with the given program.

We are forwarding your query to the concerned team and will get back to you at the earliest.


Regards

Prasanth


Michailpg
Novice
4,363 Views
DrAmarpal_K_Intel
4,356 Views

Hi Michail,


On how many nodes do you observe this behavior?


Best regards,

Amar


Michailpg
Novice
4,350 Views

Hi DrAmarpal,

 

We are currently running in SMP mode on one node. Soon we are going to use 4-8 nodes at most.

 

Best regards,

Michail

DrAmarpal_K_Intel
4,332 Views

Hi Michail,


Thanks for confirming. Please hold on while we work on a solution for this.


Best regards,

Amar


DrAmarpal_K_Intel
4,289 Views

Hi Michail,


Could you please rerun your experiments with Intel MPI Library 2019 Update 8, which was recently released? With this version, please set FI_PROVIDER=netdir and report your findings.


Best regards,

Amar


Michailpg
Novice
4,265 Views

Hi DrAmarpal,

 

I downloaded Intel MPI Library 2019 Update 8 and compiled my code with it. Using

I_MPI_FABRICS=ofi
FI_PROVIDER=netdir

I get an error which says:

[0] MPI startup(): Intel(R) MPI Library, Version 2019 Update 8  Build 20200624
[0] MPI startup(): Copyright (C) 2003-2020 Intel Corporation.  All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.10.1a1-impi
Abort(1091215) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(136)........:
MPID_Init(1138)..............:
MPIDI_OFI_mpi_init_hook(1061): OFI addrinfo() failed (netmod\ofi\ofi_init.c:1061:MPIDI_OFI_mpi_init_hook:Unknown error)

 

 

If I use FI_PROVIDER=tcp, it works, but the execution time is still large. With FI_PROVIDER=shm the execution time is considerably better, but still not what we want compared to MS-MPI.

I am running on a single node. Were the changes focused on inter-node communication?

Best Regards,

Jason

DrAmarpal_K_Intel
4,248 Views

Hi Jason,


Thanks for reporting your findings. To understand what the problem is, could you please source the debug version of the Intel MPI Library by running

mpivars.bat debug

and set FI_LOG_LEVEL=debug before running your test? Please share the additional output that gets generated during this run.


Best regards,

Amar


Michailpg
Novice
4,243 Views

Dear Amar,

I followed your instructions. Running with 2 or 4 processors, the output is pretty short.

Abort(1091215) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(136)........:
MPID_Init(1138)..............:
MPIDI_OFI_mpi_init_hook(1061): OFI addrinfo() failed (netmod\ofi\ofi_init.c:1061:MPIDI_OFI_mpi_init_hook:Unknown error)

Using 1 processor I get the following:

Abort(1091215) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(136)........:
MPID_Init(1138)..............:
MPIDI_OFI_mpi_init_hook(1061): OFI addrinfo() failed (netmod\ofi\ofi_init.c:1061:MPIDI_OFI_mpi_init_hook:Unknown error)
libfabric:476:core:mr:ofi_default_cache_size():56<info> default cache size=0
libfabric:476:netdir:core:ofi_nd_startup():602<info> ofi_nd_startup: starting initialization
libfabric:476:core:core:ofi_register_provider():418<info> registering provider: netdir (110.10)
libfabric:476:core:core:ofi_register_provider():418<info> registering provider: ofi_rxm (110.10)
libfabric:476:core:core:ofi_register_provider():418<info> registering provider: sockets (110.10)
libfabric:476:core:core:ofi_register_provider():446<info> "sockets" filtered by provider include/exclude list, skipping
libfabric:476:core:core:ofi_register_provider():418<info> registering provider: tcp (110.10)
libfabric:476:core:core:ofi_register_provider():446<info> "tcp" filtered by provider include/exclude list, skipping
libfabric:476:core:core:ofi_register_provider():418<info> registering provider: ofi_hook_perf (110.10)
libfabric:476:core:core:ofi_register_provider():418<info> registering provider: ofi_hook_noop (110.10)
libfabric:476:core:core:fi_getinfo():1066<info> Found provider with the highest priority netdir, must_use_util_prov = 1
libfabric:476:core:core:fi_getinfo():1066<info> Found provider with the highest priority netdir, must_use_util_prov = 1
libfabric:476:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:476:core:core:fi_getinfo():1129<info> Now it is being used by netdir provider
libfabric:476:core:core:fi_getinfo():1066<info> Found provider with the highest priority netdir, must_use_util_prov = 1
libfabric:476:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:476:core:core:fi_getinfo():1129<info> Now it is being used by netdir provider
libfabric:476:core:core:fi_getinfo():1066<info> Found provider with the highest priority netdir, must_use_util_prov = 1
libfabric:476:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:476:core:core:fi_getinfo():1129<info> Now it is being used by netdir provider
libfabric:476:core:core:fi_getinfo():1066<info> Found provider with the highest priority netdir, must_use_util_prov = 1
libfabric:476:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:476:core:core:fi_getinfo():1129<info> Now it is being used by netdir provider
libfabric:476:core:core:fi_getinfo():1129<info> Now it is being used by ofi_rxm provider
libfabric:476:core:core:fi_getinfo():1129<info> Now it is being used by netdir provider

 

Using 4 processors and TCP I get:

libfabric:7532:core:mr:ofi_default_cache_size():56<info> default cache size=0
libfabric:7532:netdir:core:ofi_nd_startup():602<info> ofi_nd_startup: starting initialization
libfabric:7532:core:core:ofi_register_provider():418<info> registering provider: netdir (110.10)
libfabric:7532:core:core:ofi_register_provider():446<info> "netdir" filtered by provider include/exclude list, skipping
libfabric:7532:core:core:ofi_register_provider():418<info> registering provider: ofi_rxm (110.10)
libfabric:7532:core:core:ofi_register_provider():418<info> registering provider: sockets (110.10)
libfabric:7532:core:core:ofi_register_provider():446<info> "sockets" filtered by provider include/exclude list, skipping
libfabric:7532:core:core:ofi_register_provider():418<info> registering provider: tcp (110.10)
libfabric:7532:core:core:ofi_register_provider():418<info> registering provider: ofi_hook_perf (110.10)
libfabric:7532:core:core:ofi_register_provider():418<info> registering provider: ofi_hook_noop (110.10)
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_libfabric:2492:core:mr:ofi_default_cache_size():56<info> default cache size=0
libfabric:2492:netdir:core:ofi_nd_startup():602<info> ofi_nd_startup: starting initialization
libfabric:2492:core:core:ofi_register_provider():418<info> registering provider: netdir (110.10)
libfabric:2492:core:core:ofi_register_provider():446<info> "netdir" filtered by provider include/exclude list, skipping
libfabric:2492:core:core:ofi_register_provider():418<info> registering provider: ofi_rxm (110.10)
libfabric:2492:core:core:ofi_register_provider():418<info> registering provider: sockets (110.10)
libfabric:2492:core:core:ofi_register_provider():446<info> "sockets" filtered by provider include/exclude list, skipping
libfabric:2492:core:core:ofi_register_provider():418<info> registering provider: tcp (110.10)
libfabric:2492:core:core:ofi_register_provider():418<info> registering provider: ofi_hook_perf (110.10)
libfabric:2492:core:core:ofi_register_provider():418<info> registering provider: ofi_hook_noop (110.10)
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:core:core:fi_getinfo():1051<warn> Can't find provider with the highest priority
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:2492:tcp:core:ofi_check_ep_type():654<info> unsupported endpoint type
libfabric:2492:tcp:core:ofi_check_ep_type():655<info> Supported: FI_EP_MSG
libfabric:2492:tcp:core:ofi_check_ep_type():655<info> Requested: FI_EP_RDM
libfabric:2492:core:core:fi_getinfo():1129<info> Now it is being used by tcp provider
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: :addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:core:core:fi_getinfo():1051<warn> Can't find provider with the highest priority
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:core:core:ofi_layering_ok():915<info> Need core provider, skipping ofi_rxm
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:7532:tcp:core:ofi_check_ep_type():654<info> unsupported endpoint type
libfabric:7532:tcp:core:ofi_check_ep_type():655<info> Supported: FI_EP_MSG
libfabric:7532:tcp:core:ofi_check_ep_type():655<info> Requested: FI_EP_RDM
libfabric:7532:core:core:fi_getinfo():1129<info> Now it is being used by tcp provider
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: :Time: 21.775112 seconds
 fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:2492:core:core:fi_fabric():1346<info> Opened fabric: 10.0.2.0/24
libfabric:2492:core:core:fi_fabric():1346<info> Opened fabric: 10.0.2.0/24
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:2492:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:2492:ofi_rxm:av:util_verify_av_attr():474<warn> Shared AV is unsupported
libfabric:2492:ofi_rxm:av:util_av_init():446<info> AV size 1024
libfabric:2492:ofi_rxm:core:ofi_check_fabric_attr():403<info> Requesting provider verbs, skipping tcp;ofi_rxm
libfabric:2492:ofi_rxm:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:2492:ofi_rxm:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:2492:ofi_rxm:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:2492:ofi_rxm:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:2492:ofi_rxm:core:ofi_check_ep_attr():766<info> Tag size exceeds supported size
libfabric:2492:ofi_rxm:core:ofi_check_ep_attr():767<info> Supported: 6148914691236517205
libfabric:2492:ofi_rxm:core:ofi_check_ep_attr():767<info> Requested: -6148914691236517206
libfabric:2492:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:2492:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:2492:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:2492:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:2492:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:2492:ofi_rxm:core:rxm_ep_settings_init():2440<info> Settings:
                 MR local: MSG - 0, RxM - 0
                 Completions per progress: MSG - 1
                 Buffered min: 0
                 Min multi recv size: 16320
                 FI_EP_MSG provider inject size: 64
                 rxm inject size: 16320
                 Protocol limits: Eager: 16320, SAR: 131072
libfabric:2492:ofi_rxm:core:rxm_ep_setopt():587<info> FI_OPT_MIN_MULTI_RECV set to 16384
libfabric:2492:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:2492:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:2492:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:2492:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:2492:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:2492:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:2492:ofi_rxm:ep_ctrl:rxm_cmap_free():684<info> Closing cmap
libfabric:2492:ofi_rxm:ep_ctrl:rxm_cmap_cm_thread_close():658<info> stopping CM thread
libfabric:2492:tcp:fabric:ofi_wait_del_fd():220<info> Given fd (568) not found in wait list - 00000000000C8210
libfabric:2492:tcp:fabric:ofi_wait_del_fd():220<info> Given fd (560) not found in wait list - 00000000000C8210
libfabric:2492:tcp:fabric:ofi_wait_del_fd():220<info> Given fd (564) not found in wait list - 00000000000C8210
 fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:util_getinfo_ifs():312<info> Chosen addr for using: 10.0.2.15, speed 1000000000
libfabric:7532:core:core:fi_fabric():1346<info> Opened fabric: 10.0.2.0/24
libfabric:7532:core:core:fi_fabric():1346<info> Opened fabric: 10.0.2.0/24
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:7532:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:7532:ofi_rxm:av:util_verify_av_attr():474<warn> Shared AV is unsupported
libfabric:7532:ofi_rxm:av:util_av_init():446<info> AV size 1024
libfabric:7532:ofi_rxm:core:ofi_check_fabric_attr():403<info> Requesting provider verbs, skipping tcp;ofi_rxm
libfabric:7532:ofi_rxm:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:7532:ofi_rxm:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:7532:ofi_rxm:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:7532:ofi_rxm:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:7532:ofi_rxm:core:ofi_check_ep_attr():766<info> Tag size exceeds supported size
libfabric:7532:ofi_rxm:core:ofi_check_ep_attr():767<info> Supported: 6148914691236517205
libfabric:7532:ofi_rxm:core:ofi_check_ep_attr():767<info> Requested: -6148914691236517206
libfabric:7532:core:core:fi_getinfo():1066<info> Found provider with the highest priority tcp, must_use_util_prov = 1
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: 10.0.2.15, iface name: eth1, speed: 1000000000
libfabric:7532:tcp:core:ofi_get_list_of_addr():1255<info> Available addr: fe80::6166:bdf8:dbc8:9a1, iface name: eth0, speed: 1000000000
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1100<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:7532:tcp:core:ofi_insert_loopback_addr():1114<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:7532:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:7532:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:7532:ofi_rxm:core:rxm_ep_settings_init():2440<info> Settings:
                 MR local: MSG - 0, RxM - 0
                 Completions per progress: MSG - 1
                 Buffered min: 0
                 Min multi recv size: 16320
                 FI_EP_MSG provider inject size: 64
                 rxm inject size: 16320
                 Protocol limits: Eager: 16320, SAR: 131072
libfabric:7532:ofi_rxm:core:rxm_ep_setopt():587<info> FI_OPT_MIN_MULTI_RECV set to 16384
libfabric:7532:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:7532:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:7532:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:7532:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:7532:tcp:core:ofi_check_rx_attr():782<info> Tx only caps ignored in Rx caps
libfabric:7532:tcp:core:ofi_check_tx_attr():880<info> Rx only caps ignored in Tx caps
libfabric:7532:ofi_rxm:ep_ctrl:rxm_cmap_free():684<info> Closing cmap
libfabric:7532:ofi_rxm:ep_ctrl:rxm_cmap_cm_thread_close():658<info> stopping CM thread
libfabric:7532:tcp:fabric:ofi_wait_del_fd():220<info> Given fd (564) not found in wait list - 00000000001779C0
libfabric:7532:tcp:fabric:ofi_wait_del_fd():220<info> Given fd (580) not found in wait list - 00000000001779C0
libfabric:7532:tcp:fabric:ofi_wait_del_fd():220<info> Given fd (576) not found in wait list - 00000000001779C0

 

Thank you for your help

 

Best Regards,

Jason

DrAmarpal_K_Intel
4,206 Views

Hi Jason,


Thanks for reporting your findings. Which NIC do you have on your system? If you are using IB cards, how is IPoIB configured (IPv4/IPv6/both)?


Many thanks,

Amar


Michailpg
Novice
4,195 Views

Dear Amar,

I have the following NIC:

description: Ethernet interface
       product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
       vendor: Realtek Semiconductor Co., Ltd.
       physical id: 0
       bus info: pci@0000:03:00.0
       logical name: enp3s0
       version: 0c
       serial: 1c:1b:0d:7c:44:9e
       size: 1Gbit/s
       capacity: 1Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress msix vpd bus_master cap_list ethernet physical tp mii 10bt 10bt-fd 100bt 100bt-fd 1000bt 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=r8169 driverversion=2.3LK-NAPI duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=10.0.0.6 latency=0 link=yes multicast=yes port=MII speed=1Gbit/s

It has a static IPv4.

We do not have an IB card, as we don't run on multiple nodes yet.

Best Regards,
Jason

DrAmarpal_K_Intel
3,733 Views

Hi Jason,


Apologies for the radio silence on this thread. I just wanted to let you know that an internal ticket has been raised for this issue with the development team. I shall write to you with more details as they become available.


Best regards,

Amar



SVDB
Beginner
3,677 Views

On our cluster we are testing an upgrade of our Intel MPI Library to version 2021.2, and we observe something similar to the original post. Specifically for MPI_Alltoallw, the performance is significantly worse than with previous Intel MPI versions. In an attempt to simplify the code, I made a single-rank program that performs a matrix transpose by constructing a strided MPI datatype that allows switching between row-major and column-major storage. For this case it is possible to use MPI_Alltoall (or even a simple Fortran transpose), but in our actual code the use of MPI_Alltoallw is required.
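
The attached bench.f90 is not reproduced in this thread; purely as an illustration of the idea, a simplified single-rank 2-D transpose through a strided receive datatype and MPI_Alltoallw could look like the C++ sketch below (an assumption-laden sketch, not the actual 3-D Fortran benchmark attached to this post):

#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char* argv[]) {
  MPI_Init(&argc, &argv);

  // N x N row-major matrix; B receives the transpose of A.
  const int N = 512;
  std::vector<double> A((size_t)N * N), B((size_t)N * N);
  for (size_t k = 0; k < A.size(); ++k) A[k] = (double)k;

  // One column of B: N doubles spaced N apart, resized so that consecutive
  // copies start one double (i.e. one column) further along.
  MPI_Datatype col, col_rs, trans, send_t;
  MPI_Type_vector(N, 1, N, MPI_DOUBLE, &col);
  MPI_Type_create_resized(col, 0, sizeof(double), &col_rs);
  MPI_Type_contiguous(N, col_rs, &trans);   // row i of A lands in column i of B
  MPI_Type_commit(&trans);
  MPI_Type_contiguous(N * N, MPI_DOUBLE, &send_t);
  MPI_Type_commit(&send_t);

  // Single-rank Alltoallw: one send block, one receive block, byte displacement 0.
  int counts[1] = {1}, displs[1] = {0};
  MPI_Datatype stypes[1] = {send_t}, rtypes[1] = {trans};
  double t0 = MPI_Wtime();
  MPI_Alltoallw(A.data(), counts, displs, stypes,
                B.data(), counts, displs, rtypes, MPI_COMM_SELF);
  printf("Alltoallw transpose: %lf s\n", MPI_Wtime() - t0);

  MPI_Type_free(&trans);
  MPI_Type_free(&send_t);
  MPI_Type_free(&col_rs);
  MPI_Type_free(&col);
  MPI_Finalize();
  return 0;
}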


Here are the timings (in seconds) for transposing a [512x512x512] array along the first two dimensions on an Intel(R) Xeon(R) Gold 6140:

                        TRANSPOSE   ALLTOALL   ALLTOALLW
Version 2018 Update 5   1.29        1.50       1.30
Version 2021.2          1.28        1.49       2.12

 

A first interesting observation is that ALLTOALL is only significantly slower than TRANSPOSE if the strided MPI datatype is on the receiving side. If the sender has the strided MPI datatype, the difference is only a few percent.

The more important issue for us is the serious slowdown (a timing increase of more than 50%) of ALLTOALLW when switching to the new Intel MPI library.
I attached the code used to get these numbers. It can be compiled simply with "mpiifort -O2 -xHost bench.f90" and run with "I_MPI_PIN_PROCESSOR_LIST=0 mpirun -np 1 ./a.out 512". Here is the output with I_MPI_DEBUG=12 for the latest version:

 

[0] MPI startup(): Intel(R) MPI Library, Version 2021.2  Build 20210302 (id: f4f7c92cd)
[0] MPI startup(): Copyright (C) 2003-2021 Intel Corporation.  All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): shm segment size (2084 MB per rank) * (1 local ranks) = 2084 MB total
[0] MPI startup(): libfabric version: 1.11.0-impi
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): max_ch4_vcis: 1, max_reg_eps 1, enable_sep 0, enable_shared_ctxs 0, do_av_insert 1
[0] MPI startup(): addrnamelen: 1024
[0] MPI startup(): File "/vsc-hard-mounts/leuven-apps/skylake/2021a/software/impi/2021.2.0-intel-compilers-2021.2.0/mpi/2021.2.0/etc/tuning_skx_shm-ofi_mlx.dat" not found
[0] MPI startup(): Load tuning file: "/vsc-hard-mounts/leuven-apps/skylake/2021a/software/impi/2021.2.0-intel-compilers-2021.2.0/mpi/2021.2.0/etc/tuning_skx_shm-ofi.dat"
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       28548    r22i13n16  0
[0] MPI startup(): I_MPI_ROOT=/vsc-hard-mounts/leuven-apps/skylake/2021a/software/impi/2021.2.0-intel-compilers-2021.2.0/mpi/2021.2.0
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_RMK=pbs
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_PIN_PROCESSOR_LIST=0
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=12
DrAmarpal_K_Intel
3,659 Views

Hi SVDB,


Although similar, the original issue in this thread relates primarily to Windows, not Linux as in your case. For efficient tracking, may I request that you kindly open a new thread?


Many thanks,

Amar



DrAmarpal_K_Intel
3,404 Views

Dear community members,


Please be informed that we are working on fixing this issue in a future release of the Intel MPI Library.


DrAmarpal_K_Intel
1,477 Views

Hello again, Michail,


Can you please recheck the performance with the latest version of Intel MPI? The performance has improved significantly versus the older version.


Best regards,

Amar


DrAmarpal_K_Intel
1,396 Views

Closing this case due to inactivity. This issue is assumed to be resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

