Intel® MPI Library

How to tell what I_MPI_ADJUST are set to with Intel MPI 19

Matt_Thompson
Novice

Is there a way with Intel MPI 19 to see what the I_MPI_ADJUST_* values are set to? 

With Intel 18.0.5, I see a lot of lines like:

[0] MPI startup(): Gather: 3: 3073-16397 & 129-2147483647
[0] MPI startup(): Gather: 2: 16398-65435 & 129-2147483647
[0] MPI startup(): Gather: 3: 0-2147483647 & 129-2147483647
[0] MPI startup(): Gatherv: 1: 0-2147483647 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 4: 0-0 & 0-8
[0] MPI startup(): Reduce_scatter: 1: 1-16 & 0-8

On my cluster, which admittedly only has Intel MPI 19.0.2 installed at the moment, I tried running various codes with I_MPI_DEBUG set anywhere from 1 to 1000 and... not much. For example, when running a hello world:

(1189)(master) $ mpiifort -V
Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.0.2.187 Build 20190117
Copyright (C) 1985-2019 Intel Corporation.  All rights reserved.

(1190)(master) $ mpirun -V
Intel(R) MPI Library for Linux* OS, Version 2019 Update 2 Build 20190123 (id: e2d820d49)
Copyright 2003-2019, Intel Corporation.
(1191)(master) $ mpirun -genv I_MPI_DEBUG=1000 -np 4 ./helloWorld.mpi3.hybrid.IMPI19.exe
[0] MPI startup(): libfabric version: 1.7.0a1-impi
[0] MPI startup(): libfabric provider: psm2
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       126170   borga065   {0,1,2,3,4,5,6,7,8,9}
[0] MPI startup(): 1       126171   borga065   {10,11,12,13,14,15,16,17,18,19}
[0] MPI startup(): 2       126172   borga065   {20,21,22,23,24,25,26,27,28,29}
[0] MPI startup(): 3       126173   borga065   {30,31,32,33,34,35,36,37,38,39}
Hello from thread    0 out of    1 on process    1 of    4 on processor borga065
Hello from thread    0 out of    1 on process    2 of    4 on processor borga065
Hello from thread    0 out of    1 on process    3 of    4 on processor borga065
Hello from thread    0 out of    1 on process    0 of    4 on processor borga065

Honestly I'm used to I_MPI_DEBUG being *very* verbose, but I guess not anymore? Is there another value I need to set?
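For what it's worth, even filtering the full debug output for the lines that 18.0.5 used to print comes back empty. A sketch of that check, reusing the same run as above (the grep pattern is just the collectives I happen to care about):

mpirun -genv I_MPI_DEBUG=1000 -np 4 ./helloWorld.mpi3.hybrid.IMPI19.exe 2>&1 \
    | grep -E 'Gather|Gatherv|Reduce_scatter'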

Thanks,

Matt

SriVardham_A_Intel

Hi Matt,

Thanks for reaching out to us.

We've observed the same change when running with the 2019 MPI library (even with the beta 03 update). We'll check with the relevant team and get back to you soon with an update on this.

 

Regards,

Teja Alaghari

James_T_Intel
Moderator

The internal infrastructure has become more complex with the 2019 version, and we no longer display algorithm selection information.  Is there a specific problem you are trying to debug or is this a more general question regarding the provided information?
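One workaround in the meantime, sketched here rather than an official recommendation: if you need to know (or control) which algorithm is used, pin it explicitly, since an I_MPI_ADJUST_* setting overrides the internal selection. For example (application name is a placeholder):

mpirun -genv I_MPI_ADJUST_ALLREDUCE=1 -genv I_MPI_ADJUST_GATHERV=1 -np 4 ./app.exe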

drMikeT
New Contributor I

Hey James, 

So far, performance of Intel MPI 2019u5 over Mellanox IB is horribly bad. Does Intel plan to rectify this? With Intel MPI 2018 we could see performance on par with optimized OpenMPI (UCX+HCOLL) or with low-level ib_*_bw benchmarks. Since Mellanox hardware is a major part of the HPC fabrics world, I think this would only force people to look for other MPI stacks.

Actually, Intel could actively participate in the UCX forum, which implements the low-level software transports for point-to-point and one-sided communications and can leverage all the hardware accelerators in Mellanox hardware.

Can we use our own UCX installation and have Intel MPI use it when we ask for FI_PROVIDER=mlx? That would be ideal.

 

thanks!

Michael

James_T_Intel
Moderator

Using FI_PROVIDER=mlx with Intel® MPI Library 2019 Update 5 or later should utilize the system UCX.  This does require at least UCX 1.5.
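A minimal sketch of that setup (the application name is a placeholder; ucx_info ships with UCX and reports the installed version):

ucx_info -v                    # check that the system UCX is at least 1.5
export FI_PROVIDER=mlx         # ask the libfabric layer for the mlx provider, which goes through UCX
export I_MPI_DEBUG=1000        # the 'libfabric provider:' startup line, as in the output above, confirms what was picked
mpirun -np 4 ./my_app.exe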

Matt_Thompson
Novice

James T. (Intel) wrote:

The internal infrastructure has become more complex with the 2019 version, and we no longer display algorithm selection information.  Is there a specific problem you are trying to debug or is this a more general question regarding the provided information?

James, 

We have now encountered what I believe are two different issues where I_MPI_ADJUST has helped. The first was an issue where our model would crash in an odd place. We started compiling with debugging symbols and traceback enabled in lower- and lower-level libraries until we saw it was dying in an Allreduce call.

So we iterated over every single I_MPI_ADJUST_ALLREDUCE value and found that values 1 and 3-12 all worked. Only one didn't: 2. So we figured algorithm 2 must be the bad one for our setup.
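The sweep itself was nothing fancy; roughly this (executable name and rank count are placeholders for our model run):

for alg in $(seq 1 12); do
    echo "=== I_MPI_ADJUST_ALLREDUCE=$alg ==="
    mpirun -genv I_MPI_ADJUST_ALLREDUCE=$alg -np 96 ./model.exe || echo "algorithm $alg failed"
done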

And just today a user hit an issue with:

borga033.217914 Exhausted 1048576 MQ irecv request descriptors, which usually indicates a user program error or insufficient request descriptors (PSM2_MQ_RECVREQS_MAX=1048576)

A search around the web leads to:

https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/731504

and if we go to our code where it died for him, it is in a Gatherv-heavy area. I'm having him try I_MPI_ADJUST_GATHERV=3 to see if it fixes it. He came back and asked what the default value is... and I can't answer that.
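Concretely, the two knobs I'm having him look at (the Gatherv value is a guess on my part, and the larger PSM2 limit is a hypothetical value scaled up from the one in the error message):

export I_MPI_ADJUST_GATHERV=3          # force a specific Gatherv algorithm
export PSM2_MQ_RECVREQS_MAX=4194304    # or raise the request-descriptor limit named in the error
mpirun -np 96 ./model.exe              # placeholder launch line for his job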

Does Intel at least have a list of "default" values? Or is it so automatic now that even that doesn't have meaning (i.e., different systems might have different defaults)?

James_T_Intel
Moderator

The default algorithms are based on multiple factors (e.g. CPU, interconnect, job layout), so there is no single list of defaults.

James_T_Intel
Moderator

I am marking this thread as resolved for Intel support. Any further replies on this thread will be considered community only. If you need additional Intel support on this issue, please start a new thread.

