Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Configuration for Intel impi

L__D__Marks
New Contributor II
On a supercomputer using Slurm/srun I am seeing irreproducible crashes: sometimes a SIGSEGV in program A, sometimes a bus error in program B. Both appear to be linked to MPI operation. These are large hybrid OpenMP/MPI calculations (2 OpenMP threads x 128 MPI ranks), as hybrid is more memory efficient. Intel MPI (impi). The crashes occur 5-10% of the time and are not in the base code.
 
According to https://slurm.schedmd.com/mpi_guide.html I should use PMI2 with
I_MPI_PMI_LIBRARY=/path/to/slurm/lib/libpmi2.so. (Currently I_MPI_PMI_LIBRARY is not set.) Apparently PMI1 is not very thread safe. Has anyone come across anything similar?
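For reference, the setup the SLURM guide describes looks roughly like the sketch below. The library path, thread count, and program name are illustrative placeholders, not confirmed values for any particular site:

```shell
#!/bin/bash
# Sketch of the PMI2 launch described in the SLURM MPI guide.
# Both paths below are hypothetical; adjust to your installation.
export I_MPI_PMI_LIBRARY=/usr/lib64/slurm/libpmi2.so  # site-specific location
export OMP_NUM_THREADS=2                              # 2 OpenMP threads per MPI rank
srun --mpi=pmi2 -n 128 ./my_hybrid_app                # placeholder binary
```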
17 Replies
VeenaJ_Intel
Moderator

Hi,

Thanks for posting in the Intel communities!

To assist you more effectively, could you kindly provide the following details:

Operating system (OS) details
Intel MPI version
Output of the "lscpu" command
Hardware details
Detailed steps to recreate the scenario
Interconnect details

Thank you in advance!

Regards,

Veena

 

L__D__Marks
New Contributor II
Sorry, but please read my posting. I was asking about PMI1 versus PMI2 with impi. Your response is not relevant.
VeenaJ_Intel
Moderator

Hi,

Sorry for the inconvenience caused.

Intel® MPI currently supports only PMI-1 and PMI-2; PMIx is not supported. For optimal scalability, we strongly recommend configuring Intel MPI to use Slurm's PMI-2, which scales better than PMI-1. PMI-1 is still available, but we advise transitioning to PMI-2, as PMI-1 may be deprecated in the near future.

Regards,

Veena

 

L__D__Marks
New Contributor II

Thank you for the response. It is somewhat of an answer, but there are some points you do not mention. A key one is that the SLURM documents I mentioned (and there are many similar ones) all say to use:

 

I_MPI_PMI_LIBRARY=/path/to/slurm/lib/libpmi2.so

 

However, with slurm as the default, in many cases this leads to:

MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found

 

There are two other environment variables that might be relevant:

SLURM_MPI_TYPE=pmi2
I_MPI_PMI=pmi2

 

To date I see no difference using these. Can you please clarify what is appropriate with Intel MPI? Currently I cannot find anything about how to use PMI2 in the available Intel documentation, and the information in the slurm documentation out there appears to be incorrect.
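One way to see what the runtime actually selected is to raise the startup debug level: the "MPI startup()" lines (as in the log posted later in this thread) then report the process manager and PMI settings in effect. This is a sketch; the library path and binary are placeholders:

```shell
# Sketch: raise Intel MPI's startup verbosity so it reports which
# process manager / PMI settings are actually in effect.
# The library path and ./a.out are placeholders.
export I_MPI_DEBUG=10
export I_MPI_PMI_LIBRARY=/path/to/slurm/lib/libpmi2.so
srun --mpi=pmi2 -n 4 ./a.out 2>&1 | grep -i 'MPI startup'
```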

--

N.B., This is for Wien2k, which is the standard benchmark code for density functional theory calculations, e.g. https://doi.org/10.1038/s42254-023-00655-3. This code does not just use a single mpirun (or srun); it is more intelligent (faster) and dispatches multiple MPI tasks to different nodes/cores. Therefore oversimple answers, alas, are less useful.

Mahan
Moderator

This is one way to use pmi2:

$ salloc -N10 --exclusive
$ export I_MPI_PMI_LIBRARY=/path/to/slurm/lib/libpmi2.so
$ mpirun -np <num_procs> user_app.bin

 

Please follow the link for more information

https://slurm.schedmd.com/mpi_guide.html#intel_mpi

 

Do let me know if you face any issues.

L__D__Marks
New Contributor II

Sorry, this is very incorrect, please see the prior information about the argument being ignored. The slurm info is not correct.

Also --exclusive is not an appropriate suggestion, that has too many other consequences.

Mahan
Moderator

Hi @L__D__Marks, you are right that the slurm information in the intel documentation is not correct.

The environment variable seems to be correct.

Would it be possible for you to use "srun" instead of "mpirun/mpiexec"?

L__D__Marks
New Contributor II

Unfortunately I have not found any way of using srun directly. The code runs a sequence of (what slurm calls) job steps. Some are serial and quick, the others are multiple parallel mpi tasks using different nodes. A schematic example would be

mpirun -np 8 -machinefile host1 &

mpirun -np 8 -machinefile host2 &

...wait for completion then do the next job step.
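Concretely, the pattern above can be written as a small shell fragment. The machinefiles and the binary name are placeholders; the real code generates the host lists dynamically:

```shell
# Sketch of the dispatch pattern: independent mpirun job steps on
# disjoint node sets, then a barrier before the next job step.
# host1/host2 are machinefiles; ./prog is a placeholder binary.
mpirun -np 8 -machinefile host1 ./prog &
mpirun -np 8 -machinefile host2 ./prog &
wait   # block until both background job steps finish
# ...next job step...
```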

 

Similar to https://bugs.schedmd.com/show_bug.cgi?id=11863, it seems that "export SLURM_OVERLAP=1" matters; this appears to be common. (This is passed down through mpiexec.hydra, which uses srun to launch.)

 

It is not clear to me whether I_MPI_PMI, SLURM_MPI_TYPE or even SLURM_OVERCOMMIT matter.

 

Unfortunately, currently I cannot switch to the ssh launcher due to some form of misconfiguration where ssh is blocked on some of the nodes. Some sysadmins are trying to sort that out. Hence, at the moment I can only test with the srun launcher in mpirun.

Mahan
Moderator

Hi @L__D__Marks 

 

I understand your difficulty in running the application using 'srun'.

Please allow me a few days, as I need to discuss this with the development teams to see whether there is a workaround that would allow mpirun with PMI2 instead of 'srun'.

Mahan
Moderator

Hi @L__D__Marks 

 

It appears that the only way to use PMI2 with slurm is to use srun.

I have the following output from an Intel MPI benchmark program for your reference:

 

MPI startup(): Copyright (C) 2003-2023 Intel Corporation.  All rights reserved.
[0] MPI startup(): library kind: release
MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found
[0] MPI startup(): libfabric loaded: libfabric.so.1
[0] MPI startup(): libfabric version: 1.18.1-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: tcp
[0] MPI startup(): File "" not found
[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.11/opt/mpi/etc/tuning_spr_shm-ofi.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): tag bits available: 19 (TAG_UB value: 524287)
[0] MPI startup(): source bits available: 20 (Maximal number of rank: 1048575)
[0] MPI startup(): ===== Nic pinning on sdp4578 =====
[0] MPI startup(): Rank Pin nic
[0] MPI startup(): 0    enp1s0
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       1539366  sdp4578    {0,1,2,...,223}
[0] MPI startup(): 1       184901   sdp5259    {0,1,2,...,223}
[0] MPI startup(): I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.11
[0] MPI startup(): ONEAPI_ROOT=/opt/intel/oneapi
[0] MPI startup(): I_MPI_BIND_WIN_ALLOCATE=localalloc
[0] MPI startup(): I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS=--external-launcher
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_RETURN_WIN_MEM_NUMA=0
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=10
[0] MPI startup(): I_MPI_PMI_LIBRARY=/usr/local/lib/libpmi2.so
#----------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2021.7, MPI-1 part
#----------------------------------------------------------------
# Date                  : Wed Jan 17 22:32:47 2024
# Machine               : x86_64
# System                : Linux
# Release               : 5.15.0-86-generic
# Version               : #96-Ubuntu SMP Wed Sep 20 08:23:49 UTC 2023
# MPI Version           : 3.1
# MPI Thread Environment:

 

 

# Calling sequence was:

 

# IMB-MPI1 allreduce -msglog 2:3

 

# Minimum message length in bytes:   0
# Maximum message length in bytes:   8
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

 

# List of Benchmarks to run:

 

# Allreduce

 

#----------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 2
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.02         0.03         0.03
            4         1000        48.90        49.16        49.03
            8         1000        48.87        48.90        48.88

 

 

# All processes entering MPI_Finalize

Mahan
Moderator

Hi @L__D__Marks 

 

Could you please also let me know the reason for using pmi2? You mentioned briefly in your initial post that there is a crash/performance drop while using Intel MPI.

L__D__Marks
New Contributor II

The reason to try PMI2 is that all documentation (Intel's included) says PMI1 is inferior (obsolete). Your printout just confirms what I and others have reported. Some more details, please:

1) Are you running under slurm?

2) What launcher are you using?

3) Does your test program report the protocol it is using? Would the line

# IMB-MPI1 allreduce -msglog 2:3

change if PMI2 is being used?

4) Did you set relevant environmental parameters:

export SLURM_MPI_TYPE=pmi2
export I_MPI_PMI=pmi2

 

Sorry, but your last messages don't answer the question. What information did the development team provide? Maybe they should respond (escalation).

 

N.B., PMIx may also be relevant.

Mahan
Moderator

Hi @L__D__Marks 

 

I am running this on a cluster which has:

slurm 23.11
oneAPI 2024, which comes with Intel MPI 2021.11.

"IMB-MPI1 allreduce -msglog 2:3" is a benchmark available in the Intel oneAPI suite to test MPI. You could choose your own MPI program. The environment variables in use are shown in the previous reply:

[0] MPI startup(): I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.11
[0] MPI startup(): ONEAPI_ROOT=/opt/intel/oneapi
[0] MPI startup(): I_MPI_BIND_WIN_ALLOCATE=localalloc
[0] MPI startup(): I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS=--external-launcher
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_RETURN_WIN_MEM_NUMA=0
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=10
[0] MPI startup(): I_MPI_PMI_LIBRARY=/usr/local/lib/libpmi2.so

 

The important point I wanted to make here is that if you want to run using PMI2, then currently the only option is to use srun:

# Run your application using srun with the PMI-2 interface.
I_MPI_PMI_LIBRARY=<path-to-libpmi2.so>/libpmi2.so srun --mpi=pmi2 ./myprog

For more information, please check the following:

https://www.intel.com/content/www/us/en/docs/mpi-library/developer-guide-linux/2021-11/job-schedulers-support.html
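As a sanity check before launching, one can ask the local SLURM installation which PMI plugins it actually provides; `srun --mpi=list` is standard srun functionality, though the plugin list varies by site:

```shell
# List the PMI plugin types this SLURM installation supports.
srun --mpi=list
# Then launch with an explicit plugin, e.g. (paths/binary are placeholders):
# I_MPI_PMI_LIBRARY=/path/to/slurm/lib/libpmi2.so srun --mpi=pmi2 ./myprog
```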

 

 

L__D__Marks
New Contributor II

Please read the prior posts, and do not respond with trivial answers.

 

Just using srun ./myprog is a novice response, inappropriate for professional hard-core supercomputing. I pointed this out weeks ago.

 

Please escalate this to someone who is an expert, will read the prior information (including the fact that the page you suggest I read is wrong), and is knowledgeable. Hopefully they can then construct a code which will show what interface is being used.

 

Escalate please.

TobiasK
Moderator

@L__D__Marks 
This forum is a community forum, not a support forum.

mpirun will always use our internal PMI library; if you want to use a different PMI library, you have to provide the full path and use srun instead of mpirun.

 I_MPI_PMI_LIBRARY=/path/to/slurm/lib/libpmi2.so

 

Terrence_at_Houston

I have the same "problem" with SLURM on Intel MPI needing pmi2 with mpirun, and it is good to find out that others here have the same srun problem.

MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found
 
My workaround to make mpi-intel/2021.u11's mpirun work is to build my own ucx 1.15.0, set LD_LIBRARY_PATH to it, and use -env UCX_TLS rc,sm,self.

In a test run with np=192, srun with pmi2 takes about 26 min wall time and mpirun + ucx 1.15.0 about 23 min, so they are close enough.

For me, this works up to around np=400 for my CFD type of simulation; beyond that, Intel MPI dies. For anything above np=400, I use an MPI+threads hybrid approach to work around the problem. Our system is AMD Genoa from Cray. On the other hand, OpenMPI seems to do just fine with mpirun. Our code on Genoa actually runs faster with MPI+threads, so in the very early stages we were using mpirun to perform the core binding / thread pinning. The HPC systems on our different sites run different job schedulers; maybe srun can do it, but the same mpirun command works under both SLURM and LSF, which is easier for me.
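Putting the workaround above into a sketch (the UCX install prefix and application name are placeholders; this assumes mpirun picks up the locally built UCX via LD_LIBRARY_PATH, as described):

```shell
# Sketch of the described workaround: point Intel MPI at a locally
# built UCX 1.15.0 and restrict the transports it may use.
# $HOME/ucx-1.15.0 and ./cfd_app are placeholders.
export LD_LIBRARY_PATH=$HOME/ucx-1.15.0/lib:$LD_LIBRARY_PATH
mpirun -np 192 -env UCX_TLS rc,sm,self ./cfd_app
```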

TobiasK
Moderator

@Terrence_at_Houston 

Sorry, but I really do not understand what your question is.

The initial question was whether PMI2 is more thread safe than PMI1.
To use PMI2 or PMIx, srun has to be used together with setting the library path; mpirun will just ignore it, hence the warning it prints.

If you are building your own UCX and setting UCX env variables, that has nothing to do with PMI/srun/mpirun.

UCX is always used with InfiniBand networks / the mlx provider, and UCX environment variables are always honored by UCX.
