Re: Please provide the equivalent mpirun flags

Arunamat · ‎04-02-2025

Hi Team,
We have been using the command below with Intel oneAPI 2023.1 and 2024.0.1:
mpirun -n 4 -machinefile $HOME/machinefile --map-by socket:pe=$CORES_PER_TASK --bind-to core
These mpirun options do not work with Intel oneAPI 2025.0.1 and 2024.2.1.
Can you please suggest equivalent options that work on the latest versions?

Thanks
Arun prasad

TobiasK · ‎04-02-2025

@Arunamat Of course that does not work, since it's an OpenMPI-specific command line.

What are you trying to achieve? If you are using OpenMP and OMP_NUM_THREADS x 4 uses all the cores, just delete --map-by and --bind-to core.

Best
Tobias

Arunamat · ‎04-02-2025

Hi @TobiasK ,
I am using Intel MPI, not OpenMP. Actually, these options work well with Intel MPI in versions 2023 and 2024.0.1, but not in the latest versions mentioned above. Please let us know the exact equivalent flags to get the same behavior.

Thanks
Arun prasad

TobiasK · ‎04-02-2025

Ok, I have to apologize: even though --map-by and --bind-to are OpenMPI-specific binding options, we just ignore them, so the command itself does work.

Can you please give the output of:

I_MPI_DEBUG=10 mpirun -n 4 --machinefile $HOME/machinefile --map-by socket:pe=1 IMB-MPI1 allreduce

Arunamat · ‎04-02-2025

Please find the output for version 2024.2.1.

Arunamat · ‎04-02-2025

Please find the output from the 2023 version:
Using default -machinefile setting (/enc/x0144578/mpd.hosts)
[0] MPI startup(): Intel(R) MPI Library, Version 2021.9 Build 20230307 (id: d82b3071db)
[0] MPI startup(): Copyright (C) 2003-2023 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): shm segment size (342 MB per rank) * (4 local ranks) = 1368 MB total
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): File "" not found
[0] MPI startup(): Load tuning file: "/program/intel-oneapi-2023.1/2023.1.0/mpi/2021.9.0/etc/tuning_icx_shm.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): tag bits available: 30 (TAG_UB value: 1073741823)
[0] MPI startup(): source bits available: 0 (Maximal number of rank: 0)
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 16048 ip-10-132-133-136.ec2.internal {0,1,2,3,16,17,18,19}
[0] MPI startup(): 1 16049 ip-10-132-133-136.ec2.internal {4,5,6,7,20,21,22,23}
[0] MPI startup(): 2 16050 ip-10-132-133-136.ec2.internal {8,9,10,11,24,25,26,27}
[0] MPI startup(): 3 16051 ip-10-132-133-136.ec2.internal {12,13,14,15,28,29,30,31}
[0] MPI startup(): I_MPI_ROOT=/program/intel-oneapi-2023.1/2023.1.0/mpi/2021.9.0
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_FABRICS=shm
[0] MPI startup(): I_MPI_DEBUG=10
#----------------------------------------------------------------
# Intel(R) MPI Benchmarks 2021.5, MPI-1 part
#----------------------------------------------------------------
# Date : Wed Apr 2 19:18:36 2025
# Machine : x86_64
# System : Linux
# Release : 4.18.0-425.19.2.el8_7.x86_64
# Version : #1 SMP Tue Apr 4 22:38:11 UTC 2023
# MPI Version : 3.1
# MPI Thread Environment:

# Calling sequence was:

# IMB-MPI1 allreduce

# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#

# List of Benchmarks to run:

# Allreduce

#----------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 2
# ( 2 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.03 0.03 0.03
4 1000 0.42 0.50 0.46
8 1000 0.46 0.52 0.49
16 1000 0.44 0.50 0.47
32 1000 0.42 0.47 0.45
64 1000 0.50 0.54 0.52
128 1000 0.51 0.55 0.53
256 1000 0.50 0.55 0.53
512 1000 0.66 0.85 0.76
1024 1000 0.73 0.80 0.76
2048 1000 0.87 0.90 0.89
4096 1000 1.11 1.19 1.15
8192 1000 1.95 2.05 2.00
16384 1000 3.56 3.73 3.65
32768 1000 5.77 5.78 5.78
65536 640 7.48 7.50 7.49
131072 320 12.94 12.96 12.95
262144 160 24.62 24.72 24.67
524288 80 55.64 56.88 56.26
1048576 40 129.90 130.03 129.97
2097152 20 321.61 321.76 321.68
4194304 10 694.10 694.28 694.19

#----------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 4
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.03 0.04 0.04
4 1000 0.40 0.50 0.46
8 1000 0.38 0.47 0.45
16 1000 0.85 0.85 0.85
32 1000 0.42 0.51 0.47
64 1000 0.87 0.88 0.88
128 1000 0.91 0.91 0.91
256 1000 0.96 0.96 0.96
512 1000 0.93 1.04 0.98
1024 1000 1.33 1.34 1.33
2048 1000 1.59 1.61 1.60
4096 1000 2.14 2.15 2.15
8192 1000 3.92 4.20 4.06
16384 1000 5.21 5.23 5.22
32768 1000 8.47 8.50 8.48
65536 640 12.35 12.39 12.37
131072 320 19.99 20.32 20.16
262144 160 37.97 38.55 38.26
524288 80 76.63 78.00 77.15
1048576 40 188.38 193.09 190.38
2097152 20 346.11 354.84 351.65
4194304 10 1031.87 1033.13 1032.28

# All processes entering MPI_Finalize

TobiasK · ‎04-03-2025

@Arunamat

We re-enabled parsing of --bind-to and --map-by in Intel MPI 2021.15, which is part of oneAPI 2025.1.

However, please note, we really just ignore those options, so please clean up your parameters.

Arunamat · ‎04-08-2025

Hi @TobiasK ,
When will this version be released?

Please let us know whether Intel MPI has any options equivalent to the OpenMPI parameters above.

Thanks
Arun

TobiasK · ‎04-09-2025

Hi @Arunamat
oneAPI 2025.1/Intel MPI 2021.15 are already out.

For the pinning options, please find the documentation here:
https://www.intel.com/content/www/us/en/docs/mpi-library/developer-reference-linux/2021-15/process-pinning.html
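
As a rough, unverified sketch of what a cleaned-up launch might look like: Intel MPI controls pinning through environment variables such as I_MPI_PIN_DOMAIN rather than through --map-by/--bind-to. The snippet below assumes a hybrid MPI/OpenMP application in which each rank should own $CORES_PER_TASK processors. `./your_app`, the default of 4, and the DRY_RUN switch are placeholders that do not come from this thread. Domain sizes count logical processors by default, so treat this as an approximation and check the result against the I_MPI_DEBUG pinning table.

```shell
# Sketch only: approximates the intent of "--map-by socket:pe=N --bind-to core"
# with Intel MPI pinning variables. This is not an exact equivalent.
CORES_PER_TASK="${CORES_PER_TASK:-4}"      # placeholder default

export OMP_NUM_THREADS="$CORES_PER_TASK"   # threads per MPI rank
export I_MPI_PIN=1                         # keep process pinning enabled
export I_MPI_PIN_DOMAIN=omp                # one domain per rank, sized by OMP_NUM_THREADS
export I_MPI_PIN_ORDER=scatter             # optional: neighbouring domains share as little as possible
export I_MPI_DEBUG=10                      # print the resulting pin table at startup

# ./your_app is a hypothetical binary; the OpenMPI-only flags are dropped.
set -- mpirun -n 4 -machinefile "$HOME/machinefile" ./your_app

if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"        # default: only print the command
else
    "$@"             # DRY_RUN=0 actually launches the job
fi
```

Running it with DRY_RUN=0 and comparing the "Pin cpu" column between library versions should show whether the pinning matches what the old command line produced.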

You can find the actual pinning in your debug output:

[0] MPI startup(): 0 16048 ip-10-132-133-136.ec2.internal {0,1,2,3,16,17,18,19}
[0] MPI startup(): 1 16049 ip-10-132-133-136.ec2.internal {4,5,6,7,20,21,22,23}
[0] MPI startup(): 2 16050 ip-10-132-133-136.ec2.internal {8,9,10,11,24,25,26,27}
[0] MPI startup(): 3 16051 ip-10-132-133-136.ec2.internal {12,13,14,15,28,29,30,31}
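
For pulling that table out of a longer log, a small awk filter may help. This is only a sketch: it assumes the pinning rows look exactly like the lines above (rank, PID, host name, then a brace-enclosed CPU list without spaces), which may differ between Intel MPI versions. The variable names are illustrative.

```shell
# Sample lines copied from the I_MPI_DEBUG output in this thread.
debug_output='[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 16048 ip-10-132-133-136.ec2.internal {0,1,2,3,16,17,18,19}
[0] MPI startup(): 1 16049 ip-10-132-133-136.ec2.internal {4,5,6,7,20,21,22,23}'

# Keep only rows whose 4th field is a numeric rank and whose last field is {...},
# then report how many logical CPU IDs each rank is pinned to.
pin_summary=$(printf '%s\n' "$debug_output" | awk '
    $2 == "MPI" && $3 == "startup():" && $4 ~ /^[0-9]+$/ && $NF ~ /^[{].*[}]$/ {
        cpus = $NF
        gsub(/[{}]/, "", cpus)          # strip the surrounding braces
        n = split(cpus, list, ",")      # count the CPU IDs in the set
        printf "rank %s on %s: %d logical CPUs (%s)\n", $4, $6, n, cpus
    }')

printf '%s\n' "$pin_summary"
```

In practice the input would come from the job itself, for example by redirecting the mpirun output to a file and feeding that file to the same awk program.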