<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Intel MPI Allreduce Scalability Problem in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-Allreduce-Scalability-Problem/m-p/1626670#M11862</link>
    <description>&lt;P&gt;Hi,&amp;nbsp;I'm having an issue with MPI_Allreduce scalability using Intel MPI. Each compute node on our cluster has 64 cores, and I tested with the IMB benchmark. Running with 48 cores per node and with 60 cores per node (neither case uses all cores of the node) gives very different results: the 48-cores-per-node case is significantly better than the 60-cores-per-node case. What could be the problem?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;FYI, I submitted the job using sbatch with the default Intel MPI settings, using a command similar to:&lt;/P&gt;&lt;P&gt;"mpirun -np ${mpirun_np} -f hosts -perhost ${mpirun_perhost} IMB-MPI1 Allreduce -npmin 5400 -off_cache 60,64"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I use Intel MPI version 2021.6.0.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;4320 MPI processes (48 cores/node, 90 nodes) result:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MetMan_0-1724815483936.png" style="width: 400px;"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/58375iDE13A018A63FC515/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="MetMan_0-1724815483936.png" alt="MetMan_0-1724815483936.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;5400 MPI processes (60 cores/node, 90 nodes) result:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MetMan_1-1724815550552.png" style="width: 400px;"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/58376i32C4E53010BB39C5/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="MetMan_1-1724815550552.png" alt="MetMan_1-1724815550552.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 28 Aug 2024 03:32:14 GMT</pubDate>
    <dc:creator>MetMan</dc:creator>
    <dc:date>2024-08-28T03:32:14Z</dc:date>
    <item>
      <title>Intel MPI Allreduce Scalability Problem</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-Allreduce-Scalability-Problem/m-p/1626670#M11862</link>
      <description>&lt;P&gt;Hi,&amp;nbsp;I'm having an issue with MPI_Allreduce scalability using Intel MPI. Each compute node on our cluster has 64 cores, and I tested with the IMB benchmark. Running with 48 cores per node and with 60 cores per node (neither case uses all cores of the node) gives very different results: the 48-cores-per-node case is significantly better than the 60-cores-per-node case. What could be the problem?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;FYI, I submitted the job using sbatch with the default Intel MPI settings, using a command similar to:&lt;/P&gt;&lt;P&gt;"mpirun -np ${mpirun_np} -f hosts -perhost ${mpirun_perhost} IMB-MPI1 Allreduce -npmin 5400 -off_cache 60,64"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I use Intel MPI version 2021.6.0.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;4320 MPI processes (48 cores/node, 90 nodes) result:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MetMan_0-1724815483936.png" style="width: 400px;"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/58375iDE13A018A63FC515/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="MetMan_0-1724815483936.png" alt="MetMan_0-1724815483936.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;5400 MPI processes (60 cores/node, 90 nodes) result:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MetMan_1-1724815550552.png" style="width: 400px;"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/58376i32C4E53010BB39C5/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="MetMan_1-1724815550552.png" alt="MetMan_1-1724815550552.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 28 Aug 2024 03:32:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-Allreduce-Scalability-Problem/m-p/1626670#M11862</guid>
      <dc:creator>MetMan</dc:creator>
      <dc:date>2024-08-28T03:32:14Z</dc:date>
    </item>
    <item>
      <title>Re: Intel MPI Allreduce Scalability Problem</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-Allreduce-Scalability-Problem/m-p/1627037#M11866</link>
      <description>&lt;P&gt;&lt;a href="https://community.intel.com/t5/user/viewprofilepage/user-id/366336"&gt;@MetMan&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;2021.6 is too old; please retry with the latest release, 2021.13.1 / oneAPI 2024.2.1.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;MPI performance can depend on many variables. Please post the full HW and SW environment; otherwise we are just guessing.&lt;/P&gt;</description>
      <pubDate>Thu, 29 Aug 2024 08:03:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-Allreduce-Scalability-Problem/m-p/1627037#M11866</guid>
      <dc:creator>TobiasK</dc:creator>
      <dc:date>2024-08-29T08:03:43Z</dc:date>
    </item>
    <item>
      <title>Re: Intel MPI Allreduce Scalability Problem</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-Allreduce-Scalability-Problem/m-p/1627099#M11869</link>
      <description>&lt;P&gt;I didn't explicitly set any MPI environment variables; I used the default settings.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;HW configuration:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;$ lscpu&lt;/P&gt;&lt;P&gt;Architecture: x86_64&lt;BR /&gt;CPU op-mode(s): 32-bit, 64-bit&lt;BR /&gt;Byte Order: Little Endian&lt;BR /&gt;CPU(s): 64&lt;BR /&gt;On-line CPU(s) list: 0-63&lt;BR /&gt;Thread(s) per core: 1&lt;BR /&gt;Core(s) per socket: 32&lt;BR /&gt;Socket(s): 2&lt;BR /&gt;NUMA node(s): 4&lt;BR /&gt;Vendor ID: GenuineIntel&lt;BR /&gt;CPU family: 6&lt;BR /&gt;Model: 143&lt;BR /&gt;Model name: Intel(R) Xeon(R) Gold 6458Q&lt;BR /&gt;Stepping: 8&lt;BR /&gt;CPU MHz: 3999.638&lt;BR /&gt;CPU max MHz: 3101.0000&lt;BR /&gt;CPU min MHz: 800.0000&lt;BR /&gt;BogoMIPS: 6200.00&lt;BR /&gt;Virtualization: VT-x&lt;BR /&gt;L1d cache: 48K&lt;BR /&gt;L1i cache: 32K&lt;BR /&gt;L2 cache: 2048K&lt;BR /&gt;L3 cache: 61440K&lt;BR /&gt;NUMA node0 CPU(s): 0-15&lt;BR /&gt;NUMA node1 CPU(s): 16-31&lt;BR /&gt;NUMA node2 CPU(s): 32-47&lt;BR /&gt;NUMA node3 CPU(s): 48-63&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Do you mean using the impi_info tool to get the SW environment variables?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 29 Aug 2024 12:34:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-Allreduce-Scalability-Problem/m-p/1627099#M11869</guid>
      <dc:creator>MetMan</dc:creator>
      <dc:date>2024-08-29T12:34:34Z</dc:date>
    </item>
    <item>
      <title>Re: Intel MPI Allreduce Scalability Problem</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-Allreduce-Scalability-Problem/m-p/1627376#M11874</link>
      <description>&lt;P&gt;By SW environment we mean the OS and the SW stack for the NIC.&lt;BR /&gt;What do you use for the interconnect?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please add the output of&amp;nbsp;&lt;BR /&gt;I_MPI_DEBUG=10&amp;nbsp;&lt;SPAN&gt;mpirun -np ${mpirun_np} -f hosts -perhost ${mpirun_perhost} IMB-MPI1 Allreduce -npmin 5400 -off_cache 60,64&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 30 Aug 2024 12:01:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-Allreduce-Scalability-Problem/m-p/1627376#M11874</guid>
      <dc:creator>TobiasK</dc:creator>
      <dc:date>2024-08-30T12:01:28Z</dc:date>
    </item>
    <item>
      <title>Re: Intel MPI Allreduce Scalability Problem</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-Allreduce-Scalability-Problem/m-p/1627384#M11876</link>
      <description>&lt;P&gt;Hi,&amp;nbsp;&lt;SPAN&gt;TobiasK.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;OS: CentOS&amp;nbsp;8.4.2105&lt;/P&gt;&lt;P&gt;Interconnect: InfiniBand&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The full output of&amp;nbsp;&lt;SPAN&gt;I_MPI_DEBUG=10 is too long, so I'm pasting only the useful parts.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;[0] MPI startup(): Intel(R) MPI Library, Version 2021.6 Build 20220227 (id: 28877f3f32)&lt;BR /&gt;[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.&lt;BR /&gt;[0] MPI startup(): library kind: release&lt;BR /&gt;[0] MPI startup(): shm segment size (137 MB per rank) * (60 local ranks) = 8277 MB total&lt;BR /&gt;[1500] MPI startup(): shm segment size (137 MB per rank) * (60 local ranks) = 8277 MB total&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;...&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;[0] MPI startup(): libfabric version: 1.13.2rc1-impi&lt;BR /&gt;[240] MPI startup(): shm segment size (137 MB per rank) * (60 local ranks) = 8277 MB total&lt;BR /&gt;[4140] MPI startup(): shm segment size (137 MB per rank) * (60 local ranks) = 8277 MB total&lt;BR /&gt;[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)&lt;BR /&gt;[0] MPI startup(): libfabric provider: mlx&lt;BR /&gt;[0] MPI startup(): File "/opt/hpc/software/mpi/intelmpi/2021.6.0/etc/tuning_icx_shm-ofi_mlx_400.dat" not found&lt;BR /&gt;[0] MPI startup(): Load tuning file: "/opt/hpc/software/mpi/intelmpi/2021.6.0/etc/tuning_icx_shm-ofi_mlx.dat"&lt;BR /&gt;[0] MPI startup(): threading: mode: direct&lt;BR /&gt;[0] MPI startup(): threading: vcis: 1&lt;BR /&gt;[0] MPI startup(): threading: app_threads: -1&lt;BR /&gt;[0] MPI startup(): threading: runtime: generic&lt;BR /&gt;[0] MPI startup(): threading: progress_threads: 0&lt;BR /&gt;[0] MPI startup(): threading: async_progress: 0&lt;BR /&gt;[0] MPI startup(): threading: lock_level: global&lt;BR /&gt;[0] MPI startup(): tag 
bits available: 20 (TAG_UB value: 1048575)&lt;BR /&gt;[0] MPI startup(): source bits available: 21 (Maximal number of rank: 2097151)&lt;BR /&gt;[0] MPI startup(): Rank Pid Node name Pin cpu&lt;BR /&gt;[0] MPI startup(): 0 1788071 cmbc1653 32&lt;BR /&gt;[0] MPI startup(): 1 1788072 cmbc1653 33&lt;BR /&gt;[0] MPI startup(): 2 1788073 cmbc1653 34&lt;BR /&gt;[0] MPI startup(): 3 1788074 cmbc1653 35&lt;BR /&gt;[0] MPI startup(): 4 1788075 cmbc1653 36&lt;BR /&gt;[0] MPI startup(): 5 1788076 cmbc1653 37&lt;BR /&gt;[0] MPI startup(): 6 1788077 cmbc1653 38&lt;BR /&gt;[0] MPI startup(): 7 1788078 cmbc1653 39&lt;BR /&gt;[0] MPI startup(): 8 1788079 cmbc1653 40&lt;BR /&gt;[0] MPI startup(): 9 1788080 cmbc1653 41&lt;BR /&gt;[0] MPI startup(): 10 1788081 cmbc1653 42&lt;BR /&gt;[0] MPI startup(): 11 1788082 cmbc1653 43&lt;BR /&gt;[0] MPI startup(): 12 1788083 cmbc1653 44&lt;BR /&gt;[0] MPI startup(): 13 1788084 cmbc1653 45&lt;BR /&gt;[0] MPI startup(): 14 1788085 cmbc1653 46&lt;BR /&gt;[0] MPI startup(): 15 1788086 cmbc1653 0&lt;BR /&gt;[0] MPI startup(): 16 1788087 cmbc1653 1&lt;BR /&gt;[0] MPI startup(): 17 1788088 cmbc1653 2&lt;BR /&gt;[0] MPI startup(): 18 1788089 cmbc1653 3&lt;BR /&gt;[0] MPI startup(): 19 1788090 cmbc1653 4&lt;BR /&gt;[0] MPI startup(): 20 1788091 cmbc1653 5&lt;BR /&gt;[0] MPI startup(): 21 1788092 cmbc1653 6&lt;BR /&gt;[0] MPI startup(): 22 1788093 cmbc1653 7&lt;BR /&gt;[0] MPI startup(): 23 1788094 cmbc1653 8&lt;BR /&gt;[0] MPI startup(): 24 1788095 cmbc1653 9&lt;BR /&gt;[0] MPI startup(): 25 1788096 cmbc1653 10&lt;BR /&gt;[0] MPI startup(): 26 1788097 cmbc1653 11&lt;BR /&gt;[0] MPI startup(): 27 1788098 cmbc1653 12&lt;BR /&gt;[0] MPI startup(): 28 1788099 cmbc1653 13&lt;BR /&gt;[0] MPI startup(): 29 1788100 cmbc1653 14&lt;BR /&gt;[0] MPI startup(): 30 1788101 cmbc1653 16&lt;BR /&gt;[0] MPI startup(): 31 1788102 cmbc1653 17&lt;BR /&gt;[0] MPI startup(): 32 1788103 cmbc1653 18&lt;BR /&gt;[0] MPI startup(): 33 1788104 cmbc1653 19&lt;BR /&gt;[0] MPI 
startup(): 34 1788105 cmbc1653 20&lt;BR /&gt;[0] MPI startup(): 35 1788106 cmbc1653 21&lt;BR /&gt;[0] MPI startup(): 36 1788107 cmbc1653 22&lt;BR /&gt;[0] MPI startup(): 37 1788108 cmbc1653 23&lt;BR /&gt;[0] MPI startup(): 38 1788109 cmbc1653 24&lt;BR /&gt;[0] MPI startup(): 39 1788110 cmbc1653 25&lt;BR /&gt;[0] MPI startup(): 40 1788111 cmbc1653 26&lt;BR /&gt;[0] MPI startup(): 41 1788112 cmbc1653 27&lt;BR /&gt;[0] MPI startup(): 42 1788113 cmbc1653 28&lt;BR /&gt;[0] MPI startup(): 43 1788114 cmbc1653 29&lt;BR /&gt;[0] MPI startup(): 44 1788115 cmbc1653 30&lt;BR /&gt;[0] MPI startup(): 45 1788116 cmbc1653 48&lt;BR /&gt;[0] MPI startup(): 46 1788117 cmbc1653 49&lt;BR /&gt;[0] MPI startup(): 47 1788118 cmbc1653 50&lt;BR /&gt;[0] MPI startup(): 48 1788119 cmbc1653 51&lt;BR /&gt;[0] MPI startup(): 49 1788120 cmbc1653 52&lt;BR /&gt;[0] MPI startup(): 50 1788121 cmbc1653 53&lt;BR /&gt;[0] MPI startup(): 51 1788122 cmbc1653 54&lt;BR /&gt;[0] MPI startup(): 52 1788123 cmbc1653 55&lt;BR /&gt;[0] MPI startup(): 53 1788124 cmbc1653 56&lt;BR /&gt;[0] MPI startup(): 54 1788125 cmbc1653 57&lt;BR /&gt;[0] MPI startup(): 55 1788126 cmbc1653 58&lt;BR /&gt;[0] MPI startup(): 56 1788127 cmbc1653 59&lt;BR /&gt;[0] MPI startup(): 57 1788128 cmbc1653 60&lt;BR /&gt;[0] MPI startup(): 58 1788129 cmbc1653 61&lt;BR /&gt;[0] MPI startup(): 59 1788130 cmbc1653 62&lt;BR /&gt;[0] MPI startup(): 60 661144 cmbc1654 32&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;...&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;[0] MPI startup(): I_MPI_ROOT=/opt/hpc/software/mpi/intelmpi/2021.6.0&lt;BR /&gt;[0] MPI startup(): I_MPI_MPIRUN=mpirun&lt;BR /&gt;[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc&lt;BR /&gt;[0] MPI startup(): I_MPI_HYDRA_BOOTSTRAP=slurm&lt;BR /&gt;[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default&lt;BR /&gt;[0] MPI startup(): I_MPI_DEBUG=10&lt;BR /&gt;#----------------------------------------------------------------&lt;BR /&gt;# Intel(R) MPI Benchmarks 2021.4, MPI-1 part&lt;BR 
/&gt;#----------------------------------------------------------------&lt;BR /&gt;# Date : Fri Aug 30 12:42:54 2024&lt;BR /&gt;# Machine : x86_64&lt;BR /&gt;# System : Linux&lt;BR /&gt;# Release : 4.18.0-305.3.1.el8.x86_64&lt;BR /&gt;# Version : #1 SMP Tue Jun 1 16:14:33 UTC 2021&lt;BR /&gt;# MPI Version : 3.1&lt;BR /&gt;# MPI Thread Environment:&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;# Calling sequence was:&lt;/P&gt;&lt;P&gt;# imb/IMB-MPI1 Allreduce -npmin 5400 -off_cache 60,64&lt;/P&gt;&lt;P&gt;# Minimum message length in bytes: 0&lt;BR /&gt;# Maximum message length in bytes: 4194304&lt;BR /&gt;#&lt;BR /&gt;# MPI_Datatype : MPI_BYTE&lt;BR /&gt;# MPI_Datatype for reductions : MPI_FLOAT&lt;BR /&gt;# MPI_Op : MPI_SUM&lt;BR /&gt;#&lt;BR /&gt;#&lt;/P&gt;&lt;P&gt;# List of Benchmarks to run:&lt;/P&gt;&lt;P&gt;# Allreduce&lt;/P&gt;&lt;P&gt;#----------------------------------------------------------------&lt;BR /&gt;# Benchmarking Allreduce&lt;BR /&gt;# #processes = 5400&lt;BR /&gt;#----------------------------------------------------------------&lt;BR /&gt;#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]&lt;BR /&gt;0 1000 0.02 0.09 0.03&lt;BR /&gt;4 1000 750.44 773.20 761.29&lt;BR /&gt;8 1000 797.67 843.62 826.09&lt;BR /&gt;16 1000 828.77 858.66 846.15&lt;BR /&gt;32 1000 814.64 858.83 841.92&lt;BR /&gt;64 1000 823.91 914.41 838.27&lt;BR /&gt;128 1000 850.97 944.32 871.04&lt;BR /&gt;256 1000 843.54 933.99 858.84&lt;BR /&gt;512 1000 844.91 945.35 867.06&lt;BR /&gt;1024 1000 845.97 938.68 860.53&lt;BR /&gt;2048 1000 855.51 956.05 869.80&lt;BR /&gt;4096 1000 862.21 954.42 874.54&lt;BR /&gt;8192 1000 855.78 951.25 874.08&lt;BR /&gt;16384 1000 994.88 1098.12 1022.46&lt;BR /&gt;32768 1000 1036.75 1149.22 1055.48&lt;BR /&gt;65536 640 1072.59 1180.20 1102.77&lt;BR /&gt;131072 320 1855.89 1943.43 1904.21&lt;BR /&gt;262144 160 1910.85 2016.92 1956.57&lt;BR /&gt;524288 80 2612.14 2770.10 2683.22&lt;BR /&gt;1048576 40 2359.23 2660.41 2484.20&lt;BR /&gt;2097152 20 3038.83 3444.00 
3173.75&lt;BR /&gt;4194304 10 4575.82 5294.64 4807.26&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;# All processes entering MPI_Finalize&lt;/P&gt;</description>
      <pubDate>Fri, 30 Aug 2024 12:55:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-Allreduce-Scalability-Problem/m-p/1627384#M11876</guid>
      <dc:creator>MetMan</dc:creator>
      <dc:date>2024-08-30T12:55:51Z</dc:date>
    </item>
    <item>
      <title>Re: Intel MPI Allreduce Scalability Problem</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-Allreduce-Scalability-Problem/m-p/1627392#M11879</link>
      <description>&lt;P&gt;[0] MPI startup(): I_MPI_ROOT=/opt/hpc/software/mpi/intelmpi/2021.6.0&lt;BR /&gt;[0] MPI startup(): I_MPI_MPIRUN=mpirun&lt;BR /&gt;[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc&lt;BR /&gt;[0] MPI startup(): I_MPI_HYDRA_BOOTSTRAP=slurm&lt;BR /&gt;[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default&lt;BR /&gt;[0] MPI startup(): I_MPI_DEBUG=10&lt;BR /&gt;#----------------------------------------------------------------&lt;BR /&gt;# Intel(R) MPI Benchmarks 2021.4, MPI-1 part&lt;BR /&gt;#----------------------------------------------------------------&lt;BR /&gt;# Date : Fri Aug 30 12:42:54 2024&lt;BR /&gt;# Machine : x86_64&lt;BR /&gt;# System : Linux&lt;BR /&gt;# Release : 4.18.0-305.3.1.el8.x86_64&lt;BR /&gt;# Version : #1 SMP Tue Jun 1 16:14:33 UTC 2021&lt;BR /&gt;# MPI Version : 3.1&lt;BR /&gt;# MPI Thread Environment:&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;# Calling sequence was:&lt;/P&gt;&lt;P&gt;# imb/IMB-MPI1 Allreduce -npmin 5400 -off_cache 60,64&lt;/P&gt;&lt;P&gt;# Minimum message length in bytes: 0&lt;BR /&gt;# Maximum message length in bytes: 4194304&lt;BR /&gt;#&lt;BR /&gt;# MPI_Datatype : MPI_BYTE&lt;BR /&gt;# MPI_Datatype for reductions : MPI_FLOAT&lt;BR /&gt;# MPI_Op : MPI_SUM&lt;BR /&gt;#&lt;BR /&gt;#&lt;/P&gt;&lt;P&gt;# List of Benchmarks to run:&lt;/P&gt;&lt;P&gt;# Allreduce&lt;/P&gt;&lt;P&gt;#----------------------------------------------------------------&lt;BR /&gt;# Benchmarking Allreduce&lt;BR /&gt;# #processes = 5400&lt;BR /&gt;#----------------------------------------------------------------&lt;BR /&gt;#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]&lt;BR /&gt;0 1000 0.02 0.09 0.03&lt;BR /&gt;4 1000 750.44 773.20 761.29&lt;BR /&gt;8 1000 797.67 843.62 826.09&lt;BR /&gt;16 1000 828.77 858.66 846.15&lt;BR /&gt;32 1000 814.64 858.83 841.92&lt;BR /&gt;64 1000 823.91 914.41 838.27&lt;BR /&gt;128 1000 850.97 944.32 871.04&lt;BR /&gt;256 1000 843.54 933.99 858.84&lt;BR /&gt;512 1000 844.91 945.35 867.06&lt;BR /&gt;1024 1000 
845.97 938.68 860.53&lt;BR /&gt;2048 1000 855.51 956.05 869.80&lt;BR /&gt;4096 1000 862.21 954.42 874.54&lt;BR /&gt;8192 1000 855.78 951.25 874.08&lt;BR /&gt;16384 1000 994.88 1098.12 1022.46&lt;BR /&gt;32768 1000 1036.75 1149.22 1055.48&lt;BR /&gt;65536 640 1072.59 1180.20 1102.77&lt;BR /&gt;131072 320 1855.89 1943.43 1904.21&lt;BR /&gt;262144 160 1910.85 2016.92 1956.57&lt;BR /&gt;524288 80 2612.14 2770.10 2683.22&lt;BR /&gt;1048576 40 2359.23 2660.41 2484.20&lt;BR /&gt;2097152 20 3038.83 3444.00 3173.75&lt;BR /&gt;4194304 10 4575.82 5294.64 4807.26&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;# All processes entering MPI_Finalize&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 30 Aug 2024 13:15:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-Allreduce-Scalability-Problem/m-p/1627392#M11879</guid>
      <dc:creator>MetMan</dc:creator>
      <dc:date>2024-08-30T13:15:33Z</dc:date>
    </item>
    <item>
      <title>Re: Intel MPI Allreduce Scalability Problem</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-Allreduce-Scalability-Problem/m-p/1627402#M11881</link>
      <description>&lt;P&gt;As I mentioned above, please try the latest release, 2021.13.1 (oneAPI 2024.2.1). Also make sure that your MLX stack is up to date, with the latest LTS version installed.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Aug 2024 13:57:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-Allreduce-Scalability-Problem/m-p/1627402#M11881</guid>
      <dc:creator>TobiasK</dc:creator>
      <dc:date>2024-08-30T13:57:19Z</dc:date>
    </item>
    <item>
      <title>Re: Intel MPI Allreduce Scalability Problem</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-Allreduce-Scalability-Problem/m-p/1627412#M11884</link>
      <description>&lt;P&gt;Thank you, TobiasK.&lt;/P&gt;&lt;P&gt;I will try the latest Intel MPI version.&lt;/P&gt;&lt;P&gt;Installing the MLX stack may require root permission, which I don't have.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Aug 2024 14:21:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-Allreduce-Scalability-Problem/m-p/1627412#M11884</guid>
      <dc:creator>MetMan</dc:creator>
      <dc:date>2024-08-30T14:21:38Z</dc:date>
    </item>
    <item>
      <title>Re: Intel MPI Allreduce Scalability Problem</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-Allreduce-Scalability-Problem/m-p/1633522#M11923</link>
      <description>&lt;P&gt;Hi TobiasK,&lt;/P&gt;&lt;P&gt;Sorry for taking so long to reply. I tried the latest Intel MPI version 2021.13 and found that the difference in ALLREDUCE time between using 48 and 60 cores per node is still significant.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MetMan_0-1727319966743.png" style="width: 400px;"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/59222i2B8B2F0FC082B42F/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="MetMan_0-1727319966743.png" alt="MetMan_0-1727319966743.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 26 Sep 2024 03:06:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-Allreduce-Scalability-Problem/m-p/1633522#M11923</guid>
      <dc:creator>MetMan</dc:creator>
      <dc:date>2024-09-26T03:06:18Z</dc:date>
    </item>
  </channel>
</rss>

