<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Re:3rd gen Xeon showed slower performance with intel MPI library in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1527122#M10955</link>
    <description>&lt;P&gt;There is one typo "show but perfomance degradation" should be "which showed performance degradation".&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;By the way, is there any update?&lt;/P&gt;</description>
    <pubDate>Mon, 25 Sep 2023 00:40:30 GMT</pubDate>
    <dc:creator>Kuni</dc:creator>
    <dc:date>2023-09-25T00:40:30Z</dc:date>
    <item>
      <title>3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1439615#M10150</link>
      <description>&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;Now we are studying network traffic of HPC use. For this, we are using Intel MPI Library (latest - Intel HPC kit at 12/10/2022) and Nas Parallel Benchmark (3.4.2). Before measuring network traffic, I measured the performance without using network traffic.&amp;nbsp; We used following platform:&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;machine1. Xeon Silver 4310 server 8ch 64GB RAM, Hyper thread on, CentOS 7.9, Turbo ON&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;machine 2. Xeon Silver 4214 server 6ch 96GB RAM Hyper thread on CentOS 7.9, no Turbo&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;machine 3.&amp;nbsp; 4 core 8GB RAM virtual machine on machine 1. CentOS 7.9&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;machine 4.&amp;nbsp; 4 core 8GB RAM vitual machine on machine 2. CentOS 7.9&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;Results:&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;Test 1.&amp;nbsp; mpirun -n 4 ./bin/bt.B.x (4 process smaller array - 102 x 102 x 102)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;machine 1.&amp;nbsp; 49.87 sec&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;machine 2. 62.02 sec&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;machine 3. 43.92 sec&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;machine 4. 63.11 sec&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;Test 2. mpirun -n 4 ./bin/bt.C.x&amp;nbsp; (4 process larger array - 162 x 162 x 162)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;machine 1. 388.57 sec&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;machine 2. 253.40 sec&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;machine 3. 201.79 sec&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;machine 4. 256.78 sec&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;In case of the above test 1, the result was understandable and performance diffrence was not strange and expected results were shown.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;However, 2nd test. I saw very strange results. There is two unexped things.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;1. Newer (3rd) generation of Xeon showed much slower result than older&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;(2nd) generation of Xeon on real machine.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;2. Newer (3rd) generation of Xeon showed big improvement , if the benchmark was executed on the virtual machine.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;In case of the memory of the machine 1 and the machine 2, machine 2's memory is 1/3 x bigger, however, the using memory of the test 2 (bt.C.x) only consume 4GB (free command result), then it the memory size difference might not make such big effects to execution results.&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;I also executed the tests with openmpi 4.1 the following is the results:&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;Test 1.&amp;nbsp; mpirun -np 4 ./bin/bt.B.x (4 process smaller array)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;machine 1.&amp;nbsp; 52.31 sec&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;machine 2.&amp;nbsp; 61.73 sec&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;Test 2. mpirun -np 4 ./bin/bt.C.x&amp;nbsp; (4 process larger array)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;machine 1. 198.70 sec&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;machine 2. 252.31 sec&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;Then it seems that Intel MPI and 3rd Gen Xeon and some large array treatment may cause performance down.&amp;nbsp; Then it seems that I can not use Intel MPI with&amp;nbsp; 3rd Gen Xeon. But Intel MPI is much easier to specify fabric and then I want to use it our network traffic evaluation if possible.&amp;nbsp; Then, I want to know following things to use Intel MPI library:&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;1. Why 3rd Gen Xeon showed slow performance? Why it was not shown with my vitrual machine case even with 3rd Gen Xeon?&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;2. Why the performance down is shown with Intel MPI library?&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;3. Is there any way to make performance up with Intel MPI and 3rd Gen Xeon?&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;Please help!.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;SPAN class="sub_section_element_selectors"&gt;K. Kunita&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Dec 2022 02:25:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1439615#M10150</guid>
      <dc:creator>Kuni</dc:creator>
      <dc:date>2022-12-19T02:25:15Z</dc:date>
    </item>
    <item>
      <title>Re:3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1439651#M10151</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for posting in the Intel forums.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you please provide us with the following details which would help us in further investigation of your issue?&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;What is the&amp;nbsp;&lt;A href="https://www.intel.com/content/www/us/en/develop/documentation/mpi-developer-guide-linux/top/running-applications/job-schedulers-support.html" rel="noopener noreferrer" target="_blank"&gt;job scheduler&lt;/A&gt;&amp;nbsp;you are using?&lt;/LI&gt;&lt;LI&gt;What is the&amp;nbsp;&lt;A href="https://www.intel.com/content/www/us/en/develop/documentation/mpi-developer-guide-linux/top/running-applications/fabrics-control/ofi-providers-support.html" rel="noopener noreferrer" target="_blank"&gt;FI_PROVIDER&lt;/A&gt;(mlx/psm2/verbs etc..) you are using?&lt;/LI&gt;&lt;LI&gt;What is the Interconnect hardware(Infiniband/Intel Omni-Path etc..) you are using?&lt;/LI&gt;&lt;LI&gt;What is the Intel MPI version you are using?&lt;/LI&gt;&lt;LI&gt;Also, please provide us the &lt;STRONG&gt;sample reproducer code&lt;/STRONG&gt; to reproduce the issue from our end.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks &amp;amp; Regards,&lt;/P&gt;&lt;P&gt;Santosh&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 19 Dec 2022 06:51:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1439651#M10151</guid>
      <dc:creator>SantoshY_Intel</dc:creator>
      <dc:date>2022-12-19T06:51:25Z</dc:date>
    </item>
    <item>
      <title>Re: Re:3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1439660#M10154</link>
      <description>&lt;P&gt;mpiI did not use job scheduler. I just run command "mpirun -np 4 ./bin/bt.C.x" or "mpirun -n 4 ./bin/bt.B.x".&amp;nbsp;&lt;/P&gt;
&lt;P&gt;At this time, issue happend without any node to node communication. I just use one server. Then communication might be loop back socket or Shared memory base.&amp;nbsp; The FI_PROVIDER might not make effect.&amp;nbsp; For the reference, I used command "mpirun -n 4 -genv FI_PROVIDER tcp ./bin/bt.X.x" (X is C or B).&amp;nbsp; The result is same.&amp;nbsp;&lt;/P&gt;
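&lt;P&gt;(As a single-node variant, the run can also be forced onto the shared-memory fabric explicitly; this is just a sketch using the documented I_MPI_FABRICS variable, and the result may well be the same as with tcp:)&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;# single-node run forced onto the shared-memory fabric
mpirun -n 4 -genv I_MPI_FABRICS shm ./bin/bt.C.x&lt;/LI-CODE&gt;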
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The mpirun version is from the latest Intel HPC toolkit:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;$ mpirun --version&lt;BR /&gt;Intel(R) MPI Library for Linux* OS, Version 2021.7 Build 20221022 (id: f7b29a2495)&lt;BR /&gt;Copyright 2003-2022, Intel Corporation.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To reproduce on your side, the following procedure can be used:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;On CentOS 7.9,&lt;/P&gt;
&lt;P&gt;# su -&lt;/P&gt;
&lt;P&gt;# yum update&lt;/P&gt;
&lt;P&gt;- install intel-basekit and intel-hpckit per the Intel instructions. Basically, set up the oneAPI repository and then&lt;/P&gt;
&lt;P&gt;# yum install intel-basekit&lt;/P&gt;
&lt;P&gt;# yum install intel-hpckit&lt;/P&gt;
&lt;P&gt;# exit&lt;/P&gt;
&lt;P&gt;- download the NPB 3.4.2 software&amp;nbsp;&lt;/P&gt;
&lt;P&gt;$ .&amp;nbsp; /opt/intel/oneapi/setvars.sh&lt;/P&gt;
&lt;P&gt;$ wget &lt;A href="https://www.nas.nasa.gov/assets/npb/NPB3.4.2.tar.gz" target="_blank"&gt;https://www.nas.nasa.gov/assets/npb/NPB3.4.2.tar.gz&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;$ sudo yum install centos-release-scl&lt;/P&gt;
&lt;P&gt;$ sudo yum install devtoolset-9&lt;/P&gt;
&lt;P&gt;$ scl enable devtoolset-9 bash&lt;/P&gt;
&lt;P&gt;$ cd npb/NPB3.4.2/NPB3.4-MPI&lt;/P&gt;
&lt;P&gt;$ cp config/make.def.template config/make.def&lt;/P&gt;
&lt;P&gt;$ cp config/suite.def.template config/suite.def&lt;/P&gt;
&lt;P&gt;$ vim config/make.def&lt;/P&gt;
&lt;P&gt;change the following settings:&lt;/P&gt;
&lt;P&gt;MPIFC = /opt/intel/oneapi/mpi/latest/bin/mpif90&lt;BR /&gt;FMPI_LIB = -L/opt/intel/oneapi/mpi/latest/lib -lmpi&lt;BR /&gt;FMPI_INC = -I/opt/intel/oneapi/mpi/latest/include&lt;BR /&gt;MPICC = /opt/intel/oneapi/mpi/latest/bin/mpicc&lt;BR /&gt;CMPI_LIB = -L/opt/intel/oneapi/mpi/latest/lib -lmpi&lt;BR /&gt;CMPI_INC = -I/opt/intel/oneapi/mpi/latest/include&lt;/P&gt;
&lt;P&gt;$ vim config/suite.def&lt;/P&gt;
&lt;P&gt;delete all non-comment lines and add the following:&lt;/P&gt;
&lt;P&gt;bt&amp;lt;tab&amp;gt;B&lt;/P&gt;
&lt;P&gt;bt&amp;lt;tab&amp;gt;C&lt;/P&gt;
&lt;P&gt;$ make suite&lt;/P&gt;
&lt;P&gt;$ mpirun -n 4 ./bin/bt.B.x&lt;/P&gt;
&lt;P&gt;$ mpirun -n 4 ./bin/bt.C.x&lt;/P&gt;
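&lt;P&gt;(The build-and-run steps above, collected into one script for convenience; it assumes the tarball unpacks to NPB3.4.2/ and that config/make.def and config/suite.def have been edited as described above.)&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;#!/bin/bash
# build and run NPB 3.4.2 BT (classes B and C) with Intel MPI
. /opt/intel/oneapi/setvars.sh
tar xzf NPB3.4.2.tar.gz
cd NPB3.4.2/NPB3.4-MPI
cp config/make.def.template config/make.def
cp config/suite.def.template config/suite.def
# edit config/make.def and config/suite.def as described above, then:
make suite
mpirun -n 4 ./bin/bt.B.x
mpirun -n 4 ./bin/bt.C.x&lt;/LI-CODE&gt;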
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For the virtual machines, you can create a virtual machine in the standard way for the OS.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Dec 2022 07:46:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1439660#M10154</guid>
      <dc:creator>Kuni</dc:creator>
      <dc:date>2022-12-19T07:46:08Z</dc:date>
    </item>
    <item>
      <title>Re: 3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1440061#M10166</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for providing all the requested details.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could you please provide the outputs for the below &lt;STRONG&gt;commands&lt;/STRONG&gt; after &lt;A href="https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html#use-the-setvars-script-with-linux-or-macos" target="_blank"&gt;initializing the Intel oneAPI environment&lt;/A&gt;:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;fi_info -l
ibv_devinfo
lspci | grep Mellanox
lspci | grep Omni-Path&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, please provide the complete debug log for the command below:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;I_MPI_DEBUG=30 mpirun -n 4 ./bin/bt.B.x&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks &amp;amp; Regards,&lt;/P&gt;
&lt;P&gt;Santosh&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Dec 2022 09:58:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1440061#M10166</guid>
      <dc:creator>SantoshY_Intel</dc:creator>
      <dc:date>2022-12-20T09:58:09Z</dc:date>
    </item>
    <item>
      <title>Re: 3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1440299#M10170</link>
      <description>&lt;P&gt;Hi Santosh,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you for your quick response.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The following is the screen output of the requested commands.&amp;nbsp; It was executed on machine 1 (3rd Gen Xeon Scalable processor).&lt;/P&gt;
&lt;P&gt;If you want to see the same output from the other machines, please ask me.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;$ fi_info -l&lt;BR /&gt;psm2:&lt;BR /&gt;version: 113.20&lt;BR /&gt;mlx:&lt;BR /&gt;version: 1.4&lt;BR /&gt;psm3:&lt;BR /&gt;version: 1103.0&lt;BR /&gt;psm3:&lt;BR /&gt;version: 1102.0&lt;BR /&gt;ofi_rxm:&lt;BR /&gt;version: 113.20&lt;BR /&gt;verbs:&lt;BR /&gt;version: 113.20&lt;BR /&gt;verbs:&lt;BR /&gt;version: 113.20&lt;BR /&gt;tcp:&lt;BR /&gt;version: 113.20&lt;BR /&gt;sockets:&lt;BR /&gt;version: 113.20&lt;BR /&gt;shm:&lt;BR /&gt;version: 114.0&lt;BR /&gt;ofi_hook_noop:&lt;BR /&gt;version: 113.20&lt;/P&gt;
&lt;P&gt;ibv_devinfo&lt;BR /&gt;hca_id: rdmap24s0f0&lt;BR /&gt;transport: InfiniBand (0)&lt;BR /&gt;fw_ver: 1.60&lt;BR /&gt;node_guid: 669d:99ff:feff:ff5e&lt;BR /&gt;sys_image_guid: 649d:99ff:ff5e:0000&lt;BR /&gt;vendor_id: 0x8086&lt;BR /&gt;vendor_part_id: 5522&lt;BR /&gt;hw_ver: 0x2&lt;BR /&gt;phys_port_cnt: 1&lt;BR /&gt;port: 1&lt;BR /&gt;state: PORT_ACTIVE (4)&lt;BR /&gt;max_mtu: 4096 (5)&lt;BR /&gt;active_mtu: 1024 (3)&lt;BR /&gt;sm_lid: 0&lt;BR /&gt;port_lid: 1&lt;BR /&gt;port_lmc: 0x00&lt;BR /&gt;link_layer: Ethernet&lt;/P&gt;
&lt;P&gt;hca_id: irdma1&lt;BR /&gt;transport: InfiniBand (0)&lt;BR /&gt;fw_ver: 1.60&lt;BR /&gt;node_guid: 669d:99ff:feff:ff5f&lt;BR /&gt;sys_image_guid: 649d:99ff:ff5f:0000&lt;BR /&gt;vendor_id: 0x8086&lt;BR /&gt;vendor_part_id: 5522&lt;BR /&gt;hw_ver: 0x2&lt;BR /&gt;phys_port_cnt: 1&lt;BR /&gt;port: 1&lt;BR /&gt;state: PORT_DOWN (1)&lt;BR /&gt;max_mtu: 4096 (5)&lt;BR /&gt;active_mtu: 1024 (3)&lt;BR /&gt;sm_lid: 0&lt;BR /&gt;port_lid: 1&lt;BR /&gt;port_lmc: 0x00&lt;BR /&gt;link_layer: Ethernet&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;[kkunita@svr4 NPB3.4-MPI]$ lspci | grep Mellanox&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;[kkunita@svr4 NPB3.4-MPI]$ lspci | grep Omini-Path&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;$ I_MPI_DEBUG=30 mpirun -n 4 ./bin/bt.B.x&lt;BR /&gt;[0] MPI startup(): Intel(R) MPI Library, Version 2021.7 Build 20221022 (id: f7b29a2495)&lt;BR /&gt;[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.&lt;BR /&gt;[0] MPI startup(): library kind: release&lt;BR /&gt;[0] MPI startup(): shm segment size (342 MB per rank) * (4 local ranks) = 1368 MB total&lt;BR /&gt;[0] MPI startup(): libfabric version: 1.13.2rc1-impi&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():209&amp;lt;info&amp;gt; Hmem iface FI_HMEM_CUDA not supported&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():209&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ROCR not supported&lt;BR /&gt;libfabric:29820:core:core:ze_hmem_dl_init():422&amp;lt;warn&amp;gt; Failed to dlopen libze_loader.so&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():214&amp;lt;warn&amp;gt; Failed to initialize hmem iface FI_HMEM_ZE: No data available&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: verbs (113.20)&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: verbs (113.20)&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: tcp (113.20)&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: sockets (113.20)&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():222&amp;lt;info&amp;gt; Hmem iface FI_HMEM_CUDA not supported&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():222&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ROCR not supported&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():222&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ZE not supported&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: shm (114.0)&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():209&amp;lt;info&amp;gt; Hmem iface FI_HMEM_CUDA not supported&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():209&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ROCR not supported&lt;BR /&gt;libfabric:29820:core:core:ze_hmem_dl_init():422&amp;lt;warn&amp;gt; Failed to dlopen libze_loader.so&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():214&amp;lt;warn&amp;gt; Failed to initialize hmem iface FI_HMEM_ZE: No data available&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: ofi_rxm (113.20)&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: psm2 (113.20)&lt;BR /&gt;libfabric:29820:psm3:core:fi_prov_ini():752&amp;lt;info&amp;gt; build options: VERSION=1102.0=11.2.0.0, HAVE_PSM3_src=1, PSM3_CUDA=0&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: psm3 (1102.0)&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: mlx (1.4)&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():222&amp;lt;info&amp;gt; Hmem iface FI_HMEM_CUDA not supported&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():222&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ROCR not supported&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():222&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ZE not supported&lt;BR /&gt;libfabric:29820:psm3:core:fi_prov_ini():785&amp;lt;info&amp;gt; build options: VERSION=1103.0=11.3.0.0, HAVE_PSM3_src=1, PSM3_CUDA=0&lt;BR 
/&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: psm3 (1103.0)&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: ofi_hook_noop (113.20)&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1138&amp;lt;info&amp;gt; Found provider with the highest priority psm2, must_use_util_prov = 0&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1201&amp;lt;info&amp;gt; Start regular provider search because provider with the highest priority psm2 can not be initialized&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1138&amp;lt;info&amp;gt; Found provider with the highest priority psm2, must_use_util_prov = 0&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1201&amp;lt;info&amp;gt; Start regular provider search because provider with the highest priority psm2 can not be initialized&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;psm3 layering&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1001&amp;lt;info&amp;gt; Need core provider, skipping ofi_rxm&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():209&amp;lt;info&amp;gt; Hmem iface FI_HMEM_CUDA not supported&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():209&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ROCR not supported&lt;BR /&gt;libfabric:29820:core:core:ze_hmem_dl_init():422&amp;lt;warn&amp;gt; Failed to dlopen libze_loader.so&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():214&amp;lt;warn&amp;gt; Failed to initialize hmem iface FI_HMEM_ZE: No data available&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;sockets layering&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;shm layering&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1138&amp;lt;info&amp;gt; Found provider with the highest priority psm2, must_use_util_prov = 0&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1201&amp;lt;info&amp;gt; Start regular provider search because provider with the highest priority psm2 can not be initialized&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;psm3 layering&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1001&amp;lt;info&amp;gt; Need core provider, skipping ofi_rxm&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;sockets layering&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;shm layering&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1138&amp;lt;info&amp;gt; Found provider with the highest priority psm2, must_use_util_prov = 0&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1201&amp;lt;info&amp;gt; Start regular provider search because provider with the highest priority psm2 can not be initialized&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;psm3 layering&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1001&amp;lt;info&amp;gt; Need core provider, skipping ofi_rxm&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;sockets layering&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;shm layering&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1138&amp;lt;info&amp;gt; Found provider with the highest priority psm2, must_use_util_prov = 0&lt;BR 
/&gt;libfabric:29820:core:core:fi_getinfo_():1201&amp;lt;info&amp;gt; Start regular provider search because provider with the highest priority psm2 can not be initialized&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;psm3 layering&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1001&amp;lt;info&amp;gt; Need core provider, skipping ofi_rxm&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;sockets layering&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;shm layering&lt;BR /&gt;[0] MPI startup(): max_ch4_vnis: 1, max_reg_eps 64, enable_sep 0, enable_shared_ctxs 0, do_av_insert 0&lt;BR /&gt;[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1138&amp;lt;info&amp;gt; Found provider with the highest priority psm2, must_use_util_prov = 0&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1201&amp;lt;info&amp;gt; Start regular provider search because provider with the highest priority psm2 can not be initialized&lt;BR /&gt;[0] MPI startup(): libfabric provider: psm3&lt;BR /&gt;[0] MPI startup(): detected psm3 provider, set device name to "psm3"&lt;BR /&gt;libfabric:29820:core:core:fi_fabric_():1423&amp;lt;info&amp;gt; Opened fabric: RoCE-192.168.17.0/24&lt;BR /&gt;libfabric:29820:core:core:ofi_shm_map():171&amp;lt;warn&amp;gt; shm_open failed&lt;BR /&gt;[0] MPI startup(): addrnamelen: 32&lt;BR /&gt;libfabric:29820:core:core:ofi_ns_add_local_name():370&amp;lt;warn&amp;gt; Cannot add local name - name server uninitialized&lt;BR /&gt;[0] MPI startup(): File "/opt/intel/oneapi/mpi/2021.7.1/etc/tuning_icx_shm-ofi_psm3_100.dat" not found&lt;BR /&gt;[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.7.1/etc/tuning_icx_shm-ofi_psm3.dat"&lt;BR /&gt;[0] MPI startup(): threading: mode: direct&lt;BR /&gt;[0] MPI startup(): threading: vcis: 1&lt;BR /&gt;[0] MPI startup(): threading: app_threads: -1&lt;BR /&gt;[0] MPI startup(): threading: runtime: generic&lt;BR /&gt;[0] MPI startup(): threading: progress_threads: 0&lt;BR /&gt;[0] MPI startup(): threading: async_progress: 0&lt;BR /&gt;[0] MPI startup(): threading: lock_level: global&lt;BR /&gt;[0] MPI startup(): threading: num_pools: 1&lt;BR /&gt;[0] MPI startup(): threading: enable_sep: 0&lt;BR /&gt;[0] MPI startup(): threading: direct_recv: 1&lt;BR /&gt;[0] MPI startup(): threading: zero_op_flags: 0&lt;BR /&gt;[0] MPI startup(): threading: num_am_buffers: 1&lt;BR /&gt;[0] MPI startup(): tag bits available: 30 (TAG_UB value: 1073741823) &lt;BR /&gt;[0] MPI startup(): source bits available: 30 (Maximal number of rank: 1073741823) &lt;BR /&gt;[0] MPI startup(): Rank Pid Node name Pin cpu&lt;BR /&gt;[0] MPI startup(): 0 29820 svr4 {0,1,2,12,13,14}&lt;BR /&gt;[0] MPI startup(): 1 29821 svr4 {3,4,5,15,16,17}&lt;BR /&gt;[0] MPI startup(): 2 29822 svr4 {6,7,8,18,19,20}&lt;BR /&gt;[0] MPI startup(): 3 29823 svr4 {9,10,11,21,22,23}&lt;BR /&gt;[0] MPI startup(): I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.7.1&lt;BR /&gt;[0] MPI startup(): I_MPI_MPIRUN=mpirun&lt;BR /&gt;[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc&lt;BR /&gt;[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default&lt;BR /&gt;[0] MPI startup(): I_MPI_DEBUG=30&lt;BR /&gt;[0] allocate handle (kind=1, size=744, direct_size=8, indirect_size=1) ptr=0x7f2002f57d80&lt;BR /&gt;[0] allocate handle (kind=2, size=40, direct_size=8, indirect_size=1) ptr=0x7f10000d5900&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;NAS Parallel Benchmarks 3.4 -- BT Benchmark&lt;/P&gt;
&lt;P&gt;No input file inputbt.data. Using compiled defaults&lt;BR /&gt;Size: 102x 102x 102 (class B)&lt;BR /&gt;Iterations: 200 dt: 0.0003000&lt;BR /&gt;Total number of processes: 4&lt;/P&gt;
&lt;P&gt;Time step 1&lt;BR /&gt;Time step 20&lt;BR /&gt;Time step 40&lt;BR /&gt;Time step 60&lt;BR /&gt;Time step 80&lt;BR /&gt;Time step 100&lt;BR /&gt;Time step 120&lt;BR /&gt;Time step 140&lt;BR /&gt;Time step 160&lt;BR /&gt;Time step 180&lt;BR /&gt;Time step 200&lt;BR /&gt;Verification being performed for class B&lt;BR /&gt;accuracy setting for epsilon = 0.1000000000000E-07&lt;BR /&gt;Comparison of RMS-norms of residual&lt;BR /&gt;1 0.1423359722929E+04 0.1423359722929E+04 0.1070287152945E-13&lt;BR /&gt;2 0.9933052259015E+02 0.9933052259015E+02 0.7153317200312E-15&lt;BR /&gt;3 0.3564602564454E+03 0.3564602564454E+03 0.5900255245348E-14&lt;BR /&gt;4 0.3248544795908E+03 0.3248544795908E+03 0.9798945854817E-14&lt;BR /&gt;5 0.3270754125466E+04 0.3270754125466E+04 0.1223502756335E-13&lt;BR /&gt;Comparison of RMS-norms of solution error&lt;BR /&gt;1 0.5296984714094E+02 0.5296984714094E+02 0.9389868800427E-15&lt;BR /&gt;2 0.4463289611567E+01 0.4463289611567E+01 0.1293476388601E-13&lt;BR /&gt;3 0.1312257334221E+02 0.1312257334221E+02 0.1258908460682E-13&lt;BR /&gt;4 0.1200692532356E+02 0.1200692532356E+02 0.6805440394643E-14&lt;BR /&gt;5 0.1245957615104E+03 0.1245957615104E+03 0.1003690013030E-13&lt;BR /&gt;Verification Successful&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;BT Benchmark Completed.&lt;BR /&gt;Class = B&lt;BR /&gt;Size = 102x 102x 102&lt;BR /&gt;Iterations = 200&lt;BR /&gt;Time in seconds = 52.97&lt;BR /&gt;Total processes = 4&lt;BR /&gt;Active processes= 4&lt;BR /&gt;Mop/s total = 13256.00&lt;BR /&gt;Mop/s/process = 3314.00&lt;BR /&gt;Operation type = floating point&lt;BR /&gt;Verification = SUCCESSFUL&lt;BR /&gt;Version = 3.4.2&lt;BR /&gt;Compile date = 22 Sep 2022&lt;/P&gt;
&lt;P&gt;Compile options:&lt;BR /&gt;MPIFC = /opt/intel/oneapi/mpi/latest/bin/mpif90&lt;BR /&gt;FLINK = $(MPIFC)&lt;BR /&gt;FMPI_LIB = -L/opt/intel/oneapi/mpi/latest/lib -lmpi&lt;BR /&gt;FMPI_INC = -I/opt/intel/oneapi/mpi/latest/include&lt;BR /&gt;FFLAGS = -O3&lt;BR /&gt;FLINKFLAGS = $(FFLAGS)&lt;BR /&gt;RAND = (none)&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Please send feedbacks and/or the results of this run to:&lt;/P&gt;
&lt;P&gt;NPB Development Team &lt;BR /&gt;Internet: npb@nas.nasa.gov&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards, K. Kunita&lt;/P&gt;</description>
      <pubDate>Wed, 21 Dec 2022 00:29:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1440299#M10170</guid>
      <dc:creator>Kuni</dc:creator>
      <dc:date>2022-12-21T00:29:41Z</dc:date>
    </item>
    <item>
      <title>Re: 3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1440314#M10171</link>
      <description>&lt;P class="sub_section_element_selectors"&gt;Hi Santosh,&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;Thank you for your quick response.&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;Following is the screen output of the requested commands.&amp;nbsp; It is executed on machine 1 (3rd Gen Xeon scallable Processor).&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;If you want to show same things on the other machine, please ask me.&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;$ fi_info -l&lt;BR /&gt;psm2:&lt;BR /&gt;version: 113.20&lt;BR /&gt;mlx:&lt;BR /&gt;version: 1.4&lt;BR /&gt;psm3:&lt;BR /&gt;version: 1103.0&lt;BR /&gt;psm3:&lt;BR /&gt;version: 1102.0&lt;BR /&gt;ofi_rxm:&lt;BR /&gt;version: 113.20&lt;BR /&gt;verbs:&lt;BR /&gt;version: 113.20&lt;BR /&gt;verbs:&lt;BR /&gt;version: 113.20&lt;BR /&gt;tcp:&lt;BR /&gt;version: 113.20&lt;BR /&gt;sockets:&lt;BR /&gt;version: 113.20&lt;BR /&gt;shm:&lt;BR /&gt;version: 114.0&lt;BR /&gt;ofi_hook_noop:&lt;BR /&gt;version: 113.20&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;ibv_devinfo&lt;BR /&gt;hca_id: rdmap24s0f0&lt;BR /&gt;transport: InfiniBand (0)&lt;BR /&gt;fw_ver: 1.60&lt;BR /&gt;node_guid: 669d:99ff:feff:ff5e&lt;BR /&gt;sys_image_guid: 649d:99ff:ff5e:0000&lt;BR /&gt;vendor_id: 0x8086&lt;BR /&gt;vendor_part_id: 5522&lt;BR /&gt;hw_ver: 0x2&lt;BR /&gt;phys_port_cnt: 1&lt;BR /&gt;port: 1&lt;BR /&gt;state: PORT_ACTIVE (4)&lt;BR /&gt;max_mtu: 4096 (5)&lt;BR /&gt;active_mtu: 1024 (3)&lt;BR /&gt;sm_lid: 0&lt;BR /&gt;port_lid: 1&lt;BR /&gt;port_lmc: 0x00&lt;BR /&gt;link_layer: Ethernet&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;hca_id: irdma1&lt;BR /&gt;transport: InfiniBand (0)&lt;BR /&gt;fw_ver: 1.60&lt;BR /&gt;node_guid: 669d:99ff:feff:ff5f&lt;BR /&gt;sys_image_guid: 649d:99ff:ff5f:0000&lt;BR /&gt;vendor_id: 0x8086&lt;BR /&gt;vendor_part_id: 5522&lt;BR /&gt;hw_ver: 0x2&lt;BR /&gt;phys_port_cnt: 1&lt;BR /&gt;port: 1&lt;BR /&gt;state: PORT_DOWN (1)&lt;BR /&gt;max_mtu: 4096 (5)&lt;BR /&gt;active_mtu: 1024 (3)&lt;BR /&gt;sm_lid: 0&lt;BR /&gt;port_lid: 1&lt;BR /&gt;port_lmc: 0x00&lt;BR /&gt;link_layer: Ethernet&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;[kkunita@svr4 NPB3.4-MPI]$ lspci | grep Mellanox&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;BR /&gt;[kkunita@svr4 NPB3.4-MPI]$ lspci | grep Omini-Path&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;$ I_MPI_DEBUG=30 mpirun -n 4 ./bin/bt.B.x&lt;BR /&gt;[0] MPI startup(): Intel(R) MPI Library, Version 2021.7 Build 20221022 (id: f7b29a2495)&lt;BR /&gt;[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.&lt;BR /&gt;[0] MPI startup(): library kind: release&lt;BR /&gt;[0] MPI startup(): shm segment size (342 MB per rank) * (4 local ranks) = 1368 MB total&lt;BR /&gt;[0] MPI startup(): libfabric version: 1.13.2rc1-impi&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():209&amp;lt;info&amp;gt; Hmem iface FI_HMEM_CUDA not supported&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():209&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ROCR not supported&lt;BR /&gt;libfabric:29820:core:core:ze_hmem_dl_init():422&amp;lt;warn&amp;gt; Failed to dlopen libze_loader.so&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():214&amp;lt;warn&amp;gt; Failed to initialize hmem iface FI_HMEM_ZE: No data available&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: verbs (113.20)&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: verbs (113.20)&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: tcp (113.20)&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: sockets (113.20)&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():222&amp;lt;info&amp;gt; Hmem iface FI_HMEM_CUDA not supported&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():222&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ROCR not supported&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():222&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ZE not supported&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: shm (114.0)&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():209&amp;lt;info&amp;gt; Hmem iface FI_HMEM_CUDA not supported&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():209&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ROCR not supported&lt;BR /&gt;libfabric:29820:core:core:ze_hmem_dl_init():422&amp;lt;warn&amp;gt; Failed to dlopen libze_loader.so&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():214&amp;lt;warn&amp;gt; Failed to initialize hmem iface FI_HMEM_ZE: No data available&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: ofi_rxm (113.20)&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: psm2 (113.20)&lt;BR /&gt;libfabric:29820:psm3:core:fi_prov_ini():752&amp;lt;info&amp;gt; build options: VERSION=1102.0=11.2.0.0, HAVE_PSM3_src=1, PSM3_CUDA=0&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: psm3 (1102.0)&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: mlx (1.4)&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():222&amp;lt;info&amp;gt; Hmem iface FI_HMEM_CUDA not supported&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():222&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ROCR not supported&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():222&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ZE not supported&lt;BR /&gt;libfabric:29820:psm3:core:fi_prov_ini():785&amp;lt;info&amp;gt; build options: VERSION=1103.0=11.3.0.0, HAVE_PSM3_src=1, PSM3_CUDA=0&lt;BR 
/&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: psm3 (1103.0)&lt;BR /&gt;libfabric:29820:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: ofi_hook_noop (113.20)&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1138&amp;lt;info&amp;gt; Found provider with the highest priority psm2, must_use_util_prov = 0&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1201&amp;lt;info&amp;gt; Start regular provider search because provider with the highest priority psm2 can not be initialized&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1138&amp;lt;info&amp;gt; Found provider with the highest priority psm2, must_use_util_prov = 0&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1201&amp;lt;info&amp;gt; Start regular provider search because provider with the highest priority psm2 can not be initialized&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;psm3 layering&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1001&amp;lt;info&amp;gt; Need core provider, skipping ofi_rxm&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():209&amp;lt;info&amp;gt; Hmem iface FI_HMEM_CUDA not supported&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():209&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ROCR not supported&lt;BR /&gt;libfabric:29820:core:core:ze_hmem_dl_init():422&amp;lt;warn&amp;gt; Failed to dlopen libze_loader.so&lt;BR /&gt;libfabric:29820:core:core:ofi_hmem_init():214&amp;lt;warn&amp;gt; Failed to initialize hmem iface FI_HMEM_ZE: No data available&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;sockets layering&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;shm layering&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1138&amp;lt;info&amp;gt; Found provider with the highest priority psm2, must_use_util_prov = 0&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1201&amp;lt;info&amp;gt; Start regular provider search because provider with the highest priority psm2 can not be initialized&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;psm3 layering&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1001&amp;lt;info&amp;gt; Need core provider, skipping ofi_rxm&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;sockets layering&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;shm layering&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1138&amp;lt;info&amp;gt; Found provider with the highest priority psm2, must_use_util_prov = 0&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1201&amp;lt;info&amp;gt; Start regular provider search because provider with the highest priority psm2 can not be initialized&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;psm3 layering&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1001&amp;lt;info&amp;gt; Need core provider, skipping ofi_rxm&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;sockets layering&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;shm layering&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1138&amp;lt;info&amp;gt; Found provider with the highest priority psm2, must_use_util_prov = 0&lt;BR 
/&gt;libfabric:29820:core:core:fi_getinfo_():1201&amp;lt;info&amp;gt; Start regular provider search because provider with the highest priority psm2 can not be initialized&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;psm3 layering&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1001&amp;lt;info&amp;gt; Need core provider, skipping ofi_rxm&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;sockets layering&lt;BR /&gt;libfabric:29820:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;shm layering&lt;BR /&gt;[0] MPI startup(): max_ch4_vnis: 1, max_reg_eps 64, enable_sep 0, enable_shared_ctxs 0, do_av_insert 0&lt;BR /&gt;[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1138&amp;lt;info&amp;gt; Found provider with the highest priority psm2, must_use_util_prov = 0&lt;BR /&gt;libfabric:29820:core:core:fi_getinfo_():1201&amp;lt;info&amp;gt; Start regular provider search because provider with the highest priority psm2 can not be initialized&lt;BR /&gt;[0] MPI startup(): libfabric provider: psm3&lt;BR /&gt;[0] MPI startup(): detected psm3 provider, set device name to "psm3"&lt;BR /&gt;libfabric:29820:core:core:fi_fabric_():1423&amp;lt;info&amp;gt; Opened fabric: RoCE-192.168.17.0/24&lt;BR /&gt;libfabric:29820:core:core:ofi_shm_map():171&amp;lt;warn&amp;gt; shm_open failed&lt;BR /&gt;[0] MPI startup(): addrnamelen: 32&lt;BR /&gt;libfabric:29820:core:core:ofi_ns_add_local_name():370&amp;lt;warn&amp;gt; Cannot add local name - name server uninitialized&lt;BR /&gt;[0] MPI startup(): File "/opt/intel/oneapi/mpi/2021.7.1/etc/tuning_icx_shm-ofi_psm3_100.dat" not found&lt;BR /&gt;[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.7.1/etc/tuning_icx_shm-ofi_psm3.dat"&lt;BR /&gt;[0] MPI startup(): threading: mode: direct&lt;BR /&gt;[0] MPI startup(): threading: vcis: 1&lt;BR /&gt;[0] MPI startup(): threading: app_threads: -1&lt;BR /&gt;[0] MPI startup(): threading: runtime: generic&lt;BR /&gt;[0] MPI startup(): threading: progress_threads: 0&lt;BR /&gt;[0] MPI startup(): threading: async_progress: 0&lt;BR /&gt;[0] MPI startup(): threading: lock_level: global&lt;BR /&gt;[0] MPI startup(): threading: num_pools: 1&lt;BR /&gt;[0] MPI startup(): threading: enable_sep: 0&lt;BR /&gt;[0] MPI startup(): threading: direct_recv: 1&lt;BR /&gt;[0] MPI startup(): threading: zero_op_flags: 0&lt;BR /&gt;[0] MPI startup(): threading: num_am_buffers: 1&lt;BR /&gt;[0] MPI startup(): tag bits available: 30 (TAG_UB value: 1073741823)&lt;BR /&gt;[0] MPI startup(): source bits available: 30 (Maximal number of rank: 1073741823)&lt;BR /&gt;[0] MPI startup(): Rank Pid Node name Pin cpu&lt;BR /&gt;[0] MPI startup(): 0 29820 svr4 {0,1,2,12,13,14}&lt;BR /&gt;[0] MPI startup(): 1 29821 svr4 {3,4,5,15,16,17}&lt;BR /&gt;[0] MPI startup(): 2 29822 svr4 {6,7,8,18,19,20}&lt;BR /&gt;[0] MPI startup(): 3 29823 svr4 {9,10,11,21,22,23}&lt;BR /&gt;[0] MPI startup(): I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.7.1&lt;BR /&gt;[0] MPI startup(): I_MPI_MPIRUN=mpirun&lt;BR /&gt;[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc&lt;BR /&gt;[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default&lt;BR /&gt;[0] MPI startup(): I_MPI_DEBUG=30&lt;BR /&gt;[0] allocate handle (kind=1, size=744, direct_size=8, indirect_size=1) ptr=0x7f2002f57d80&lt;BR /&gt;[0] allocate handle (kind=2, size=40, direct_size=8, indirect_size=1) ptr=0x7f10000d5900&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;BR /&gt;NAS Parallel Benchmarks 3.4 -- BT Benchmark&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;No input file inputbt.data. Using compiled defaults&lt;BR /&gt;Size: 102x 102x 102 (class B)&lt;BR /&gt;Iterations: 200 dt: 0.0003000&lt;BR /&gt;Total number of processes: 4&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;Time step 1&lt;BR /&gt;Time step 20&lt;BR /&gt;Time step 40&lt;BR /&gt;Time step 60&lt;BR /&gt;Time step 80&lt;BR /&gt;Time step 100&lt;BR /&gt;Time step 120&lt;BR /&gt;Time step 140&lt;BR /&gt;Time step 160&lt;BR /&gt;Time step 180&lt;BR /&gt;Time step 200&lt;BR /&gt;Verification being performed for class B&lt;BR /&gt;accuracy setting for epsilon = 0.1000000000000E-07&lt;BR /&gt;Comparison of RMS-norms of residual&lt;BR /&gt;1 0.1423359722929E+04 0.1423359722929E+04 0.1070287152945E-13&lt;BR /&gt;2 0.9933052259015E+02 0.9933052259015E+02 0.7153317200312E-15&lt;BR /&gt;3 0.3564602564454E+03 0.3564602564454E+03 0.5900255245348E-14&lt;BR /&gt;4 0.3248544795908E+03 0.3248544795908E+03 0.9798945854817E-14&lt;BR /&gt;5 0.3270754125466E+04 0.3270754125466E+04 0.1223502756335E-13&lt;BR /&gt;Comparison of RMS-norms of solution error&lt;BR /&gt;1 0.5296984714094E+02 0.5296984714094E+02 0.9389868800427E-15&lt;BR /&gt;2 0.4463289611567E+01 0.4463289611567E+01 0.1293476388601E-13&lt;BR /&gt;3 0.1312257334221E+02 0.1312257334221E+02 0.1258908460682E-13&lt;BR /&gt;4 0.1200692532356E+02 0.1200692532356E+02 0.6805440394643E-14&lt;BR /&gt;5 0.1245957615104E+03 0.1245957615104E+03 0.1003690013030E-13&lt;BR /&gt;Verification Successful&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;BR /&gt;BT Benchmark Completed.&lt;BR /&gt;Class = B&lt;BR /&gt;Size = 102x 102x 102&lt;BR /&gt;Iterations = 200&lt;BR /&gt;Time in seconds = 52.97&lt;BR /&gt;Total processes = 4&lt;BR /&gt;Active processes= 4&lt;BR /&gt;Mop/s total = 13256.00&lt;BR /&gt;Mop/s/process = 3314.00&lt;BR /&gt;Operation type = floating point&lt;BR /&gt;Verification = SUCCESSFUL&lt;BR /&gt;Version = 3.4.2&lt;BR /&gt;Compile date = 22 Sep 2022&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;Compile options:&lt;BR /&gt;MPIFC = /opt/intel/oneapi/mpi/latest/bin/mpif90&lt;BR /&gt;FLINK = $(MPIFC)&lt;BR /&gt;FMPI_LIB = -L/opt/intel/oneapi/mpi/latest/lib -lmpi&lt;BR /&gt;FMPI_INC = -I/opt/intel/oneapi/mpi/latest/include&lt;BR /&gt;FFLAGS = -O3&lt;BR /&gt;FLINKFLAGS = $(FFLAGS)&lt;BR /&gt;RAND = (none)&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&lt;BR /&gt;Please send feedbacks and/or the results of this run to:&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;NPB Development Team&lt;BR /&gt;Internet: npb@nas.nasa.gov&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;Regards, K. Kunita&lt;/P&gt;</description>
      <pubDate>Wed, 21 Dec 2022 02:09:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1440314#M10171</guid>
      <dc:creator>Kuni</dc:creator>
      <dc:date>2022-12-21T02:09:31Z</dc:date>
    </item>
    <item>
      <title>Re: 3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1440401#M10172</link>
      <description>&lt;P&gt;Hi Santosh,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you for your quick response.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The following is what you requested:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;[kkunita@svr4 NPB3.4-MPI]$ fi_info -l&lt;BR /&gt;psm2:&lt;BR /&gt;version: 113.20&lt;BR /&gt;mlx:&lt;BR /&gt;version: 1.4&lt;BR /&gt;psm3:&lt;BR /&gt;version: 1103.0&lt;BR /&gt;psm3:&lt;BR /&gt;version: 1102.0&lt;BR /&gt;ofi_rxm:&lt;BR /&gt;version: 113.20&lt;BR /&gt;verbs:&lt;BR /&gt;version: 113.20&lt;BR /&gt;verbs:&lt;BR /&gt;version: 113.20&lt;BR /&gt;tcp:&lt;BR /&gt;version: 113.20&lt;BR /&gt;sockets:&lt;BR /&gt;version: 113.20&lt;BR /&gt;shm:&lt;BR /&gt;version: 114.0&lt;BR /&gt;ofi_hook_noop:&lt;BR /&gt;version: 113.20&lt;BR /&gt;[kkunita@svr4 NPB3.4-MPI]$ ibv_devinfo&lt;BR /&gt;hca_id: rdmap24s0f0&lt;BR /&gt;transport: InfiniBand (0)&lt;BR /&gt;fw_ver: 1.60&lt;BR /&gt;node_guid: 669d:99ff:feff:ff5e&lt;BR /&gt;sys_image_guid: 649d:99ff:ff5e:0000&lt;BR /&gt;vendor_id: 0x8086&lt;BR /&gt;vendor_part_id: 5522&lt;BR /&gt;hw_ver: 0x2&lt;BR /&gt;phys_port_cnt: 1&lt;BR /&gt;port: 1&lt;BR /&gt;state: PORT_ACTIVE (4)&lt;BR /&gt;max_mtu: 4096 (5)&lt;BR /&gt;active_mtu: 1024 (3)&lt;BR /&gt;sm_lid: 0&lt;BR /&gt;port_lid: 1&lt;BR /&gt;port_lmc: 0x00&lt;BR /&gt;link_layer: Ethernet&lt;/P&gt;
&lt;P&gt;hca_id: irdma1&lt;BR /&gt;transport: InfiniBand (0)&lt;BR /&gt;fw_ver: 1.60&lt;BR /&gt;node_guid: 669d:99ff:feff:ff5f&lt;BR /&gt;sys_image_guid: 649d:99ff:ff5f:0000&lt;BR /&gt;vendor_id: 0x8086&lt;BR /&gt;vendor_part_id: 5522&lt;BR /&gt;hw_ver: 0x2&lt;BR /&gt;phys_port_cnt: 1&lt;BR /&gt;port: 1&lt;BR /&gt;state: PORT_DOWN (1)&lt;BR /&gt;max_mtu: 4096 (5)&lt;BR /&gt;active_mtu: 1024 (3)&lt;BR /&gt;sm_lid: 0&lt;BR /&gt;port_lid: 1&lt;BR /&gt;port_lmc: 0x00&lt;BR /&gt;link_layer: Ethernet&lt;/P&gt;
&lt;P&gt;[kkunita@svr4 NPB3.4-MPI]$ lspci |grep Mellanox&lt;BR /&gt;[kkunita@svr4 NPB3.4-MPI]$ lspci |grep Omni_Path&lt;BR /&gt;[kkunita@svr4 NPB3.4-MPI]$ I_MPI_DEBUG=30 mpirun -n 4 ./bin/bt.B.x&lt;BR /&gt;[0] MPI startup(): Intel(R) MPI Library, Version 2021.7 Build 20221022 (id: f7b29a2495)&lt;BR /&gt;[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.&lt;BR /&gt;[0] MPI startup(): library kind: release&lt;BR /&gt;[0] MPI startup(): shm segment size (342 MB per rank) * (4 local ranks) = 1368 MB total&lt;BR /&gt;[0] MPI startup(): libfabric version: 1.13.2rc1-impi&lt;BR /&gt;libfabric:24286:core:core:ofi_hmem_init():209&amp;lt;info&amp;gt; Hmem iface FI_HMEM_CUDA not supported&lt;BR /&gt;libfabric:24286:core:core:ofi_hmem_init():209&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ROCR not supported&lt;BR /&gt;libfabric:24286:core:core:ze_hmem_dl_init():422&amp;lt;warn&amp;gt; Failed to dlopen libze_loader.so&lt;BR /&gt;libfabric:24286:core:core:ofi_hmem_init():214&amp;lt;warn&amp;gt; Failed to initialize hmem iface FI_HMEM_ZE: No data available&lt;BR /&gt;libfabric:24286:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: verbs (113.20)&lt;BR /&gt;libfabric:24286:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: verbs (113.20)&lt;BR /&gt;libfabric:24286:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: tcp (113.20)&lt;BR /&gt;libfabric:24286:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: sockets (113.20)&lt;BR /&gt;libfabric:24286:core:core:ofi_hmem_init():222&amp;lt;info&amp;gt; Hmem iface FI_HMEM_CUDA not supported&lt;BR /&gt;libfabric:24286:core:core:ofi_hmem_init():222&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ROCR not supported&lt;BR /&gt;libfabric:24286:core:core:ofi_hmem_init():222&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ZE not supported&lt;BR /&gt;libfabric:24286:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: shm (114.0)&lt;BR /&gt;libfabric:24286:core:core:ofi_hmem_init():209&amp;lt;info&amp;gt; Hmem iface FI_HMEM_CUDA not supported&lt;BR /&gt;libfabric:24286:core:core:ofi_hmem_init():209&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ROCR not supported&lt;BR /&gt;libfabric:24286:core:core:ze_hmem_dl_init():422&amp;lt;warn&amp;gt; Failed to dlopen libze_loader.so&lt;BR /&gt;libfabric:24286:core:core:ofi_hmem_init():214&amp;lt;warn&amp;gt; Failed to initialize hmem iface FI_HMEM_ZE: No data available&lt;BR /&gt;libfabric:24286:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: ofi_rxm (113.20)&lt;BR /&gt;libfabric:24286:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: psm2 (113.20)&lt;BR /&gt;libfabric:24286:psm3:core:fi_prov_ini():752&amp;lt;info&amp;gt; build options: VERSION=1102.0=11.2.0.0, HAVE_PSM3_src=1, PSM3_CUDA=0&lt;BR /&gt;libfabric:24286:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: psm3 (1102.0)&lt;BR /&gt;libfabric:24286:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: mlx (1.4)&lt;BR /&gt;libfabric:24286:core:core:ofi_hmem_init():222&amp;lt;info&amp;gt; Hmem iface FI_HMEM_CUDA not supported&lt;BR /&gt;libfabric:24286:core:core:ofi_hmem_init():222&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ROCR not supported&lt;BR /&gt;libfabric:24286:core:core:ofi_hmem_init():222&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ZE not supported&lt;BR 
/&gt;libfabric:24286:psm3:core:fi_prov_ini():785&amp;lt;info&amp;gt; build options: VERSION=1103.0=11.3.0.0, HAVE_PSM3_src=1, PSM3_CUDA=0&lt;BR /&gt;libfabric:24286:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: psm3 (1103.0)&lt;BR /&gt;libfabric:24286:core:core:ofi_register_provider():474&amp;lt;info&amp;gt; registering provider: ofi_hook_noop (113.20)&lt;BR /&gt;libfabric:24286:core:core:fi_getinfo_():1138&amp;lt;info&amp;gt; Found provider with the highest priority psm2, must_use_util_prov = 0&lt;BR /&gt;libfabric:24286:core:core:fi_getinfo_():1201&amp;lt;info&amp;gt; Start regular provider search because provider with the highest priority psm2 can not be initialized&lt;BR /&gt;libfabric:24286:core:core:fi_getinfo_():1138&amp;lt;info&amp;gt; Found provider with the highest priority psm2, must_use_util_prov = 0&lt;BR /&gt;libfabric:24286:core:core:fi_getinfo_():1201&amp;lt;info&amp;gt; Start regular provider search because provider with the highest priority psm2 can not be initialized&lt;BR /&gt;libfabric:24286:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;psm3 layering&lt;BR /&gt;libfabric:24286:core:core:ofi_layering_ok():1001&amp;lt;info&amp;gt; Need core provider, skipping ofi_rxm&lt;BR /&gt;libfabric:24286:core:core:ofi_hmem_init():209&amp;lt;info&amp;gt; Hmem iface FI_HMEM_CUDA not supported&lt;BR /&gt;libfabric:24286:core:core:ofi_hmem_init():209&amp;lt;info&amp;gt; Hmem iface FI_HMEM_ROCR not supported&lt;BR /&gt;libfabric:24286:core:core:ze_hmem_dl_init():422&amp;lt;warn&amp;gt; Failed to dlopen libze_loader.so&lt;BR /&gt;libfabric:24286:core:core:ofi_hmem_init():214&amp;lt;warn&amp;gt; Failed to initialize hmem iface FI_HMEM_ZE: No data available&lt;BR /&gt;libfabric:24286:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;sockets layering&lt;BR /&gt;libfabric:24286:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;shm layering&lt;BR /&gt;libfabric:24286:core:core:fi_getinfo_():1138&amp;lt;info&amp;gt; Found provider with the highest priority psm2, must_use_util_prov = 0&lt;BR /&gt;libfabric:24286:core:core:fi_getinfo_():1201&amp;lt;info&amp;gt; Start regular provider search because provider with the highest priority psm2 can not be initialized&lt;BR /&gt;libfabric:24286:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;psm3 layering&lt;BR /&gt;libfabric:24286:core:core:ofi_layering_ok():1001&amp;lt;info&amp;gt; Need core provider, skipping ofi_rxm&lt;BR /&gt;libfabric:24286:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;sockets layering&lt;BR /&gt;libfabric:24286:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;shm layering&lt;BR /&gt;libfabric:24286:core:core:fi_getinfo_():1138&amp;lt;info&amp;gt; Found provider with the highest priority psm2, must_use_util_prov = 0&lt;BR /&gt;libfabric:24286:core:core:fi_getinfo_():1201&amp;lt;info&amp;gt; Start regular provider search because provider with the highest priority psm2 can not be initialized&lt;BR /&gt;libfabric:24286:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;psm3 layering&lt;BR /&gt;libfabric:24286:core:core:ofi_layering_ok():1001&amp;lt;info&amp;gt; Need core provider, skipping ofi_rxm&lt;BR /&gt;libfabric:24286:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;sockets layering&lt;BR /&gt;libfabric:24286:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;shm layering&lt;BR 
/&gt;libfabric:24286:core:core:fi_getinfo_():1138&amp;lt;info&amp;gt; Found provider with the highest priority psm2, must_use_util_prov = 0&lt;BR /&gt;libfabric:24286:core:core:fi_getinfo_():1201&amp;lt;info&amp;gt; Start regular provider search because provider with the highest priority psm2 can not be initialized&lt;BR /&gt;libfabric:24286:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;psm3 layering&lt;BR /&gt;libfabric:24286:core:core:ofi_layering_ok():1001&amp;lt;info&amp;gt; Need core provider, skipping ofi_rxm&lt;BR /&gt;libfabric:24286:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;sockets layering&lt;BR /&gt;libfabric:24286:core:core:ofi_layering_ok():1007&amp;lt;info&amp;gt; Skipping util;shm layering&lt;BR /&gt;[0] MPI startup(): max_ch4_vnis: 1, max_reg_eps 64, enable_sep 0, enable_shared_ctxs 0, do_av_insert 0&lt;BR /&gt;[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)&lt;BR /&gt;libfabric:24286:core:core:fi_getinfo_():1138&amp;lt;info&amp;gt; Found provider with the highest priority psm2, must_use_util_prov = 0&lt;BR /&gt;libfabric:24286:core:core:fi_getinfo_():1201&amp;lt;info&amp;gt; Start regular provider search because provider with the highest priority psm2 can not be initialized&lt;BR /&gt;[0] MPI startup(): libfabric provider: psm3&lt;BR /&gt;[0] MPI startup(): detected psm3 provider, set device name to "psm3"&lt;BR /&gt;libfabric:24286:core:core:fi_fabric_():1423&amp;lt;info&amp;gt; Opened fabric: RoCE-192.168.17.0/24&lt;BR /&gt;libfabric:24286:core:core:ofi_shm_map():171&amp;lt;warn&amp;gt; shm_open failed&lt;BR /&gt;libfabric:24286:core:core:ofi_ns_add_local_name():370&amp;lt;warn&amp;gt; Cannot add local name - name server uninitialized&lt;BR /&gt;[0] MPI startup(): addrnamelen: 32&lt;BR /&gt;[0] MPI startup(): File "/opt/intel/oneapi/mpi/2021.7.1/etc/tuning_icx_shm-ofi_psm3_100.dat" not found&lt;BR /&gt;[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.7.1/etc/tuning_icx_shm-ofi_psm3.dat"&lt;BR /&gt;[0] MPI startup(): threading: mode: direct&lt;BR /&gt;[0] MPI startup(): threading: vcis: 1&lt;BR /&gt;[0] MPI startup(): threading: app_threads: -1&lt;BR /&gt;[0] MPI startup(): threading: runtime: generic&lt;BR /&gt;[0] MPI startup(): threading: progress_threads: 0&lt;BR /&gt;[0] MPI startup(): threading: async_progress: 0&lt;BR /&gt;[0] MPI startup(): threading: lock_level: global&lt;BR /&gt;[0] MPI startup(): threading: num_pools: 1&lt;BR /&gt;[0] MPI startup(): threading: enable_sep: 0&lt;BR /&gt;[0] MPI startup(): threading: direct_recv: 1&lt;BR /&gt;[0] MPI startup(): threading: zero_op_flags: 0&lt;BR /&gt;[0] MPI startup(): threading: num_am_buffers: 1&lt;BR /&gt;[0] MPI startup(): tag bits available: 30 (TAG_UB value: 1073741823) &lt;BR /&gt;[0] MPI startup(): source bits available: 30 (Maximal number of rank: 1073741823) &lt;BR /&gt;[0] MPI startup(): Rank Pid Node name Pin cpu&lt;BR /&gt;[0] MPI startup(): 0 24286 svr4 {0,1,2,12,13,14}&lt;BR /&gt;[0] MPI startup(): 1 24287 svr4 {3,4,5,15,16,17}&lt;BR /&gt;[0] MPI startup(): 2 24288 svr4 {6,7,8,18,19,20}&lt;BR /&gt;[0] MPI startup(): 3 24289 svr4 {9,10,11,21,22,23}&lt;BR /&gt;[0] MPI startup(): I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.7.1&lt;BR /&gt;[0] MPI startup(): I_MPI_MPIRUN=mpirun&lt;BR /&gt;[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc&lt;BR /&gt;[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default&lt;BR /&gt;[0] MPI startup(): I_MPI_DEBUG=30&lt;BR /&gt;[0] allocate handle (kind=1, size=744, direct_size=8, indirect_size=1) 
ptr=0x7f2002efe740&lt;BR /&gt;[0] allocate handle (kind=2, size=40, direct_size=8, indirect_size=1) ptr=0x7f100004c440&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;NAS Parallel Benchmarks 3.4 -- BT Benchmark&lt;/P&gt;
&lt;P&gt;No input file inputbt.data. Using compiled defaults&lt;BR /&gt;Size: 102x 102x 102 (class B)&lt;BR /&gt;Iterations: 200 dt: 0.0003000&lt;BR /&gt;Total number of processes: 4&lt;/P&gt;
&lt;P&gt;Time step 1&lt;BR /&gt;Time step 20&lt;BR /&gt;Time step 40&lt;BR /&gt;Time step 60&lt;BR /&gt;Time step 80&lt;BR /&gt;Time step 100&lt;BR /&gt;Time step 120&lt;BR /&gt;Time step 140&lt;BR /&gt;Time step 160&lt;BR /&gt;Time step 180&lt;BR /&gt;Time step 200&lt;BR /&gt;Verification being performed for class B&lt;BR /&gt;accuracy setting for epsilon = 0.1000000000000E-07&lt;BR /&gt;Comparison of RMS-norms of residual&lt;BR /&gt;1 0.1423359722929E+04 0.1423359722929E+04 0.1070287152945E-13&lt;BR /&gt;2 0.9933052259015E+02 0.9933052259015E+02 0.7153317200312E-15&lt;BR /&gt;3 0.3564602564454E+03 0.3564602564454E+03 0.5900255245348E-14&lt;BR /&gt;4 0.3248544795908E+03 0.3248544795908E+03 0.9798945854817E-14&lt;BR /&gt;5 0.3270754125466E+04 0.3270754125466E+04 0.1223502756335E-13&lt;BR /&gt;Comparison of RMS-norms of solution error&lt;BR /&gt;1 0.5296984714094E+02 0.5296984714094E+02 0.9389868800427E-15&lt;BR /&gt;2 0.4463289611567E+01 0.4463289611567E+01 0.1293476388601E-13&lt;BR /&gt;3 0.1312257334221E+02 0.1312257334221E+02 0.1258908460682E-13&lt;BR /&gt;4 0.1200692532356E+02 0.1200692532356E+02 0.6805440394643E-14&lt;BR /&gt;5 0.1245957615104E+03 0.1245957615104E+03 0.1003690013030E-13&lt;BR /&gt;Verification Successful&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;BT Benchmark Completed.&lt;BR /&gt;Class = B&lt;BR /&gt;Size = 102x 102x 102&lt;BR /&gt;Iterations = 200&lt;BR /&gt;Time in seconds = 52.59&lt;BR /&gt;Total processes = 4&lt;BR /&gt;Active processes= 4&lt;BR /&gt;Mop/s total = 13350.83&lt;BR /&gt;Mop/s/process = 3337.71&lt;BR /&gt;Operation type = floating point&lt;BR /&gt;Verification = SUCCESSFUL&lt;BR /&gt;Version = 3.4.2&lt;BR /&gt;Compile date = 22 Sep 2022&lt;/P&gt;
&lt;P&gt;Compile options:&lt;BR /&gt;MPIFC = /opt/intel/oneapi/mpi/latest/bin/mpif90&lt;BR /&gt;FLINK = $(MPIFC)&lt;BR /&gt;FMPI_LIB = -L/opt/intel/oneapi/mpi/latest/lib -lmpi&lt;BR /&gt;FMPI_INC = -I/opt/intel/oneapi/mpi/latest/include&lt;BR /&gt;FFLAGS = -O3&lt;BR /&gt;FLINKFLAGS = $(FFLAGS)&lt;BR /&gt;RAND = (none)&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Please send feedbacks and/or the results of this run to:&lt;/P&gt;
&lt;P&gt;NPB Development Team &lt;BR /&gt;Internet: &lt;A href="mailto:npb@nas.nasa.gov" target="_blank"&gt;npb@nas.nasa.gov&lt;/A&gt;&lt;/P&gt;
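&lt;P&gt;(An aside, a hedged diagnostic sketch rather than something run in the log above: the startup lines show libfabric rejecting psm2 and settling on the psm3 provider over "RoCE-192.168.17.0/24" even though all four ranks are on one node. Pinning the run to the shared-memory path is one way to check whether provider selection contributes to the slowdown.)&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;# force the shared-memory fabric for a single-node run (assumed experiment, not taken from the log)
I_MPI_FABRICS=shm I_MPI_DEBUG=30 mpirun -n 4 ./bin/bt.B.x
# or pin the libfabric provider directly
FI_PROVIDER=shm I_MPI_DEBUG=30 mpirun -n 4 ./bin/bt.B.x&lt;/LI-CODE&gt;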
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards, K. Kunita&lt;/P&gt;</description>
      <pubDate>Wed, 21 Dec 2022 07:53:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1440401#M10172</guid>
      <dc:creator>Kuni</dc:creator>
      <dc:date>2022-12-21T07:53:55Z</dc:date>
    </item>
    <item>
      <title>Re: 3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1440405#M10173</link>
      <description>&lt;P&gt;Hi Santosh,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you for your quick response.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It's strange. I tried to post the console log here to answer your question, but it cannot be shown after I click "Post Reply". Is there some length limitation?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Anyway, I attached the text log file, which shows the answer to your question. Please take a look.&lt;/P&gt;</description>
      <pubDate>Wed, 21 Dec 2022 08:01:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1440405#M10173</guid>
      <dc:creator>Kuni</dc:creator>
      <dc:date>2022-12-21T08:01:23Z</dc:date>
    </item>
    <item>
      <title>Re: 3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1441584#M10189</link>
      <description>&lt;P&gt;Hi,&amp;nbsp; Santosh,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Oh, now I can see the replies I made earlier that were not visible. As a result, three (almost) identical replies are shown. Please ignore them and see the attached file for the answer to your question.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards, K. Kunita&lt;/P&gt;</description>
      <pubDate>Mon, 26 Dec 2022 00:11:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1441584#M10189</guid>
      <dc:creator>Kuni</dc:creator>
      <dc:date>2022-12-26T00:11:38Z</dc:date>
    </item>
    <item>
      <title>Re:3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1442202#M10201</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for providing all the requested details.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;We are working on your issue &amp;amp; will get back to you soon.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks &amp;amp; regards,&lt;/P&gt;&lt;P&gt;Santosh&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 28 Dec 2022 09:22:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1442202#M10201</guid>
      <dc:creator>SantoshY_Intel</dc:creator>
      <dc:date>2022-12-28T09:22:38Z</dc:date>
    </item>
    <item>
      <title>Re: Re:3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1447294#M10263</link>
      <description>&lt;P&gt;Do you have any update? Could you tell me if you can reproduce the symtom?&amp;nbsp; If you want to get the additional information from me, please tell me.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards, K. Kunita&lt;/P&gt;</description>
      <pubDate>Tue, 17 Jan 2023 04:27:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1447294#M10263</guid>
      <dc:creator>Kuni</dc:creator>
      <dc:date>2023-01-17T04:27:48Z</dc:date>
    </item>
    <item>
      <title>Re: 3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1447396#M10267</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Sorry for the delay.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could you please let us know if you could run your application without MPI with a single process using the command below?&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;./bin/bt.B.x &lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thanks &amp;amp; Regards,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Santosh&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 17 Jan 2023 09:48:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1447396#M10267</guid>
      <dc:creator>SantoshY_Intel</dc:creator>
      <dc:date>2023-01-17T09:48:47Z</dc:date>
    </item>
    <item>
      <title>Re: 3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1448150#M10280</link>
      <description>&lt;P&gt;Yes, I can run it without mpi.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards, K. Kunita&lt;/P&gt;</description>
      <pubDate>Thu, 19 Jan 2023 02:30:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1448150#M10280</guid>
      <dc:creator>Kuni</dc:creator>
      <dc:date>2023-01-19T02:30:03Z</dc:date>
    </item>
    <item>
      <title>Re: 3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1448614#M10283</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for the confirmation.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We couldn't reproduce your issue as we don't have access to the exact infrastructure.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We suggest you access the &lt;A href="https://devcloud.intel.com/oneapi/home/" target="_blank"&gt;Intel Devcloud&lt;/A&gt; &amp;amp; do experiments there. Please get back to us if you still face the issue.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks &amp;amp; Regards,&lt;/P&gt;
&lt;P&gt;Santosh&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Jan 2023 06:52:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1448614#M10283</guid>
      <dc:creator>SantoshY_Intel</dc:creator>
      <dc:date>2023-01-20T06:52:08Z</dc:date>
    </item>
    <item>
      <title>Re: 3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1449015#M10287</link>
      <description>&lt;P&gt;Is Intel Devcloud vitual machine environment? If so, it is meaningless to try it.&amp;nbsp; As I showed, The symptom does not happen on vitual machine environment. Only happen with no-virtual machine environment. Could you tell me, how do you tried to reproduce the case (environment information, processor, OS, memory size, NIC (and driver), version of Intel MPI, NPB version, etc..), if you tried same things as me and you could not see the issue, it may be a solution for me or may give something to help to find the cause of the issue.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards, K. Kunita&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 22 Jan 2023 09:49:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1449015#M10287</guid>
      <dc:creator>Kuni</dc:creator>
      <dc:date>2023-01-22T09:49:44Z</dc:date>
    </item>
    <item>
      <title>Re:3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1449165#M10288</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&amp;gt;&amp;gt;&amp;gt;"Is Intel Devcloud virtual machine environment?"&lt;/P&gt;&lt;P&gt;No, you can try experimenting on Intel Devcloud &amp;amp; get back to us if you face the same issue.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks &amp;amp; Regards,&lt;/P&gt;&lt;P&gt;Santosh&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 23 Jan 2023 07:05:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1449165#M10288</guid>
      <dc:creator>SantoshY_Intel</dc:creator>
      <dc:date>2023-01-23T07:05:30Z</dc:date>
    </item>
    <item>
      <title>Re: Re:3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1449547#M10292</link>
      <description>&lt;P&gt;I tired to login to Intel devcloud. I found that the CPU is skylake. The problem is not occured with skylake based Xeon. I only showed with 3rd Gen. Xeon (Ice lake). Then I think that it is meaning-less to try dev clould with 2nd Gen. Xeon. Did you tried to reproduce my problem with 3rd Gen. Xeon scalable processor? I showed the issue only with Intel Xeon Silver 4310 Processor and Xeon Silver 4309Y Processor. I could not see the issue with Intel Xeon silver 4214R.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards, K. Kunita&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jan 2023 08:11:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1449547#M10292</guid>
      <dc:creator>Kuni</dc:creator>
      <dc:date>2023-01-24T08:11:37Z</dc:date>
    </item>
    <item>
      <title>Re: Re:3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1455555#M10375</link>
      <description>&lt;P&gt;Santosh,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Do you have any comment on my reply?&amp;nbsp; Devcloud may not be helpful for reproducing the issue, because Devcloud uses 2nd Gen Xeon Scalable processors and the issue only happens with 3rd Gen Xeon.&amp;nbsp; If you could not reproduce the issue with a 3rd Gen Xeon, could you tell me the details of your environment? That may help us find a way to solve our problem.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards, K. Kunita&lt;/P&gt;</description>
      <pubDate>Tue, 14 Feb 2023 06:01:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1455555#M10375</guid>
      <dc:creator>Kuni</dc:creator>
      <dc:date>2023-02-14T06:01:23Z</dc:date>
    </item>
    <item>
      <title>Re: 3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1455661#M10377</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for your patience.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I tried on Intel Devcloud, where we could access both Intel Xeon Scalable processors &amp;amp; 3rd Gen Intel Xeon Scalable processors.&lt;/P&gt;
&lt;P&gt;Command to see the list of available nodes having 3rd Gen Intel Xeon scalable processors: &lt;BR /&gt;$&amp;nbsp;&lt;FONT color="#808080"&gt;pbsnodes | grep gold6348 -B 4&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;Command to launch a node with 3rd Gen Intel Xeon scalable processor:&lt;/P&gt;
&lt;P&gt;$&lt;FONT color="#808080"&gt;qsub -I -l nodes=s002-n001:ppn=2 -d .&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I can see that you are using mpif90 &amp;amp; mpicc compilers while building the application. I tried using Intel &lt;STRONG&gt;mpiifort &amp;amp; mpiicc&lt;/STRONG&gt; compilers and followed the steps mentioned by you.&lt;/P&gt;
&lt;P&gt;In the case of using mpiifort &amp;amp; mpiicc, I changed the config/make.def file as shown below:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;MPIFC = /glob/development-tools/versions/oneapi/2023.0.1/oneapi/mpi/2021.8.0/bin/mpiifort
FMPI_LIB = -L /glob/development-tools/versions/oneapi/2023.0.1/oneapi/mpi/2021.8.0/lib -lmpifort 
FMPI_INC = -I /glob/development-tools/versions/oneapi/2023.0.1/oneapi/mpi/2021.8.0/include
MPICC = /glob/development-tools/versions/oneapi/2023.0.1/oneapi/mpi/2021.8.0/mpiicc
CMPI_LIB = -L /glob/development-tools/versions/oneapi/2023.0.1/oneapi/mpi/2021.8.0/lib -lmpi 
CMPI_INC = -I /glob/development-tools/versions/oneapi/2023.0.1/oneapi/mpi/2021.8.0/include&lt;/LI-CODE&gt;
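&lt;P&gt;(A hedged aside: with Intel MPI, mpif90 wraps the configured GNU Fortran compiler while mpiifort wraps ifort; the -show option of the wrappers prints the underlying compile line, which makes it easy to confirm which compiler each build actually used.)&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;# print the underlying compiler command without compiling (Intel MPI wrapper option)
mpif90 -show
mpiifort -show&lt;/LI-CODE&gt;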
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Given below are my observations:&lt;/P&gt;
&lt;TABLE width="742"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD width="201"&gt;&lt;STRONG&gt;Product Collection&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD colspan="2" width="264"&gt;Intel Xeon Scalable Processor&lt;/TD&gt;
&lt;TD colspan="2" width="277"&gt;3rd Gen Intel Xeon Scalable Processor&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&lt;STRONG&gt;Model Name&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD colspan="2"&gt;Intel(R) Xeon(R) Gold 6128 CPU&lt;/TD&gt;
&lt;TD colspan="2"&gt;Intel(R) Xeon(R) Platinum 8358 CPU&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&lt;STRONG&gt;Compilers Used&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD&gt;Using mpif90/mpicc&lt;/TD&gt;
&lt;TD&gt;Using mpiifort/mpiicc&lt;/TD&gt;
&lt;TD&gt;Using mpif90/mpicc&lt;/TD&gt;
&lt;TD&gt;Using mpiifort/mpiicc&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&lt;STRONG&gt;Test 1 (mpirun -n 4 ./bin/bt.B.x)&amp;nbsp;&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD&gt;51.51 sec&lt;/TD&gt;
&lt;TD&gt;50.78 sec&lt;/TD&gt;
&lt;TD&gt;49.33 sec&lt;/TD&gt;
&lt;TD&gt;45.23 sec&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&lt;STRONG&gt;Test 2 (mpirun -n 4 ./bin/bt.C.x)&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD&gt;214.58 sec&lt;/TD&gt;
&lt;TD&gt;214.11 sec&lt;/TD&gt;
&lt;TD&gt;289.47 sec&lt;/TD&gt;
&lt;TD&gt;233.82 sec&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks &amp;amp; Regards,&lt;/P&gt;
&lt;P&gt;Santosh&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 15 Feb 2023 04:42:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1455661#M10377</guid>
      <dc:creator>SantoshY_Intel</dc:creator>
      <dc:date>2023-02-15T04:42:24Z</dc:date>
    </item>
    <item>
      <title>Re: 3rd gen Xeon showed slower performance with intel MPI library</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1455992#M10381</link>
      <description>&lt;P&gt;Satosh,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for your reply.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I tried to compile/build with mpiifort. However, I cannot build the NPB executable due to the following errors.&lt;/P&gt;
&lt;P&gt;CentOS 7.9 case:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; ifort: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found.&lt;/P&gt;
&lt;P&gt;AlmaLinux 8.6 case:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; many undefined references (I show some of the error lines and the command that produces them below):&lt;/P&gt;
&lt;P&gt;&amp;nbsp;/opt/intel/oneapi/mpi/latest/bin/mpiifort -O3 -o ../bin/bt.C.x bt.o bt_data.o make_set.o initialize.o exact_solution.o exact_rhs.o set_constants.o adi.o define.o copy_faces.o rhs.o solve_subs.o x_solve.o y_solve.o z_solve.o add.o error.o verify.o setup_mpi.o mpinpb.o ../common/get_active_nprocs.o ../common/print_results.o ../common/timers.o btio.o -L/opt/intel/oneapi/mpi/latest/lib -lmpifort&lt;BR /&gt;ld: ../common/get_active_nprocs.o: in function `get_active_nprocs_':&lt;BR /&gt;get_active_nprocs.f90:(.text+0x286): undefined reference to `_gfortran_get_environment_variable_i4'&lt;BR /&gt;ld: get_active_nprocs.f90:(.text+0x2b9): undefined reference to `_gfortran_compare_string'&lt;BR /&gt;ld: get_active_nprocs.f90:(.text+0x2e2): undefined reference to `_gfortran_compare_string'&lt;BR /&gt;ld: get_active_nprocs.f90:(.text+0x2ff): undefined reference to `_gfortran_compare_string'&lt;/P&gt;
&lt;P&gt;In this case I used the following settings in config/make.def:&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;MPIFC = /opt/intel/oneapi/mpi/latest/bin/mpiifort &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;FMPI_LIB = -L /opt/intel/oneapi/mpi/latest/lib -lmpifort &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;FMPI_INC = -I /opt/intel/oneapi/mpi/latest/include &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;MPICC = /opt/intel/oneapi/mpi/latest/mpicc &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;CMPI_LIB = -L /opt/intel/oneapi/mpi/latest/lib -lmpi &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;CMPI_INC = -I&amp;nbsp; /opt/intel/oneapi/oneapi/mpi/latest/include&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I also tried changing "latest" to "2021.8.0", and the same errors were shown.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could you tell me how to resolve these errors?&lt;/P&gt;
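&lt;P&gt;(A hedged guess rather than a confirmed fix: the _gfortran_* symbols are gfortran runtime references, which suggests object files from the earlier mpif90/gfortran build are still being linked by mpiifort; a clean rebuild that keeps a single toolchain end to end might avoid this.)&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;# remove objects produced by the earlier mpif90/gfortran build before rebuilding with mpiifort
cd NPB3.4-MPI
make clean
make bt CLASS=C&lt;/LI-CODE&gt;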
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards, K. Kunita&lt;/P&gt;</description>
      <pubDate>Wed, 15 Feb 2023 02:34:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/3rd-gen-Xeon-showed-slower-performance-with-intel-MPI-library/m-p/1455992#M10381</guid>
      <dc:creator>Kuni</dc:creator>
      <dc:date>2023-02-15T02:34:16Z</dc:date>
    </item>
  </channel>
</rss>

