<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1239765#M7493</link>
    <description>&lt;P&gt;Hello Prasanth,&lt;/P&gt;
&lt;P&gt;Now I can load the MPI release_mt mode and run the command you explained.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;$ cat run_async.sh

#!/usr/bin/bash
. /opt/intel/inteloneapi/mpi/2021.1.1/env/vars.sh --i_mpi_library_kind=release_mt
echo $LD_LIBRARY_PATH
#echo "+ The nodefile for this job is stored at ${PBS_NODEFILE}"
uniq ${PBS_NODEFILE} node_list.txt
mapfile -t nodes &amp;lt; node_list.txt
np=$(wc -l &amp;lt; ${PBS_NODEFILE})
echo "+ Number of cores assigned: ${np}"
echo "+ node list:" ${nodes[0]} ${nodes[1]}
I_MPI_ASYNC_PROGRESS=1 I_MPI_DEBUG=10 mpiexec.hydra -n 2 -host ${nodes[0]} -env I_MPI_ASYNC_PROGRESS_PIN=5,6 ./a.out ./mtx/hcircuit.mtx : -n 2 -host ${nodes[1]} -env I_MPI_ASYNC_PROGRESS_PIN=1,2 ./a.out ./mtx/hcircuit.mtx&lt;/LI-CODE&gt;
&lt;P&gt;&lt;BR /&gt;The result of this command is as follows.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;[0] MPI startup(): Intel(R) MPI Library, Version 2021.1  Build 20201112 (id: b9c9d2fc5)
[0] MPI startup(): Copyright (C) 2003-2020 Intel Corporation.  All rights reserved.
[0] MPI startup(): library kind: release_mt
[0] MPI startup(): libfabric version: 1.11.0-impi
[0] MPI startup(): libfabric provider: tcp;ofi_rxm
[1] MPI startup(): global_rank 1, local_rank 1, local_size 2, threads_per_node 2
[3] MPI startup(): global_rank 3, local_rank 1, local_size 2, threads_per_node 2
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       18729    s001-n056  {0,1,2,3,4,5,12,13,14,15,16,17}
[0] MPI startup(): 1       18730    s001-n056  {6,7,8,9,10,11,18,19,20,21,22,23}
[0] MPI startup(): 2       21267    s001-n023  {0,1,2,3,4,5,12,13,14,15,16,17}
[0] MPI startup(): 3       21268    s001-n023  {6,7,8,9,10,11,18,19,20,21,22,23}
[0] MPI startup(): I_MPI_ROOT=/glob/development-tools/versions/oneapi/gold/inteloneapi/mpi/2021.1.1
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_ASYNC_PROGRESS=1
[0] MPI startup(): I_MPI_ASYNC_PROGRESS_PIN=5,6
[0] MPI startup(): I_MPI_DEBUG=10
[0] MPI startup(): threading: mode: handoff
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: is_threaded: 1
[0] MPI startup(): threading: async_progress: 1
[0] MPI startup(): threading: num_pools: 64
[0] MPI startup(): threading: lock_level: nolock
[0] MPI startup(): threading: enable_sep: 0
[0] MPI startup(): threading: direct_recv: 0
[0] MPI startup(): threading: zero_op_flags: 0
[0] MPI startup(): threading: num_am_buffers: 8
[0] MPI startup(): threading: library is built with per-vci thread granularity
[0] MPI startup(): global_rank 0, local_rank 0, local_size 2, threads_per_node 2
[0] MPI startup(): threading: thread: 0, processor: 5
[0] MPI startup(): threading: thread: 1, processor: 6
[2] MPI startup(): global_rank 2, local_rank 0, local_size 2, threads_per_node 2
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Two additional threads (&lt;STRONG&gt;5 and 6&lt;/STRONG&gt;) specified in the first part of the command ran, but two additional threads (&lt;STRONG&gt;1 and 2&lt;/STRONG&gt;) specified in the last part of the command did not.&lt;/P&gt;
&lt;P&gt;Please let me know what was wrong with the command.&lt;/P&gt;
&lt;P&gt;Thank you.&lt;/P&gt;
&lt;P&gt;Viet.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 22 Dec 2020 11:39:41 GMT</pubDate>
    <dc:creator>Viet</dc:creator>
    <dc:date>2020-12-22T11:39:41Z</dc:date>
    <item>
      <title>On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1233853#M7392</link>
      <description>&lt;P&gt;Dear Devcloud administrator and supporter,&lt;/P&gt;
&lt;P&gt;I would like to test the function&amp;nbsp;&lt;EM&gt;&lt;SPAN&gt;MPI_Iallreduce&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;(non-blocking communication) as described on the page below:&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://techdecoded.intel.io/resources/hiding-communication-latency-using-mpi-3-non-blocking-collectives/" target="_blank"&gt;https://techdecoded.intel.io/resources/hiding-communication-latency-using-mpi-3-non-blocking-collectives/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;I am going to run the code on a university supercomputer which is based on Intel CPUs.&lt;/P&gt;
&lt;P&gt;The code will run through a job scheduling system where I can't specify the CPU list because the job scheduling system sets it automatically.&lt;/P&gt;
&lt;P&gt;The following setting is explained in the above link.&lt;/P&gt;
&lt;PRE&gt;export I_MPI_ASYNC_PROGRESS_PIN=&amp;lt;CPU list&amp;gt;&lt;/PRE&gt;
&lt;P&gt;Is the above setting necessary?&lt;/P&gt;
&lt;P&gt;Is there a problem if this setting is not used?&lt;/P&gt;
&lt;P&gt;Thank you very much for any help you can provide.&lt;/P&gt;
&lt;P&gt;Viet.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 03 Dec 2020 10:49:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1233853#M7392</guid>
      <dc:creator>Viet</dc:creator>
      <dc:date>2020-12-03T10:49:01Z</dc:date>
    </item>
    <item>
      <title>Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1233859#M7393</link>
      <description>&lt;P&gt;Hi &lt;SPAN style="font-family: Arial, sans-serif; font-size: 10pt;"&gt;Viet,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: Arial, sans-serif; font-size: 10pt;"&gt;As this question is about running an MPI application rather than about using Devcloud, we are moving it to the HPC Toolkit forum.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: Arial, sans-serif; font-size: 10pt;"&gt;Thanks&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: Arial, sans-serif; font-size: 10pt;"&gt;Arun&lt;/SPAN&gt;&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 03 Dec 2020 11:47:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1233859#M7393</guid>
      <dc:creator>ArunJ_Intel</dc:creator>
      <dc:date>2020-12-03T11:47:58Z</dc:date>
    </item>
    <item>
      <title>Re: Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1234128#M7394</link>
      <description>&lt;P&gt;Hi Arun,&lt;/P&gt;
&lt;P&gt;Thank you!&lt;/P&gt;
&lt;P&gt;Viet&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 04 Dec 2020 04:54:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1234128#M7394</guid>
      <dc:creator>Viet</dc:creator>
      <dc:date>2020-12-04T04:54:49Z</dc:date>
    </item>
    <item>
      <title>Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1234192#M7395</link>
      <description>&lt;P&gt;Hi Viet,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;As mentioned in the article, "&lt;I&gt;there is an overhead associated with non-blocking communication from making it asynchronous. Although asynchronous progress improves communication-computation overlap, it requires an additional thread per MPI rank. This thread consumes CPU cycles and, ideally, must be pinned to an exclusive core.&lt;/I&gt;"&lt;/P&gt;&lt;P&gt;Each additional progress thread uses a CPU core, and you can pin it to a specific core with I_MPI_ASYNC_PROGRESS_PIN=&amp;lt;CPU list&amp;gt;, just as you pin MPI processes to specific cores with I_MPI_PIN_PROCESSOR_LIST.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;For example, if I do not set the I_MPI_ASYNC_PROGRESS_PIN variable, the async threads still occupy cores, but MPI chooses those cores itself.&lt;/P&gt;&lt;P&gt;E.g.: I_MPI_ASYNC_PROGRESS=1 I_MPI_DEBUG=10 mpirun -n 4 -host epb602 ./org&amp;nbsp;&lt;/P&gt;&lt;P&gt;[0] MPI startup(): Intel(R) MPI Library, Version 2019 Update 9&amp;nbsp;Build 20200923 (id: abd58e492)&lt;/P&gt;&lt;P&gt;[0] MPI startup(): Copyright (C) 2003-2020 Intel Corporation.&amp;nbsp;All rights reserved.&lt;/P&gt;&lt;P&gt;[0] MPI startup(): library kind: release_mt&lt;/P&gt;&lt;P&gt;[0] MPI startup(): libfabric version: 1.10.1-impi&lt;/P&gt;&lt;P&gt;......&lt;/P&gt;&lt;P&gt;[0] MPI startup(): I_MPI_ASYNC_PROGRESS=1&lt;/P&gt;&lt;P&gt;[0] MPI startup(): I_MPI_DEBUG=10&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: mode: handoff&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: vcis: 1&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: progress_threads: 0&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: is_threaded: 1&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: async_progress: 1&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: num_pools: 64&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: lock_level: nolock&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: enable_sep: 0&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: direct_recv: 0&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: zero_op_flags: 1&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: num_am_buffers: 1&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: library is built with per-vci thread granularity&lt;/P&gt;&lt;P&gt;[0] MPI startup(): global_rank 0, local_rank 0, local_size 4, threads_per_node 4&lt;/P&gt;&lt;P&gt;&lt;B&gt;[0] MPI startup(): threading: thread: 0, processor: 95&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;[0] MPI startup(): threading: thread: 1, processor: 94&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;[0] MPI startup(): threading: thread: 2, processor: 93&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;[0] MPI startup(): threading: thread: 3, processor: 92&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;My node has 96 cores, and MPI selected the last 4 cores (92-95) for the async threads because I launched 4 processes.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;I can select the cores to use for the async threads with I_MPI_ASYNC_PROGRESS_PIN=&amp;lt;CPU list&amp;gt;.&lt;/P&gt;&lt;P&gt;E.g.: I_MPI_ASYNC_PROGRESS_PIN=81,82,83,84&amp;nbsp;I_MPI_ASYNC_PROGRESS=1 I_MPI_DEBUG=10 mpirun -n 4 -host epb602 ./org&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;[0] MPI startup(): Intel(R) MPI Library, Version 2019 Update 9&amp;nbsp;Build 20200923 (id: abd58e492)&lt;/P&gt;&lt;P&gt;[0] MPI startup(): Copyright (C) 2003-2020 Intel Corporation.&amp;nbsp;All rights reserved.&lt;/P&gt;&lt;P&gt;[0] MPI startup(): library kind: release_mt&lt;/P&gt;&lt;P&gt;[0] MPI startup(): libfabric version: 1.10.1-impi&lt;/P&gt;&lt;P&gt;....&lt;/P&gt;&lt;P&gt;[0] MPI startup(): I_MPI_ASYNC_PROGRESS=1&lt;/P&gt;&lt;P&gt;[0] MPI startup(): I_MPI_ASYNC_PROGRESS_PIN=81,82,83,84&lt;/P&gt;&lt;P&gt;[0] MPI startup(): I_MPI_DEBUG=10&lt;/P&gt;&lt;P&gt;[3] MPI startup(): global_rank 3, local_rank 3, local_size 4, threads_per_node 4&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: mode: handoff&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: vcis: 1&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: progress_threads: 0&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: is_threaded: 1&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: async_progress: 1&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: num_pools: 64&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: lock_level: nolock&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: enable_sep: 0&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: direct_recv: 0&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: zero_op_flags: 1&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: num_am_buffers: 1&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: library is built with per-vci thread granularity&lt;/P&gt;&lt;P&gt;[0] MPI startup(): global_rank 0, local_rank 0, local_size 4, threads_per_node 4&lt;/P&gt;&lt;P&gt;&lt;B&gt;[0] MPI startup(): threading: thread: 0, processor: 81&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;[0] MPI startup(): threading: thread: 1, processor: 82&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;[0] MPI startup(): threading: thread: 2, processor: 83&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;[0] MPI startup(): threading: thread: 3, processor: 84&lt;/B&gt;&lt;/P&gt;&lt;P&gt; &lt;/P&gt;
&lt;P&gt;Here you can see that cores 81-84 were used.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Hope this helps. Let us know if you need any further assistance.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Prasanth&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 04 Dec 2020 10:19:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1234192#M7395</guid>
      <dc:creator>PrasanthD_intel</dc:creator>
      <dc:date>2020-12-04T10:19:51Z</dc:date>
    </item>
    <item>
      <title>Re: Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1234207#M7396</link>
      <description>&lt;P&gt;Dear Prasanth,&lt;/P&gt;
&lt;P&gt;Thank you very much for your useful information.&lt;BR /&gt;I now understand the important meaning of the I_MPI_ASYNC_PROGRESS_PIN variable.&lt;/P&gt;
&lt;P&gt;As mentioned by the following explanation in the article, I now want to specify one or two additional threads per node.&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000080"&gt;"Exclusive thread pinning for each rank results in half of the cores being assigned just to accelerate the progress of non-blocking MPI calls. Therefore, through careful experimentation, we must select a certain number of cores per node to be assigned for asynchronous progress without causing a considerable compute penalty."&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;In your example, you ran your program on only one node.&lt;/P&gt;
&lt;P&gt;If I want to run on multiple nodes and specify one or two additional threads per node, what is the correct syntax to define the I_MPI_ASYNC_PROGRESS_PIN variable?&lt;/P&gt;
&lt;P&gt;Thank you!&lt;/P&gt;
&lt;P&gt;Viet.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 04 Dec 2020 12:03:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1234207#M7396</guid>
      <dc:creator>Viet</dc:creator>
      <dc:date>2020-12-04T12:03:12Z</dc:date>
    </item>
    <item>
      <title>Re: Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1235169#M7409</link>
      <description>&lt;P&gt;Dear Prasanth,&lt;/P&gt;
&lt;P&gt;Do you know how to set the MPI release_mt mode in the oneAPI HPC Toolkit in order to use MPI non-blocking communication?&lt;/P&gt;
&lt;P&gt;The method explained in the following link does not match the MPI in the oneAPI HPC Toolkit.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://scc.ustc.edu.cn/zlsc/tc4600/intel/2016.0.109/mpi/User_Guide/Intel_MPI_Library_Configurations.htm" target="_blank"&gt;https://scc.ustc.edu.cn/zlsc/tc4600/intel/2016.0.109/mpi/User_Guide/Intel_MPI_Library_Configurations.htm&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Thank you for any help you can provide.&lt;BR /&gt;Viet.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 08 Dec 2020 11:52:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1235169#M7409</guid>
      <dc:creator>Viet</dc:creator>
      <dc:date>2020-12-08T11:52:11Z</dc:date>
    </item>
    <item>
      <title>Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1235626#M7417</link>
      <description>&lt;P&gt;Hi Viet,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Sorry for the delay in responding.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Q) If I want to run on multiple nodes and specify one or two additional threads per node, what is the correct syntax to define the I_MPI_ASYNC_PROGRESS_PIN variable?&lt;/P&gt;&lt;P&gt;A) The syntax is the same as for a single node, but the processes are divided across the nodes, and so are the async threads.&lt;/P&gt;&lt;P&gt;For example, if you launch 10 processes across 2 nodes, you need to provide only 5 cores to I_MPI_ASYNC_PROGRESS_PIN, because only 5 processes run on each node.&lt;/P&gt;&lt;P&gt;Q) Do you know how to set the MPI release_mt mode in the oneAPI HPC for using MPI Non-blocking communication?&lt;/P&gt;&lt;P&gt;A) It is as you mentioned: you have to pass the library configuration (release_mt) as an argument to the mpivars.sh script.&lt;/P&gt;&lt;P&gt;For more information, please refer to:&amp;nbsp;&lt;A href="https://software.intel.com/content/www/us/en/develop/documentation/mpi-developer-guide-linux/top/running-applications/selecting-a-library-configuration.html" rel="noopener noreferrer" target="_blank"&gt;Selecting a Library Configuration (intel.com)&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Let us know if you face any issues.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Prasanth&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 09 Dec 2020 13:28:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1235626#M7417</guid>
      <dc:creator>PrasanthD_intel</dc:creator>
      <dc:date>2020-12-09T13:28:35Z</dc:date>
    </item>
    <item>
      <title>Re: Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1235852#M7419</link>
      <description>&lt;P&gt;Hi Prasanth,&lt;/P&gt;
&lt;P&gt;I was able to set the MPI release_mt mode, as explained under the link you sent.&lt;/P&gt;
&lt;P&gt;Thank you very much.&lt;/P&gt;
&lt;P&gt;I still don't know how to set the I_MPI_ASYNC_PROGRESS_PIN variable.&lt;/P&gt;
&lt;P&gt;Suppose I want to run a non-blocking program on two nodes with node names: node_id1, node_id2. Suppose I want to tie additional threads to cores 1, 2 on node_id1 and cores 3,4 on node_id2.&lt;/P&gt;
&lt;P&gt;I think it should be something like this:&lt;/P&gt;
&lt;P&gt;export I_MPI_ASYNC_PROGRESS_PIN = node_id1:1, node_id1:2, node_id2:3, node_id2:4&lt;/P&gt;
&lt;P&gt;Please let me know the correct setting for this variable.&lt;/P&gt;
&lt;P&gt;Thank you!&lt;BR /&gt;Viet&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 10 Dec 2020 03:31:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1235852#M7419</guid>
      <dc:creator>Viet</dc:creator>
      <dc:date>2020-12-10T03:31:07Z</dc:date>
    </item>
    <item>
      <title>Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1237862#M7437</link>
      <description>&lt;P&gt;Hi Viet,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;To answer your question:&lt;/P&gt;&lt;P&gt;Q) Suppose I want to run a non-blocking program on two nodes with node names: node_id1, node_id2. Suppose I want to tie additional threads to cores 1, 2 on node_id1 and cores 3,4 on node_id2.&lt;/P&gt;&lt;P&gt;A) You can use argument sets for this. The command is&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;I&gt;&amp;nbsp;mpiexec.hydra -n 2 -host node_id1 -env I_MPI_ASYNC_PROGRESS_PIN=1,2 ./&amp;lt;exec&amp;gt; : -n 2 -host node_id2&amp;nbsp;-env I_MPI_ASYNC_PROGRESS_PIN=3,4 ./&amp;lt;exec&amp;gt;&lt;/I&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Hope this helps. Let me know if you have any other queries.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Prasanth&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 16 Dec 2020 11:00:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1237862#M7437</guid>
      <dc:creator>PrasanthD_intel</dc:creator>
      <dc:date>2020-12-16T11:00:07Z</dc:date>
    </item>
    <item>
      <title>Re: Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1238268#M7452</link>
      <description>&lt;P&gt;Hi Prasanth,&lt;/P&gt;
&lt;P&gt;Thank you very much.&lt;BR /&gt;I'll confirm this command on Devcloud and let you know the result soon.&lt;/P&gt;
&lt;P&gt;mpiexec.hydra &lt;STRONG&gt;-n 2&lt;/STRONG&gt; -host node_id1 -env I_MPI_ASYNC_PROGRESS_PIN=1,2 ./&amp;lt;exec&amp;gt; : &lt;STRONG&gt;-n 2&lt;/STRONG&gt; -host node_id2&amp;nbsp;-env I_MPI_ASYNC_PROGRESS_PIN=3,4 ./&amp;lt;exec&amp;gt;&lt;/P&gt;
&lt;P&gt;Is it true that this command uses &lt;STRONG&gt;eight threads&lt;/STRONG&gt; where &lt;STRONG&gt;four pinned threads&lt;/STRONG&gt; are used for non-blocking communication?&lt;/P&gt;
&lt;P&gt;Sincerely,&lt;/P&gt;
&lt;P&gt;Viet.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 17 Dec 2020 10:14:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1238268#M7452</guid>
      <dc:creator>Viet</dc:creator>
      <dc:date>2020-12-17T10:14:05Z</dc:date>
    </item>
    <item>
      <title>Re: Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1238660#M7458</link>
      <description>&lt;P&gt;Hi Prasanth,&lt;/P&gt;
&lt;P&gt;Do you know how to run your MPI command on the Intel Devcloud system?&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;mpiexec.hydra&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;-n 2&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;-host node_id1 -env I_MPI_ASYNC_PROGRESS_PIN=1,2 ./&amp;lt;exec&amp;gt; :&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;-n 2&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;-host node_id2&amp;nbsp;-env I_MPI_ASYNC_PROGRESS_PIN=3,4 ./&amp;lt;exec&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The PBS queuing system in the Devcloud automatically assigns the nodes to the MPI job when the job is run.&lt;BR /&gt;I don't know how to get specified node names when running a qsub command.&lt;/P&gt;
&lt;P&gt;Sincerely,&lt;/P&gt;
&lt;P&gt;Viet.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 18 Dec 2020 09:12:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1238660#M7458</guid>
      <dc:creator>Viet</dc:creator>
      <dc:date>2020-12-18T09:12:12Z</dc:date>
    </item>
    <item>
      <title>Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1238685#M7459</link>
      <description>&lt;P&gt;Hi Viet,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;The command launches 4 processes/ranks and 4 async threads which need to be run on separate cores.&lt;/P&gt;&lt;P&gt;Let us know if the command works for you.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Prasanth&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 18 Dec 2020 09:57:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1238685#M7459</guid>
      <dc:creator>PrasanthD_intel</dc:creator>
      <dc:date>2020-12-18T09:57:57Z</dc:date>
    </item>
    <item>
      <title>Re: Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1238703#M7460</link>
      <description>&lt;P&gt;Hello Prasanth,&lt;BR /&gt;Thank you for your response.&lt;BR /&gt;I will confirm this with the VTune Profiler.&lt;BR /&gt;I am still having problems executing your command on Devcloud through the PBS system.&lt;BR /&gt;Best regards,&lt;BR /&gt;Viet.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 18 Dec 2020 11:48:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1238703#M7460</guid>
      <dc:creator>Viet</dc:creator>
      <dc:date>2020-12-18T11:48:40Z</dc:date>
    </item>
    <item>
      <title>Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1238708#M7462</link>
      <description>&lt;P&gt;Hi Viet,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Could you please let us know the errors you were facing while running in Devcloud?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Prasanth&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 18 Dec 2020 12:04:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1238708#M7462</guid>
      <dc:creator>PrasanthD_intel</dc:creator>
      <dc:date>2020-12-18T12:04:28Z</dc:date>
    </item>
    <item>
      <title>Re: Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1239278#M7470</link>
      <description>&lt;P&gt;Hi Prasanth,&lt;/P&gt;
&lt;P&gt;Thank you for your response.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;$ qsub -l nodes=2:ppn=2 -d . run_async.sh 
&lt;/LI-CODE&gt;
&lt;P&gt;Below is the error message when I ran the above command&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;$ cat run_async.sh.e767999 
[mpiexec@s001-n020] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on s001-n144 (pid 7432, exit code 65280)
[mpiexec@s001-n020] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
[mpiexec@s001-n020] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
[mpiexec@s001-n020] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:772): error waiting for event
[mpiexec@s001-n020] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:1955): error setting up the boostrap proxies&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The file run_async.sh contains an MPI command that follows your specified syntax.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;$ cat run_async.sh
#!/usr/bin/bash
mpiexec.hydra -n 2 -host s001-n144 -env I_MPI_ASYNC_PROGRESS_PIN=1,2 ./a.out ./mtx/hcircuit.mtx : -n 2 -host s001-n143 -env I_MPI_ASYNC_PROGRESS_PIN=3,4 ./a.out ./mtx/hcircuit.mtx&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you for anything you can provide.&lt;/P&gt;
&lt;P&gt;Sincerely,&lt;/P&gt;
&lt;P&gt;Viet.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 21 Dec 2020 07:26:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1239278#M7470</guid>
      <dc:creator>Viet</dc:creator>
      <dc:date>2020-12-21T07:26:37Z</dc:date>
    </item>
    <item>
      <title>Re: Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1239719#M7490</link>
      <description>&lt;P&gt;Hi Prasanth,&lt;/P&gt;
&lt;P&gt;I added the setting to go into the release_mt mode, but still no success.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;$ cat run_async.sh
#!/usr/bin/bash
source /opt/intel/inteloneapi/setvars.sh release_mt --force 
echo $LD_LIBRARY_PATH
mpiexec.hydra -n 2 -host s001-n144 -env I_MPI_ASYNC_PROGRESS_PIN=1,2 ./a.out ./mtx/hcircuit.mtx : -n 2 -host s001-n143 -env I_MPI_ASYNC_PROGRESS_PIN=3,4 ./a.out ./mtx/hcircuit.mtx
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;$ cat run_async.sh.e769054 
[mpiexec@s001-n008] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on s001-n144 (pid 22676, exit code 65280)
[mpiexec@s001-n008] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
[mpiexec@s001-n008] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
[mpiexec@s001-n008] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:772): error waiting for event
[mpiexec@s001-n008] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:1955): error setting up the boostrap proxies
&lt;/LI-CODE&gt;
&lt;P&gt;Thank you for anything you can provide.&lt;/P&gt;
&lt;P&gt;Best, Viet.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 22 Dec 2020 08:03:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1239719#M7490</guid>
      <dc:creator>Viet</dc:creator>
      <dc:date>2020-12-22T08:03:19Z</dc:date>
    </item>
    <item>
      <title>Re: Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1239726#M7491</link>
      <description>&lt;P&gt;Hi Prasanth,&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;. /opt/intel/inteloneapi/mpi/2021.1.1/env/vars.sh release_mt&lt;/LI-CODE&gt;
&lt;P&gt;I also ran the above command, but it seems that the release_mt mode is not loaded in the Intel MPI Library version 2021.&lt;/P&gt;
&lt;P&gt;Best, Viet.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 22 Dec 2020 08:35:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1239726#M7491</guid>
      <dc:creator>Viet</dc:creator>
      <dc:date>2020-12-22T08:35:47Z</dc:date>
    </item>
    <item>
      <title>Re: Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1239765#M7493</link>
      <description>&lt;P&gt;Hello Prasanth,&lt;/P&gt;
&lt;P&gt;Now I can load the MPI release_mt mode and run the command you explained.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;$ cat run_async.sh

#!/usr/bin/bash
. /opt/intel/inteloneapi/mpi/2021.1.1/env/vars.sh --i_mpi_library_kind=release_mt
echo $LD_LIBRARY_PATH
#echo "+ The nodefile for this job is stored at ${PBS_NODEFILE}"
uniq ${PBS_NODEFILE} node_list.txt
mapfile -t nodes &amp;lt; node_list.txt
np=$(wc -l &amp;lt; ${PBS_NODEFILE})
echo "+ Number of cores assigned: ${np}"
echo "+ node list:" ${nodes[0]} ${nodes[1]}
I_MPI_ASYNC_PROGRESS=1 I_MPI_DEBUG=10 mpiexec.hydra -n 2 -host ${nodes[0]} -env I_MPI_ASYNC_PROGRESS_PIN=5,6 ./a.out ./mtx/hcircuit.mtx : -n 2 -host ${nodes[1]} -env I_MPI_ASYNC_PROGRESS_PIN=1,2 ./a.out ./mtx/hcircuit.mtx&lt;/LI-CODE&gt;
&lt;P&gt;&lt;BR /&gt;The result of this command is as follows.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;[0] MPI startup(): Intel(R) MPI Library, Version 2021.1  Build 20201112 (id: b9c9d2fc5)
[0] MPI startup(): Copyright (C) 2003-2020 Intel Corporation.  All rights reserved.
[0] MPI startup(): library kind: release_mt
[0] MPI startup(): libfabric version: 1.11.0-impi
[0] MPI startup(): libfabric provider: tcp;ofi_rxm
[1] MPI startup(): global_rank 1, local_rank 1, local_size 2, threads_per_node 2
[3] MPI startup(): global_rank 3, local_rank 1, local_size 2, threads_per_node 2
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       18729    s001-n056  {0,1,2,3,4,5,12,13,14,15,16,17}
[0] MPI startup(): 1       18730    s001-n056  {6,7,8,9,10,11,18,19,20,21,22,23}
[0] MPI startup(): 2       21267    s001-n023  {0,1,2,3,4,5,12,13,14,15,16,17}
[0] MPI startup(): 3       21268    s001-n023  {6,7,8,9,10,11,18,19,20,21,22,23}
[0] MPI startup(): I_MPI_ROOT=/glob/development-tools/versions/oneapi/gold/inteloneapi/mpi/2021.1.1
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_ASYNC_PROGRESS=1
[0] MPI startup(): I_MPI_ASYNC_PROGRESS_PIN=5,6
[0] MPI startup(): I_MPI_DEBUG=10
[0] MPI startup(): threading: mode: handoff
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: is_threaded: 1
[0] MPI startup(): threading: async_progress: 1
[0] MPI startup(): threading: num_pools: 64
[0] MPI startup(): threading: lock_level: nolock
[0] MPI startup(): threading: enable_sep: 0
[0] MPI startup(): threading: direct_recv: 0
[0] MPI startup(): threading: zero_op_flags: 0
[0] MPI startup(): threading: num_am_buffers: 8
[0] MPI startup(): threading: library is built with per-vci thread granularity
[0] MPI startup(): global_rank 0, local_rank 0, local_size 2, threads_per_node 2
[0] MPI startup(): threading: thread: 0, processor: 5
[0] MPI startup(): threading: thread: 1, processor: 6
[2] MPI startup(): global_rank 2, local_rank 0, local_size 2, threads_per_node 2
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The two additional threads pinned to processors &lt;STRONG&gt;5 and 6&lt;/STRONG&gt;, specified in the first part of the command, appear in the output, but the two additional threads pinned to processors &lt;STRONG&gt;1 and 2&lt;/STRONG&gt;, specified in the last part of the command, do not.&lt;/P&gt;
&lt;P&gt;Please let me know what was wrong with the command.&lt;/P&gt;
&lt;P&gt;Thank you.&lt;/P&gt;
&lt;P&gt;Viet.&lt;/P&gt;
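&lt;P&gt;For checking the pinning on the second host, the reply below suggests changing the order of the MPMD segments. A minimal sketch of that idea follows; build_cmd is a hypothetical helper and nodeA/nodeB are placeholder host names. It only prints the launch line and does not run anything.&lt;/P&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) the two-segment MPMD launch line.
# Arguments: host and pin list for the first segment, then for the second.
build_cmd() {
    local host_a=$1 pin_a=$2 host_b=$3 pin_b=$4
    local app="./a.out ./mtx/hcircuit.mtx"
    printf 'I_MPI_ASYNC_PROGRESS=1 I_MPI_DEBUG=10 mpiexec.hydra'
    printf ' -n 2 -host %s -env I_MPI_ASYNC_PROGRESS_PIN=%s %s' "$host_a" "$pin_a" "$app"
    printf ' : -n 2 -host %s -env I_MPI_ASYNC_PROGRESS_PIN=%s %s\n' "$host_b" "$pin_b" "$app"
}

# Placeholder host names; in the job script these would be ${nodes[0]} and ${nodes[1]}.
original=$(build_cmd nodeA 5,6 nodeB 1,2)
swapped=$(build_cmd nodeB 1,2 nodeA 5,6)   # same segments, opposite order
echo "$original"
echo "$swapped"
```

&lt;P&gt;With the segments swapped, rank 0 would be expected to start on the other host, so its debug lines should then report that host's pin list.&lt;/P&gt;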
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 22 Dec 2020 11:39:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1239765#M7493</guid>
      <dc:creator>Viet</dc:creator>
      <dc:date>2020-12-22T11:39:41Z</dc:date>
    </item>
    <item>
      <title>Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1240177#M7507</link>
      <description>&lt;P&gt;Hi Viet,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Yes, I have also observed that the normal way of setting the library configuration does not work on DevCloud. I will forward this issue to the internal team. Thanks for reporting it.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Coming to your other question:&lt;/P&gt;&lt;P&gt;Q) Why isn't the pinning shown for the other node?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;I_MPI_ASYNC_PROGRESS=1 I_MPI_DEBUG=10 mpiexec.hydra -n 2 -host ${nodes[0]} -env I_MPI_ASYNC_PROGRESS_PIN=5,6 ./a.out ./mtx/hcircuit.mtx : -n 2 -host ${nodes[1]} -env I_MPI_ASYNC_PROGRESS_PIN=1,2 ./a.out ./mtx/hcircuit.mtx&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: thread: 0, processor: &lt;B&gt;5&lt;/B&gt;&lt;/P&gt;&lt;P&gt;[0] MPI startup(): threading: thread: 1, processor: &lt;B&gt;6&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;A) If you look at the debug output, each line starts with [0] in square brackets, which means that the line comes from rank 0.&lt;/P&gt;&lt;P&gt;Generally, rank 0 isn't aware of the pinning that happens on another node, which is why cores 1 and 2, which you pinned on the second node (${nodes[1]}), aren't shown.&lt;/P&gt;&lt;P&gt;It doesn't mean that the pinning is not occurring.&lt;/P&gt;&lt;P&gt;If you want to check, change the order of the two parts of the MPMD command you used.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for reporting the issue.&lt;/P&gt;&lt;P&gt;Let us know if you have any other issues.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Prasanth&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 23 Dec 2020 12:04:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1240177#M7507</guid>
      <dc:creator>PrasanthD_intel</dc:creator>
      <dc:date>2020-12-23T12:04:36Z</dc:date>
    </item>
    <item>
      <title>Re: Re:On an MPI environment setting for using MPI-3* Non-Blocking Collectives</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1240760#M7518</link>
      <description>&lt;P&gt;Hello Prasanth,&lt;BR /&gt;It's good to know that the debug information comes from only one rank.&lt;BR /&gt;I now know how to set up the MPI environment to use the MPI-3* Non-Blocking Collectives functions.&lt;BR /&gt;Explicitly pinning the additional threads requires complicated affinity settings.&lt;BR /&gt;I'm now thinking about using offloaded MPI Non-Blocking Collectives functions on InfiniBand. I will ask about this in a new thread.&lt;BR /&gt;This thread can be closed here.&lt;BR /&gt;Thank you very much for your valuable answers.&lt;BR /&gt;Hope you have a great New Year's holiday!&lt;BR /&gt;With best regards,&lt;BR /&gt;Viet.&lt;/P&gt;
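&lt;P&gt;Since each debug line carries the reporting rank in square brackets, a saved I_MPI_DEBUG log can be filtered to see which ranks actually reported progress-thread pinning. This is a minimal sketch, assuming the line layout shown earlier in this thread; the sample lines are copied from that output, and the variable names are only illustrative.&lt;/P&gt;

```shell
#!/usr/bin/env bash
# Sample lines copied from the I_MPI_DEBUG=10 output earlier in the thread.
log='[0] MPI startup(): I_MPI_ASYNC_PROGRESS_PIN=5,6
[0] MPI startup(): threading: thread: 0, processor: 5
[0] MPI startup(): threading: thread: 1, processor: 6
[2] MPI startup(): global_rank 2, local_rank 0, local_size 2, threads_per_node 2'

# Print "rank thread processor" for every progress-thread pinning line.
pins=$(printf '%s\n' "$log" | awk '/threading: thread:/ {
    rank = $1; gsub(/[^0-9]/, "", rank)   # strip the brackets from the rank field
    thread = $6; sub(/,/, "", thread)     # drop the trailing comma after the thread id
    print rank, thread, $NF               # last field is the processor number
}')
echo "$pins"
```

&lt;P&gt;For the sample above, only rank 0 appears in the filtered result, which matches the explanation that the pinning lines come from a single rank.&lt;/P&gt;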
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 25 Dec 2020 09:26:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/On-an-MPI-environment-setting-for-using-MPI-3-Non-Blocking/m-p/1240760#M7518</guid>
      <dc:creator>Viet</dc:creator>
      <dc:date>2020-12-25T09:26:00Z</dc:date>
    </item>
  </channel>
</rss>

