<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re:Regarding MPI for XGBoost multi node training in Intel® Optimized AI Frameworks</title>
    <link>https://community.intel.com/t5/Intel-Optimized-AI-Frameworks/Regarding-MPI-for-XGBoost-multi-node-training/m-p/1406677#M385</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;If this resolves your issue, please accept it as the solution; this helps others with a similar issue. Have a great day ahead.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Jaideep&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;BR /&gt;</description>
    <pubDate>Mon, 08 Aug 2022 06:29:39 GMT</pubDate>
    <dc:creator>JaideepK_Intel</dc:creator>
    <dc:date>2022-08-08T06:29:39Z</dc:date>
    <item>
      <title>Regarding MPI for XGBoost multi node training</title>
      <link>https://community.intel.com/t5/Intel-Optimized-AI-Frameworks/Regarding-MPI-for-XGBoost-multi-node-training/m-p/1404100#M383</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am training an XGBoost model on 2 nodes, using MPI (mpi4py) to distribute the workload.&lt;/P&gt;
&lt;P&gt;Following the documentation at the link below,&lt;/P&gt;
&lt;P&gt;&lt;A href="https://devcloud.intel.com/oneapi/documentation/advanced-queue/" target="_blank" rel="noopener"&gt;https://devcloud.intel.com/oneapi/documentation/advanced-queue/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;I listed the 2 nodes (mother superior and sister node) in hostfile.txt, taking the node names from the machine file whose path is stored in the environment variable $PBS_NODEFILE.&lt;/P&gt;
&lt;P&gt;I then used the following command to run the code:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;mpirun --hostfile hosts.txt python multi_node.py --N=1&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;(N is a parameter of the script.)&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2022-07-28 at 2.55.41 AM.png" style="width: 999px;"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/32073i29E0E0EC1DCCA1A0/image-size/large/is-moderation-mode/true?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="Screenshot 2022-07-28 at 2.55.41 AM.png" alt="Screenshot 2022-07-28 at 2.55.41 AM.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;(Also, when I run "&lt;STRONG&gt;mpirun -n 2 python script.py&lt;/STRONG&gt;", where script.py is a minimal mpi4py program, it works fine. Should I be running my code some other way?)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, I have created a virtual environment that uses the Intel Modin libraries from oneAPI. How can I make sure the same environment is activated on the other node that the code runs on?&lt;/P&gt;
&lt;P&gt;I am getting the error shown in the attached screenshot, which I cannot understand or resolve. Please let me know what the issue is and how I can fix it. Thank you!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;Manjari&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jul 2022 07:01:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Optimized-AI-Frameworks/Regarding-MPI-for-XGBoost-multi-node-training/m-p/1404100#M383</guid>
      <dc:creator>Misra</dc:creator>
      <dc:date>2022-07-28T07:01:33Z</dc:date>
    </item>
    <item>
      <title>Re: Regarding MPI for XGBoost multi node training</title>
      <link>https://community.intel.com/t5/Intel-Optimized-AI-Frameworks/Regarding-MPI-for-XGBoost-multi-node-training/m-p/1405296#M384</link>
      <description>&lt;P class="lia-align-justify"&gt;&lt;SPAN&gt;Hi,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN&gt;Thank you for posting in Intel Communities.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Before running on multiple nodes, we need to mention how many nodes we want.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;qsub -I -l nodes=&amp;lt;number_of_nodes&amp;gt;:&amp;lt;property&amp;gt;:ppn=2 -d .&lt;/LI-CODE&gt;
&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;&lt;STRONG&gt;example:&lt;/STRONG&gt;(qsub -I -l nodes=2:gpu:ppn=2 -d .)&lt;/P&gt;
&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;After logging into the compute node, we need to get the node numbers which we accessed.&lt;/P&gt;
&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;echo $PBS_NODEFILE (example output looks like this: /var/spool/torque/aux//1965007.v-qsvr-1.aidevcloud)&lt;/LI-CODE&gt;
&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;&lt;SPAN&gt;We need to cat the output of $PBS_NODEFILE&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;example output : cat /var/spool/torque/aux//1965007.v-qsvr-1.aidevcloud
s001-n141
s001-n141
s001-n157
s001-n157&lt;/LI-CODE&gt;
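&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;A minimal Python sketch of the same bookkeeping, assuming the one-hostname-per-line format shown above. The hypothetical helper parse_nodefile counts how many slots each node has; the total (4 here) is the value passed to mpirun -n in the command that follows. read_nodefile, also hypothetical, reads the real file from inside a job.&lt;/P&gt;

```python
import os
from collections import Counter


def parse_nodefile(text):
    """Map each hostname in PBS node-file text to its number of slots.

    Torque writes one line per allocated processor, so a node requested
    with ppn=2 appears twice.
    """
    names = [line.strip() for line in text.splitlines() if line.strip()]
    # Counter keeps hostnames in first-seen order (Python 3.7+).
    return dict(Counter(names))


def read_nodefile():
    """Return the text of the file named by $PBS_NODEFILE.

    The variable is only set inside a PBS/Torque job.
    """
    with open(os.environ["PBS_NODEFILE"]) as f:
        return f.read()


if __name__ == "__main__":
    # Contents copied from the cat output above.
    example = "s001-n141\ns001-n141\ns001-n157\ns001-n157\n"
    slots = parse_nodefile(example)
    print(slots)                             # {'s001-n141': 2, 's001-n157': 2}
    print("mpirun -n", sum(slots.values()))  # mpirun -n 4
```

&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;Inside a job, parse_nodefile(read_nodefile()) returns the same mapping for the nodes actually allocated.&lt;/P&gt;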
&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;&lt;SPAN&gt;Copy the node numbers from above and paste them into the host file (I&amp;nbsp;pasted the&amp;nbsp;above node numbers&amp;nbsp;into&amp;nbsp;host1)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;&lt;SPAN&gt;After pasting the node numbers into the host file, we can run the mpirun command.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;mpirun -n 4 -hostfile host1 python hello.py&lt;/LI-CODE&gt;
&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="JaideepK_Intel_0-1659418566505.png" style="width: 400px;"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/32253i4915902DDE152E17/image-size/medium?v=v2&amp;amp;px=400&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="JaideepK_Intel_0-1659418566505.png" alt="JaideepK_Intel_0-1659418566505.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;If this resolves your issue, make sure to accept this as a solution. This would help others with similar issue. Have a great day a head.&lt;/P&gt;
&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;&lt;SPAN&gt;Regards,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify lia-align-left lia-align-center"&gt;&lt;SPAN&gt;Jaideep&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-left lia-align-justify"&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 02 Aug 2022 05:49:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Optimized-AI-Frameworks/Regarding-MPI-for-XGBoost-multi-node-training/m-p/1405296#M384</guid>
      <dc:creator>JaideepK_Intel</dc:creator>
      <dc:date>2022-08-02T05:49:13Z</dc:date>
    </item>
    <item>
      <title>Re:Regarding MPI for XGBoost multi node training</title>
      <link>https://community.intel.com/t5/Intel-Optimized-AI-Frameworks/Regarding-MPI-for-XGBoost-multi-node-training/m-p/1406677#M385</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;If this resolves your issue, please accept it as the solution; this helps others with a similar issue. Have a great day ahead.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Jaideep&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 08 Aug 2022 06:29:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Optimized-AI-Frameworks/Regarding-MPI-for-XGBoost-multi-node-training/m-p/1406677#M385</guid>
      <dc:creator>JaideepK_Intel</dc:creator>
      <dc:date>2022-08-08T06:29:39Z</dc:date>
    </item>
    <item>
      <title>Re:Regarding MPI for XGBoost multi node training</title>
      <link>https://community.intel.com/t5/Intel-Optimized-AI-Frameworks/Regarding-MPI-for-XGBoost-multi-node-training/m-p/1408433#M388</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;We assume that your issue is resolved. If you need any additional information, please post a new question, as Intel will no longer monitor this thread.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Jaideep&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 16 Aug 2022 04:45:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Optimized-AI-Frameworks/Regarding-MPI-for-XGBoost-multi-node-training/m-p/1408433#M388</guid>
      <dc:creator>JaideepK_Intel</dc:creator>
      <dc:date>2022-08-16T04:45:31Z</dc:date>
    </item>
  </channel>
</rss>

