Hi,
I am training an XGBoost model on 2 nodes using MPI (mpi4py) for the distribution of workload.
As per the link provided to me below,
https://devcloud.intel.com/oneapi/documentation/advanced-queue/
I created a list of the 2 nodes (mother superior and sister node) in hostfile.txt, obtained from the machine file (its path is in the environment variable $PBS_NODEFILE).
I then used the following command to run the code:
mpirun --hostfile hosts.txt python multi_node.py --N=1
*(N = parameter in the code)
(Also, when I used "mpirun -n 2 python script.py", script.py being a minimal mpi4py code, it works fine. Should I be using some other way to run my code?)
Also, I have created a virtual environment that uses the Intel Modin toolkit libraries in oneAPI. How can I make sure that the same environment is activated on the other node that the code will run on?
I am getting an error, attached below, that I am not able to understand or resolve. Please let me know what the issue is and how I can resolve it. Thank you!
Regards,
Manjari
Hi,
Thank you for posting in Intel Communities.
Before running on multiple nodes, we need to specify how many nodes we want when requesting the job:
qsub -I -l nodes=<number_of_nodes>:<property>:ppn=2 -d .
Example: qsub -I -l nodes=2:gpu:ppn=2 -d .
After logging into the compute node, we need to get the names of the nodes that were allocated to us.
echo $PBS_NODEFILE
(example output: /var/spool/torque/aux//1965007.v-qsvr-1.aidevcloud)
Then we need to cat the file that $PBS_NODEFILE points to.
Example: cat /var/spool/torque/aux//1965007.v-qsvr-1.aidevcloud
s001-n141
s001-n141
s001-n157
s001-n157
Copy the node names from above and paste them into the host file (I pasted the above node names into a file named host1).
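If you would rather not copy and paste by hand, the step above could be scripted. This is only a minimal sketch, not an official DevCloud tool: the function names are hypothetical, and it assumes $PBS_NODEFILE is set (that is, you are inside a PBS job) and that the host file should contain the same lines as the node file, repeats included, as in the pasted example.

```python
import os


def read_node_names(nodefile_path):
    """Return the node names listed in the PBS node file, in order.

    Repeated names are kept: in the example above each node appears
    once per requested process slot (ppn=2).
    """
    with open(nodefile_path) as f:
        return [line.strip() for line in f if line.strip()]


def write_hostfile(names, hostfile_path):
    """Write the node names to the host file, one per line."""
    with open(hostfile_path, "w") as f:
        for name in names:
            f.write(name + "\n")


if __name__ == "__main__":
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        names = read_node_names(nodefile)
        write_hostfile(names, "host1")  # "host1" matches the file name used in this post
        print(f"Wrote {len(names)} entries for {len(set(names))} nodes to host1")
    else:
        print("PBS_NODEFILE is not set; run this inside a PBS job")
```

The number of entries it reports (4 in the example above) is the value I would pass to -n.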
After pasting the node names into the host file, we can run the mpirun command.
mpirun -n 4 -hostfile host1 python hello.py
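Regarding the virtual environment question, one way to check which Python each process actually uses is a small script like the one below. It is a sketch under assumptions: the file name check_env.py is hypothetical, and the rank is read from PMI_RANK (Intel MPI / MPICH) or OMPI_COMM_WORLD_RANK (Open MPI), which may not be set by every launcher. It only reports what is in use; it does not activate anything.

```python
import os
import socket
import sys


def rank_from_env(env):
    """Best-effort MPI rank lookup from launcher-set variables.

    Assumption: PMI_RANK (Intel MPI / MPICH) or OMPI_COMM_WORLD_RANK
    (Open MPI). Returns None when neither variable is present.
    """
    for var in ("PMI_RANK", "OMPI_COMM_WORLD_RANK"):
        if var in env:
            return int(env[var])
    return None


def describe_process(env):
    """Collect the host name and the Python interpreter this process runs."""
    return {
        "rank": rank_from_env(env),
        "host": socket.gethostname(),
        "python": sys.executable,  # path of the running interpreter
        "prefix": sys.prefix,      # root of the active (virtual) environment
    }


if __name__ == "__main__":
    info = describe_process(os.environ)
    print(f"rank={info['rank']} host={info['host']} "
          f"python={info['python']} prefix={info['prefix']}")
```

Running it the same way as hello.py (mpirun -n 4 -hostfile host1 python check_env.py) should print one line per process. If the python and prefix paths on both nodes point inside your virtual environment, the same environment is being picked up on each node.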
If this resolves your issue, please accept it as a solution. This will help others with similar issues. Have a great day ahead.
Regards,
Jaideep
Hi,
If this resolves your issue, please accept it as a solution. This will help others with similar issues. Have a great day ahead.
Regards,
Jaideep
Hi,
We assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.
Thanks,
Jaideep
