Hi,
Does Intel's distribution of XGBoost (in the oneAPI AI Analytics Toolkit) support multi-node training? If so, can this experiment be run on DevCloud? Is there any reference documentation available?
Regards,
Manjari
Hi,
Thanks for posting in the Intel forum.
We will check on this and get back to you soon.
For your information, we currently don't have any reference documentation for multi-node training with Intel XGBoost. We will let you know once we have an update on this.
Regards,
Janani Chandran
Hi,
For multi-node training with Intel XGBoost in DevCloud, follow the steps in the article below.
https://medium.com/intel-analytics-software/distributed-xgboost-with-modin-on-ray-fc17edef7720
In DevCloud, multi-node computation is only available through the job queue.
Syntax for multi-node:
qsub -l nodes=<count>:ppn=2
You can combine a request for multiple nodes with a request for their specific features.
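As a rough illustration of the syntax above, here is a minimal Python sketch that assembles the qsub command line. The helper name build_qsub_command, the script name job.sh, and the feature name gpu are assumptions made for this example only; check the DevCloud documentation for the feature names actually available to your account.

```python
import shlex


def build_qsub_command(node_count, ppn=2, features=(), script="job.sh"):
    """Compose a qsub command that requests several nodes.

    `script` and any entries in `features` are placeholders for this
    sketch, not names confirmed by the DevCloud documentation.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # The resource string has the form nodes=<count>[:<feature>...]:ppn=<n>
    parts = [f"nodes={node_count}", *features, f"ppn={ppn}"]
    return ["qsub", "-l", ":".join(parts), script]


if __name__ == "__main__":
    # Plain multi-node request
    print(shlex.join(build_qsub_command(2)))
    # Multi-node request combined with an (assumed) node feature
    print(shlex.join(build_qsub_command(2, features=["gpu"])))
```

The function only builds the argument list; to actually submit the job you would pass that list to subprocess.run on a DevCloud login node.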
Try this and let us know how it goes.
Regards,
Janani Chandran
Hi,
Thank you for your reply. I did try this method before. However, when I looked into the oneAPI and DevCloud documentation, it mentioned that to distribute XGBoost training across multiple nodes, I would have to use MPI communication so that the nodes can work in parallel.
I have no experience in MPI application programming, and I am unable to find documentation or a code sample that explains it.
Could you please tell me more about it? Is this the correct approach, or is the method you mentioned the same thing?
Thank you!
Regards,
Manjari Misra
Hi,
Please find below the documentation for running a basic MPI application in DevCloud.
You can find the sample code under the "Distributed-Memory Architecture" topic.
link: https://devcloud.intel.com/oneapi/documentation/advanced-queue/
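For readers new to MPI, the following is a minimal Python sketch of the basic pattern: every process learns its rank and the total process count, takes its own slice of the work, and then combines a result with the others. It is not the sample from the linked page and it does not train XGBoost. It assumes the mpi4py package is installed (falling back to a single process if it is not), and the helper rows_for_rank and the row count of 1000 are made up for illustration.

```python
try:
    from mpi4py import MPI  # assumption: mpi4py is installed on the nodes
except ImportError:
    MPI = None  # fall back to a single process so the sketch still runs


def rows_for_rank(n_rows, rank, size):
    """Return the half-open row range [start, stop) handled by one rank."""
    base, extra = divmod(n_rows, size)
    # The first `extra` ranks each take one additional row.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


if __name__ == "__main__":
    if MPI is not None:
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
    else:
        comm, rank, size = None, 0, 1

    start, stop = rows_for_rank(1000, rank, size)  # 1000 is a placeholder
    local_count = stop - start

    # Sum the per-rank counts so every rank sees the global total.
    total = comm.allreduce(local_count, op=MPI.SUM) if comm else local_count
    print(f"rank {rank} of {size}: rows [{start}, {stop}), total {total}")
```

Under an MPI launcher such as mpirun, each process runs the same script and prints its own row range; the exact launch command inside a DevCloud job script is described in the documentation linked above.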
Hope this helps.
Thanks