Intel® DevCloud
Help for those needing help starting or connecting to the Intel® DevCloud

Multinode Python for XGBoost or other libraries in oneDAL

TKJ
Beginner

I am training a model on the 2014 NYC Taxi dataset to compare against a previous experiment run on an NVIDIA server. However, my runs exit with a "Killed" message, which seems to be due to memory use. Because of this, I tried to find a way to distribute the training across multiple nodes with Modin, but I can't find any examples of this on the DevCloud.

 

  1. Is there a way to confirm or optimize the memory use of my application, so that I don't have to go multinode?
  2. Are there any examples of multinode Python on DevCloud?
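
For question 1, one way to confirm memory use is to record the process's peak memory around the expensive step. The following is only a minimal sketch using the Python standard library, not something taken from this thread: `load_stand_in` is a hypothetical placeholder for the real data-loading or training call, and the `resource` module is assumed to be available (it is Unix-only, so this assumes a Linux node). Note that `tracemalloc` only counts allocations made through Python's allocator, so the process-level peak RSS is usually the more relevant number when a run is being killed.

```python
import resource
import sys
import tracemalloc


def peak_rss_mib():
    """Peak resident set size of this process, in MiB.

    ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def measure(func, *args, **kwargs):
    """Run func and return (result, traced Python-heap peak MiB, process peak RSS MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, traced_peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, traced_peak / (1024 * 1024), peak_rss_mib()


def load_stand_in(n_rows):
    # Hypothetical placeholder: swap in the real loading or training step here.
    return [float(i) for i in range(n_rows)]


if __name__ == "__main__":
    rows, heap_mib, rss_mib = measure(load_stand_in, 200_000)
    print(f"rows={len(rows)} heap_peak={heap_mib:.1f} MiB peak_rss={rss_mib:.1f} MiB")
```

Comparing the reported peak RSS against the memory available on the node gives a rough indication of whether the "Killed" exit is memory-related before moving to a multinode setup.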
6 Replies
JananiC_Intel
Moderator

Hi,

 

Thanks for posting in Intel forums.

 

Could you share a sample reproducer? Also, please let us know the following:

1) Are you running this training from the login node?

2) Are you using Intel Modin or any of Intel's optimized frameworks?

 

For your information, DevCloud currently has a few limitations on requesting multiple nodes. As part of these limitations, a user can only run two jobs/nodes in DevCloud.

 

Regarding examples of multinode Python, please see the GitHub samples below.

https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Features-and-Functionality/IntelPython_daal4py_DistributedLinearRegression

https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Features-and-Functionality/IntelTensorFlow_Horovod_Multinode_Training

https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Features-and-Functionality/IntelPython_daal4py_DistributedKMeans

 

Regards,

Janani Chandran

 

JananiC_Intel
Moderator

Hi,


Is your issue resolved? Do you have any update?


Regards,

Janani Chandran



TKJ
Beginner

Apologies for the delay:

 

  1. I ran from an interactive node.
  2. Within the last few days, I've seen some information about Modin, and I'm now working to integrate either Modin or Numba into my code.

Are you saying a user can only run 2 jobs on one node? Or that one job can request/use 2 nodes at max?

 

Also, what is the "reproducer"?

JananiC_Intel
Moderator

Hi,

 

Regarding your question on the job/node limits: each user can currently run either one job on two nodes simultaneously or two jobs each running on one node.

 

"Also, what is the reproducer?" - To reproduce your issue, we need a sample script, called a reproducer, containing the steps you tried. This will help us identify the exact issue.

 

Regards,

Janani Chandran

 

JananiC_Intel
Moderator

Hi,


Is your issue resolved? If not, could you share the sample reproducer?


Regards,

Janani Chandran


JananiC_Intel
Moderator

Hi,


I assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.


Regards,

Janani Chandran

