Intel® DevCloud
Help for those needing help starting or connecting to the Intel® DevCloud

Multinode Python for XGBoost or other libraries in oneDAL

TKJ
Beginner

I am training a model on the 2014 NYC-Taxi dataset to compare against a previous experiment run on an NVIDIA server. However, my runs are terminated with a "Killed" message, which seems to be due to memory use. Because of this, I looked for a way to distribute the training across multiple nodes with Modin, but I can't find any examples of this on the DevCloud.

 

  1. Is there a way to confirm or optimize the memory use of my application, so that I don't have to go multinode?
  2. Are there any examples of multinode Python on the DevCloud?
6 Replies
JananiC_Intel
Moderator

Hi,

 

Thanks for posting in Intel forums.

 

Could you share a sample reproducer? Also, please let us know the following:

1) Are you running this training from the login node?

2) Are you using Intel Modin or any of Intel's optimized frameworks?

 

For your information, there are currently a few limitations on requesting multiple nodes in DevCloud. Under these limitations, a user can run at most two jobs/nodes in DevCloud.

 

For examples of multinode Python, please see the GitHub samples below.

https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Features-and-Functionality...

https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Features-and-Functionality...

https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Features-and-Functionality...

 

Regards,

Janani Chandran

 

JananiC_Intel
Moderator

Hi,


Is your issue resolved? Do you have any update?


Regards,

Janani Chandran



TKJ
Beginner

Apologies for the delay:

 

  1. I ran from an interactive node.
  2. Within the last few days, I've seen some information about Modin, and I'm now working to integrate either it or Numba into my code.

Are you saying a user can only run 2 jobs on one node? Or that one job can request/use at most 2 nodes?

 

Also, what is a "reproducer"?

JananiC_Intel
Moderator

Hi,

 

Regarding your question about the jobs/nodes limitation: currently each user can run either one job on two nodes simultaneously or two jobs that each run on one node.
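
To check from inside a Python script how many nodes a job actually received, something like the sketch below may help. It assumes the scheduler exposes the allocation through a PBS_NODEFILE environment variable, as PBS/Torque-style schedulers do; whether that holds for your DevCloud job is an assumption to verify. The function name allocated_nodes is hypothetical.

```python
import os


def allocated_nodes(nodefile_var="PBS_NODEFILE"):
    """Return the unique hostnames listed in the scheduler's node file.

    Assumes the environment variable points at a text file with one
    hostname per line. Returns an empty list if it is missing.
    """
    path = os.environ.get(nodefile_var)
    if not path or not os.path.exists(path):
        return []
    with open(path) as fh:
        hosts = [line.strip() for line in fh if line.strip()]
    # A host may be listed once per allocated core; keep first occurrences only.
    return list(dict.fromkeys(hosts))


if __name__ == "__main__":
    nodes = allocated_nodes()
    print(f"{len(nodes)} node(s) allocated: {nodes}")
```

If this reports a single node, the job is not multinode regardless of which library is used inside it.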

 

Regarding your question about the "reproducer": to reproduce your issue, we need a sample script, called a reproducer, that contains the steps you tried. This will help us identify the exact issue.
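
A reproducer can be very small. As a rough sketch only, assuming a Linux node and using just the Python standard library, a script along these lines also reports peak memory after each step, which may help confirm whether memory is what gets the run killed. The helper names peak_rss_mib and run_step are hypothetical, and the workload is a placeholder to be replaced with the real data loading and training calls.

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident set size in MiB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_step(name, func):
    """Run one step of the workload, then print the peak memory so far."""
    result = func()
    print(f"{name}: peak RSS so far {peak_rss_mib():.1f} MiB")
    return result


if __name__ == "__main__":
    # Placeholder workload: substitute the real loading and training code here.
    rows = run_step("load", lambda: [float(i) for i in range(1_000_000)])
    total = run_step("train", lambda: sum(rows))
```

The step whose reported peak approaches the memory available on the node is the likely place to reduce memory use.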

 

Regards,

Janani Chandran

 

JananiC_Intel
Moderator

Hi,


Is your issue resolved? If not, could you share the sample reproducer?


Regards,

Janani Chandran


JananiC_Intel
Moderator

Hi,


I assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.


Regards,

Janani Chandran

