I am training a model on the 2014 NYC-Taxi dataset to compare against a previous experiment run on an NVIDIA server. However, my runs exit with a "Killed" message, which appears to be caused by memory use. I therefore looked for a way to distribute the training across multiple nodes with Modin, but I can't find any examples of this on the DevCloud.
- Is there a way to confirm and reduce the memory use of my application, so that I don't have to go multinode?
- Are there any examples of multinode Python on the DevCloud?
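As a rough illustration of the first question, peak allocation can be measured with Python's standard `tracemalloc` module and compared between a load-everything approach and a streaming one. This is only a sketch under stated assumptions, not the code from this thread: the synthetic CSV, the `fare_amount` column, and all helper names are hypothetical stand-ins for the real NYC-Taxi pipeline.

```python
import csv
import io
import tracemalloc


def make_fake_csv(n_rows):
    """Build a small synthetic CSV in memory (hypothetical stand-in for the taxi data)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["trip_id", "fare_amount"])
    for i in range(n_rows):
        writer.writerow([i, f"{(i % 50) + 2.5:.2f}"])
    return buf.getvalue()


def mean_fare_load_all(text):
    """Materialise every row as a dict first, then aggregate."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return sum(float(r["fare_amount"]) for r in rows) / len(rows)


def mean_fare_streaming(text):
    """Aggregate row by row so only one parsed row is alive at a time."""
    total, count = 0.0, 0
    for row in csv.DictReader(io.StringIO(text)):
        total += float(row["fare_amount"])
        count += 1
    return total / count


def peak_bytes(func, *args):
    """Return (result, peak traced allocation in bytes) for a single call."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


data = make_fake_csv(20_000)
mean_all, peak_all = peak_bytes(mean_fare_load_all, data)
mean_stream, peak_stream = peak_bytes(mean_fare_streaming, data)
print(f"load-all peak:  {peak_all / 1e6:.2f} MB")
print(f"streaming peak: {peak_stream / 1e6:.2f} MB")
```

Note that `tracemalloc` only sees allocations made through Python's allocator, so memory held by native libraries may not show up; it is a first check rather than a full accounting.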
Hi,
Thanks for posting in the Intel forums.
Could you share a sample reproducer? Also, please let us know the following:
1) Are you running this training from the login node?
2) Are you using Intel Modin or any of Intel's optimized frameworks?
For your information, there are currently a few limitations on requesting multiple nodes in DevCloud. Under these limitations, a user can run only two jobs/nodes at a time.
For examples of multinode Python, please see the GitHub samples below.
Regards,
Janani Chandran
Hi,
Is your issue resolved? Do you have any update?
Regards,
Janani Chandran
Apologies for the delay:
- I ran from an interactive node.
- Within the last few days I've found some information about Modin, and I'm now working to integrate either that or Numba into my code.
Are you saying a user can only run 2 jobs on one node? Or that one job can request/use at most 2 nodes?
Also, what is the "reproducer"?
Hi,
Regarding your question about the job/node limits: we currently allow each user up to two nodes at a time, used either as one job running on two nodes simultaneously or as two jobs each running on one node.
As for the "reproducer": in order to reproduce your issue, we need a sample script (a reproducer) containing the steps you tried. This will help us identify the exact problem.
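For illustration only, a reproducer can be as small as a single script that prints the environment, runs the failing step, and reports peak memory. The sketch below assumes a Unix node (the standard `resource` module is not available on Windows); `train_step` is a hypothetical placeholder to be replaced with the smallest piece of code that still gets killed.

```python
import platform
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of this process in MB (Unix only).

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def train_step(n_rows):
    """Hypothetical placeholder for the step that fails in the real run."""
    data = [float(i) for i in range(n_rows)]
    return sum(data) / len(data)


def main():
    # Environment details help others rerun the script under the same conditions.
    print("python  :", platform.python_version())
    print("platform:", platform.platform())
    result = train_step(100_000)
    print("result  :", result)
    print(f"peak RSS: {peak_rss_mb():.1f} MB")
    return result


if __name__ == "__main__":
    main()
```

Posting a script like this together with the exact command used to launch it makes it possible to tell whether the process is being killed for exceeding the node's memory.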
Regards,
Janani Chandran
Hi,
Is your issue resolved? If not, could you share the sample reproducer?
Regards,
Janani Chandran
Hi,
I assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.
Regards,
Janani Chandran