I've been training a TensorFlow classification model on DevCloud using: qsub -l nodes=1:gpu:ppn=2 -l walltime=24:00:00
The model is fitted with the arguments workers=8 and use_multiprocessing=True, along with early stopping.
I've been following the training process, and once early stopping is reached, the node starts to hang.
I ran the script on a different computational server and it seems to run with no problem.
Any idea what might cause this and how to fix it?
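For reference, the training call described above might look roughly like the sketch below. Only workers=8, use_multiprocessing=True and the use of early stopping come from the post. The function names, the monitored metric, the patience value and the epoch count are assumptions for illustration. It also assumes a TensorFlow 2.x release with Keras 2, where Model.fit accepts these two arguments (newer Keras releases dropped them from fit).

```python
# Sketch of the setup described in the post, not the poster's actual script.
# Names, patience, monitor and epochs are illustrative assumptions.


def fit_options():
    """Return the two Model.fit keyword arguments named in the post."""
    return {"workers": 8, "use_multiprocessing": True}


def train(model, train_data, val_data, epochs=100, patience=5):
    """Fit a compiled Keras model with early stopping.

    TensorFlow is imported inside the function so this file can be loaded
    on a machine without it. `workers` and `use_multiprocessing` only take
    effect when `train_data` is a generator or a keras.utils.Sequence.
    """
    import tensorflow as tf

    # Stop when the validation loss has not improved for `patience` epochs.
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",
        patience=patience,
        restore_best_weights=True,
    )
    return model.fit(
        train_data,
        validation_data=val_data,
        epochs=epochs,
        callbacks=[early_stop],
        **fit_options(),
    )
```

With use_multiprocessing=True, Keras feeds batches from separate worker processes, so this call is the part of the script that behaves differently from a single-process run.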
Hi,
Thanks for posting in Intel forums.
We tried your command and were not able to enter the compute node. Please try the command below, which adds -I to request an interactive session, and let us know the result.
qsub -I -l nodes=1:gpu:ppn=2 -l walltime=24:00:00
Regards,
Janani Chandran
Hi,
Is the issue resolved? Do you have any update?
Regards,
Janani Chandran
Hi,
We are assuming that the solution provided helped, and we will no longer be monitoring this issue. Please raise a new thread if you have any further issues.
Regards,
Janani Chandran
