Hello, when I train a model on the COCO dataset with more than 10,000 images, the estimated training time is over 100 hours. How can I check whether the training is running on the CPU or the GPU, and how can I see the GPU utilization? Also, I set "#PBS -l walltime=24:00:00" in my run.sh file, but I still can't change the walltime. What should I do? I look forward to your reply. Thank you!
- Tags:
- General Support
Hi,
Thank you for reaching out to us.
DevCloud does not have GPU nodes, but it does have iGPU nodes. To request one, add the following directive to the job script you are submitting:
#PBS -l nodes=1:gpu
We are sorry to inform you that 24 hours is the maximum walltime available on DevCloud.
However, you can try optimizing on the CPU itself to get better performance.
Please follow the URLs below for more details on optimizing TensorFlow workloads on CPU.
- To submit a job on DevCloud:
qsub <job_script>.job
- Once the job is submitted, you can track it with the following command:
qstat
- To read the output and error streams of a running job, use the qpeek command as follows:
1. qpeek -o <job_id>
2. qpeek -e <job_id>
Please note that an output file and an error file are created once execution completes.
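Putting the steps above together, a minimal sketch of the workflow could look like the following. The file name train.job and the script train.py are hypothetical placeholders, and the walltime value is the one from the question. The #PBS directives are placed at the top of the job script, ahead of the first executable command, since the scheduler only reads them from there. The qsub call is skipped when the command is not available, so the snippet can also be tried outside DevCloud.

```shell
#!/bin/bash
# Sketch only: train.job and train.py are hypothetical names, not from this thread.

# Write the job script. The #PBS directives come first, before any command.
cat > train.job <<'EOF'
#!/bin/bash
#PBS -l nodes=1:gpu
#PBS -l walltime=24:00:00

# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"
python train.py
EOF

# Submit only where qsub exists (for example on a DevCloud login node).
if command -v qsub >/dev/null 2>&1; then
    job_id=$(qsub train.job)
    echo "Submitted job: $job_id"
    qstat                     # track the job
    # qpeek -o <job_id>       # read the output stream while the job runs
    # qpeek -e <job_id>       # read the error stream while the job runs
else
    echo "qsub not found; train.job was written but not submitted"
fi
```

Requesting more than 24:00:00 in the walltime directive would not help, since that is the stated maximum.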
Hope this clarifies your query. Please feel free to reach out to us if you have any further queries. Thank You.
Okay, thank you. I have another question: when I use #PBS -l nodes=1:gpu to run on the iGPU node, why does the speed feel the same as running directly in the Jupyter notebook without specifying CPU or GPU? Also, can I speed up training by changing the number of nodes? And can I learn more about the speed of the iGPU? Thank you for your reply!
Hi,
#PBS -l nodes=1:gpu requests an iGPU node, so your job will run on an iGPU node.
You can try to optimize your code and speed it up by tuning the OMP/KMP parameters:
export OMP_NUM_THREADS="6"
export KMP_BLOCKTIME="0"
export KMP_SETTINGS="1"
export KMP_AFFINITY="granularity=fine,verbose,compact,1,0"
Refer:
Increasing the number of nodes may or may not increase the speed, depending on your application. You could try running in a distributed way and see if that helps.
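As a sketch of how the OMP/KMP settings could be applied in practice, the snippet below exports the values from this reply as defaults and prints the effective settings so they show up in the job's output. The override pattern (keeping a value that is already set in the environment) is an assumption about how one might experiment with different values, not a DevCloud requirement, and train.py is a hypothetical script name.

```shell
#!/bin/bash
# Defaults are the values suggested in this reply; each one can be overridden
# by setting the variable before this snippet runs.
export OMP_NUM_THREADS="${OMP_NUM_THREADS:-6}"
export KMP_BLOCKTIME="${KMP_BLOCKTIME:-0}"
export KMP_SETTINGS="${KMP_SETTINGS:-1}"
export KMP_AFFINITY="${KMP_AFFINITY:-granularity=fine,verbose,compact,1,0}"

# Log the effective settings so they appear in the job's output stream.
env | grep -E '^(OMP_NUM_THREADS|KMP_BLOCKTIME|KMP_SETTINGS|KMP_AFFINITY)='

# The training command would follow here, e.g. (hypothetical script name):
# python train.py
```

Lines like these would go in the job script before the training command, so that the settings are in place when the training process starts.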
Hi,
Could you please confirm whether the solution provided helped you?
After the optimization, training is a little faster, but the training time-out problem still hasn't been solved. I will improve the code further. Thank you. The topic can be closed.
Hi,
Thanks for the confirmation. We are closing the case.
Please raise a new thread if you have further issues.