Beginner
101 Views

How can I check whether the GPU is used in training, and what its utilization is?

Hello, I am training a model on the COCO dataset. With more than 10,000 images, the estimated training time is more than 100 hours. How can I check whether the training is running on the CPU or the GPU, and how can I see the GPU utilization? Also, I set "#PBS -l walltime=24:00:00" in the run.sh file, but I still can't change the walltime. What should I do? I look forward to your reply. Thank you!

6 Replies

Hi,

Thank you for reaching out to us.

DevCloud does not have GPU nodes, but it does have iGPU nodes. To request one, use the following directive in the job script you submit:

#PBS -l nodes=1:gpu

We are sorry to inform you that 24 hours is the maximum walltime possible on DevCloud.
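A minimal sketch of a job script that combines the node request above with the 24-hour walltime limit. The training command and its file name are hypothetical placeholders, and the fallback to the current directory is only there so the script also runs outside PBS:

```shell
#!/bin/bash
# PBS directives are read by qsub and must appear before the first command.
#PBS -l nodes=1:gpu
#PBS -l walltime=24:00:00

# PBS sets PBS_O_WORKDIR to the directory qsub was invoked from.
# Fall back to the current directory when run outside PBS (assumption for local testing).
WORKDIR="${PBS_O_WORKDIR:-$PWD}"
cd "$WORKDIR" || exit 1

# Record which node the job landed on.
echo "Job running on node: $(uname -n)"

# Hypothetical training entry point; replace with your own command.
# python train.py
```

Requesting a walltime above the 24-hour maximum is not expected to help, so a longer training run would need to be split into jobs that each fit within that limit.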

However, you can try optimizing on the CPU itself to get improved performance.

Please see the URLs below for more details on optimizing TensorFlow workloads on CPU.

https://software.intel.com/en-us/articles/maximize-tensorflow-performance-on-cpu-considerations-and-...

https://software.intel.com/en-us/articles/tips-to-improve-performance-for-popular-deep-learning-fram...

  • To submit a job on DevCloud:

        qsub <job_script>.job

  • Once the job is submitted, you can track it with:

        qstat

  • To read the output and error streams of a running job, use the qpeek command:

        qpeek -o <job_id>

        qpeek -e <job_id>

Please note that the output and error files are created once execution has completed.
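The submit, track, and peek steps above could be wrapped in a small helper, sketched here under some assumptions: the function name `submit_and_peek` is hypothetical, and it assumes `qsub` prints the new job's ID on standard output, as PBS-style schedulers normally do.

```shell
#!/bin/bash
# Sketch: submit a job script, list the queue, then peek at the job's streams.
# submit_and_peek is a hypothetical helper name, not a DevCloud command.
submit_and_peek() {
    local job_script="$1"
    local job_id

    # Bail out early when the scheduler tools are not available on this machine.
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found; run this on a DevCloud login node" >&2
        return 1
    fi

    # Assumption: qsub prints the job ID on stdout when submission succeeds.
    job_id="$(qsub "$job_script")" || return 1
    echo "Submitted job: $job_id"

    # Show the queue, then the job's stdout and stderr so far.
    # qpeek only returns data once the job has started executing.
    qstat
    qpeek -o "$job_id"
    qpeek -e "$job_id"
}
```

It would be called as `submit_and_peek <job_script>.job`. If the job is still queued, the two `qpeek` calls may show nothing, so rerunning them by hand later is a reasonable follow-up.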

Hope this clarifies your query. Please feel free to reach out to us if you have any further queries. Thank You.

Beginner

Okay, thank you. I have another question: when I use #PBS -l nodes=1:gpu to run on the iGPU node, why does the speed feel the same as running directly in the Jupyter notebook without setting CPU/GPU? Also, can I speed up training by changing the number of nodes? Can I learn more about the speed of the iGPU? Thank you for your reply!


Hi,

#PBS -l nodes=1:gpu requests an iGPU node, so your code will run on an iGPU node.

You can try to optimize your code and improve performance by tuning the OMP/KMP parameters:

export OMP_NUM_THREADS="6"

export KMP_BLOCKTIME="0"

export KMP_SETTINGS="1"

export KMP_AFFINITY="granularity=fine,verbose,compact,1,0"

Refer:

https://software.intel.com/en-us/articles/tips-to-improve-performance-for-popular-deep-learning-fram...
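As a sketch, the OMP/KMP settings above would go in the job script before the training command is launched, so that the training process inherits them. The values are the ones from this reply; the training command is a hypothetical placeholder, and the CPU count is printed only as a reference point for choosing a thread count on a given node.

```shell
#!/bin/bash
# OpenMP / Intel OpenMP runtime settings, using the values suggested in this reply.
export OMP_NUM_THREADS="6"
export KMP_BLOCKTIME="0"
export KMP_SETTINGS="1"   # asks the runtime to print its settings at startup
export KMP_AFFINITY="granularity=fine,verbose,compact,1,0"

# Print the logical CPU count for reference when tuning OMP_NUM_THREADS.
if command -v nproc >/dev/null 2>&1; then
    echo "Logical CPUs on this node: $(nproc)"
fi

# Confirm the variables are exported to child processes.
env | grep -E '^(OMP|KMP)_' | sort

# Hypothetical training entry point; replace with your own command.
# python train.py
```

Variables exported in a notebook cell or login shell do not carry over to a submitted job, which is why they are set inside the script itself.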

Increasing the number of nodes may or may not increase the speed, depending on your application. You could try running in a distributed way and see if that works.

Moderator

Hi,

Could you please confirm whether the solution provided helped you?

Beginner

After the improvements, training is a little faster, but the timeout problem still hasn't been solved. I will improve the code further. Thank you. The topic can be closed.

Moderator

Hi,

Thanks for the confirmation. We are closing the case.

Please raise a new thread if you have further issues.
