I am using DevCloud for my development work, but I am facing issues with the job time limit. My jobs are getting terminated after a certain amount of time, which is affecting my productivity. Is there any way to increase the job time limit in DevCloud? What are the best practices for optimizing job performance to ensure that my jobs complete within the allotted time? Any help or suggestions would be appreciated. Thank you.
Hi,
Intel DevCloud for oneAPI nodes have a CPU time (cput) limit of 75 hours (270000 seconds). The limit is shown in the job header that DevCloud prints:
########################################################################
# Date: Wed 11 May 2023 03:20:37 AM PDT
# Job ID: ****359.v-qsvr-1.aidevcloud
# User: ******
# Resources: cput=75:00:00,neednodes=1:batch:ppn=2,nodes=1:batch:ppn=2,walltime=06:00:00
########################################################################
This is why you are getting the "PBS: job killed: cput 270056 exceeded limit 270000" error: the job used 270056 seconds of CPU time, which is over the 270000-second limit, so it was removed from the node.
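The arithmetic behind the error message can be checked with a few lines of shell. This is only a sketch: `hms_to_seconds` is a hypothetical helper name chosen for illustration, not a PBS or DevCloud tool, and the numbers are the ones from the header and error message above.

```shell
# Hypothetical helper (not a PBS or DevCloud tool): hh:mm:ss -> seconds.
hms_to_seconds() {
    # awk reads each field as a decimal number, so "08" is treated as 8.
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

cput_limit=$(hms_to_seconds 75:00:00)   # cput value from the job header
cput_used=270056                        # CPU seconds from the error message
echo "limit: ${cput_limit} s, used: ${cput_used} s, over by: $(( cput_used - cput_limit )) s"
```

As far as I understand, cput counts CPU time summed over all processes of the job, so it is separate from walltime and a multithreaded job can use it up faster than the clock advances. For instance, 270000 s of CPU time spread over a 24-hour (86400 s) walltime corresponds to an average of about 3.1 fully busy cores.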
If this resolves your issue, make sure to accept this as a solution. This will help others with a similar issue.
Thank you
Hi,
Thank you for posting in Intel communities.
By default, jobs are terminated automatically at the 6-hour mark. If your job needs more than 6 hours to complete, use the following syntax:
qsub […] -l walltime=hh:mm:ss
You can extend the walltime limit up to a maximum of 24 hours.
To request the full 24 hours, use
#PBS -l walltime=24:00:00
in your job file.
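For illustration, a job file carrying that directive could be created and submitted roughly like this. It is a minimal sketch: `my_job.sh` and `my_app.py` are hypothetical names, and the `qsub` line is commented out because it only works on a DevCloud login node.

```shell
# Write a job file (my_job.sh and my_app.py are hypothetical names).
cat > my_job.sh <<'EOF'
#!/bin/bash
#PBS -l walltime=24:00:00
# PBS starts the job in the home directory; go back to where qsub was run.
cd "$PBS_O_WORKDIR"
python my_app.py
EOF

# Submit it from a DevCloud login node:
# qsub my_job.sh
```

The `#PBS` line has the same effect as passing `-l walltime=24:00:00` on the `qsub` command line. Note that it only raises the walltime limit, not the cput limit.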
Regarding your second question, optimization will depend on your model and use case.
A software AI accelerator can make platforms 10-100X faster across a variety of applications, models, and use cases. Please refer to https://www.intel.com/content/www/us/en/developer/articles/technical/software-ai-accelerators-ai-performance-boost-for-free.html#gs.wdip66
You can try the guide "Maximize TensorFlow* Performance on CPU: Considerations and Recommendations for Inference Workloads". To fully utilize the power of Intel® architecture (IA) for high performance, you can enable TensorFlow* to be powered by Intel’s highly optimized math routines in the Intel® oneAPI Deep Neural Network Library (oneDNN). oneDNN includes convolution, normalization, activation, inner product, and other primitives.
You can also try:
- Non-uniform memory access (NUMA) Controls Affecting Performance
- OpenMP Technical Performance Considerations for Intel® Optimization for TensorFlow
- Enable Mixed Precision
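As a rough sketch of the OpenMP and NUMA items above, the settings are usually applied through environment variables in the job file before the application starts. The values below are assumptions for illustration, not recommendations from this thread, and `my_app.py` is a hypothetical script name.

```shell
# Assumed values for illustration; tune them for your own workload.
export OMP_NUM_THREADS=2      # match the ppn=2 slots requested from PBS
export KMP_BLOCKTIME=1        # ms a thread spins after a parallel region before sleeping
export KMP_AFFINITY=granularity=fine,compact,1,0   # pin threads to cores (Intel OpenMP runtime)

# Optional NUMA binding, if numactl is available on the node:
# numactl --cpunodebind=0 --membind=0 python my_app.py
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS} KMP_BLOCKTIME=${KMP_BLOCKTIME}"
```

Capping the thread count and shortening the spin-wait time may also slow down how quickly a job accumulates CPU time, which could matter if the job is hitting the cput limit, but that depends on the application.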
For reference:
If this resolves your issue, make sure to accept this as a solution. This will help others with a similar issue.
Otherwise, please provide the details below so that we can assist you better.
1. Which DevCloud are you using (oneAPI/Edge/FPGA)?
2. What kind of application are you trying to run in Intel DevCloud?
Thanks
I used this command: qsub […] -l walltime=hh:mm:ss
but got this error:
pyswarms.single.global_best: 5%|▌ |1/20, best_cost=4.25=>> PBS: job killed: cput 270056 exceeded limit 270000
Hi,
We have not heard back from you. Could you please give an update?
Thanks
Hi,
We haven't heard back from you. Could you please give us an update?
Thanks
Hi,
Thanks for accepting our solution.
If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.
Thanks