Solved: How can I increase my job time in DevCloud?

insaf · ‎04-29-2023

I am using DevCloud for my development work, but I am facing issues with the job time limit. My jobs are getting terminated after a certain amount of time, which is affecting my productivity. Is there any way to increase the job time limit in DevCloud? What are the best practices for optimizing job performance to ensure that my jobs complete within the allotted time? Any help or suggestions would be appreciated. Thank you.

Rahila_T_Intel · ‎05-02-2023

Hi,

Thank you for posting in Intel communities.

By default, jobs are terminated automatically at the 6-hour mark. If your job needs more than 6 hours to complete, use the following syntax:

qsub […] -l walltime=hh:mm:ss

You can extend the walltime limit up to 24 hours.

To request the full 24 hours, use

#PBS -l walltime=24:00:00

in your job file.
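
As a rough illustration, a job file with this directive might look like the sketch below. The file name job.sh, the job name long_job, and the run_workload placeholder are assumptions made for the example; the nodes=1:batch:ppn=2 request is copied from the resource line DevCloud prints for a default job.

```shell
#!/bin/bash
# Hypothetical job file (job.sh); the job name and workload are placeholders.
#PBS -N long_job
#PBS -l walltime=24:00:00
#PBS -l nodes=1:batch:ppn=2

# PBS sets PBS_O_WORKDIR to the directory qsub was run from;
# fall back to the current directory when the script is run by hand.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

run_workload() {
    # Placeholder: replace with the real command, e.g. a Python script.
    echo "workload placeholder"
}

echo "Job ${PBS_JOBID:-local} started"
run_workload
```

Submit it with qsub job.sh. Options given on the qsub command line normally take precedence over the #PBS lines in the file.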

Regarding your second question, optimization will depend on your model and use case.

A software AI accelerator can make platforms 10-100X faster across a variety of applications, models, and use cases. Please refer to https://www.intel.com/content/www/us/en/developer/articles/technical/software-ai-accelerators-ai-performance-boost-for-free.html#gs.wdip66

You can also try the recommendations in the article "Maximize TensorFlow* Performance on CPU: Considerations and Recommendations for Inference Workloads". To fully utilize Intel® architecture (IA) for high performance, you can enable TensorFlow* to use Intel's highly optimized math routines in the Intel® oneAPI Deep Neural Network Library (oneDNN). oneDNN includes convolution, normalization, activation, inner product, and other primitives.

For reference:

https://www.intel.com/content/www/us/en/developer/articles/technical/maximize-tensorflow-performance-on-cpu-considerations-and-recommendations-for-inference.html
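
A minimal sketch of how the oneDNN path is typically switched on from a job script, assuming a TensorFlow workload. TF_ENABLE_ONEDNN_OPTS and OMP_NUM_THREADS are existing environment variables, but the thread count of 2 is only an assumed value, and some recent TensorFlow builds already enable oneDNN by default.

```shell
#!/bin/bash
# Ask TensorFlow to use oneDNN-optimized kernels.
export TF_ENABLE_ONEDNN_OPTS=1
# Assumed value: adjust to the number of cores your job actually gets.
export OMP_NUM_THREADS=2

# Show the settings in the job log before the workload starts.
echo "TF_ENABLE_ONEDNN_OPTS=${TF_ENABLE_ONEDNN_OPTS} OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```

These lines would go in the job file before the command that launches the Python script.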

If this resolves your issue, make sure to accept this as a solution. This would help others with similar issues.

Otherwise, please provide the details below so that we can assist you better.

1. Which DevCloud are you using (oneAPI / Edge / FPGA)?

2. What kind of application are you trying to run on Intel DevCloud?

Thanks

insaf · ‎05-10-2023

I used this command: qsub […] -l walltime=hh:mm:ss

but got this error:

pyswarms.single.global_best: 5%|▌ |1/20, best_cost=4.25=>> PBS: job killed: cput 270056 exceeded limit 270000

Rahila_T_Intel · ‎05-09-2023

Hi,

We have not heard back from you. Could you please give an update?

Thanks

Rahila_T_Intel · ‎05-11-2023

Hi,

Intel DevCloud for oneAPI nodes have a CPU time limit of 75 hours (270000 seconds). The limit is shown in the header that DevCloud prints at the top of each job's output, as below:

########################################################################

# Date: Wed 11 May 2023 03:20:37 AM PDT

# Job ID: ****359.v-qsvr-1.aidevcloud

# User: ******

# Resources: cput=75:00:00,neednodes=1:batch:ppn=2,nodes=1:batch:ppn=2,walltime=06:00:00

########################################################################

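This is why you are getting the "PBS: job killed: cput 270056 exceeded limit 270000" error: the job used 270056 seconds of CPU time, which exceeds the 270000-second limit, so PBS killed it and removed it from the node. The cput limit is separate from the walltime limit, so increasing the walltime does not raise it.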
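
The arithmetic behind the limit can be checked with a short shell sketch. The helper names hms_to_seconds and hours_until_cput are made up for illustration, and the estimate assumes that cput accumulates across all busy cores, as is usual for PBS accounting.

```shell
#!/bin/sh
# Convert an hh:mm:ss duration to seconds.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Estimate wall-clock hours until the cput limit is used up,
# assuming the given number of cores is kept fully busy.
hours_until_cput() {
    awk -v limit="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", limit / (cores * 3600) }'
}

cput_limit=$(hms_to_seconds 75:00:00)
echo "cput limit: ${cput_limit} s"
echo "2 busy cores: $(hours_until_cput "$cput_limit" 2) h of wall-clock time"
echo "4 busy cores: $(hours_until_cput "$cput_limit" 4) h of wall-clock time"
```

Under that assumption, a job that keeps four cores busy uses up the 75 hours of CPU time in under 19 hours, which is less than a 24-hour walltime.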

If this resolves your issue, make sure to accept this as a solution. This would help others with similar issues.

Thank you

Rahila_T_Intel · ‎05-18-2023

Hi,

We haven't heard back from you. Could you please give us an update?

Thanks

Rahila_T_Intel · ‎05-23-2023

Hi,

Thanks for accepting our solution.

If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.

Thanks