Intel® DevCloud
Help for those needing help starting or connecting to the Intel® DevCloud

How can I increase my job time in DevCloud?

insaf
Beginner

I am using DevCloud for my development work, but I am facing issues with the job time limit. My jobs are getting terminated after a certain amount of time, which is affecting my productivity. Is there any way to increase the job time limit in DevCloud? What are the best practices for optimizing job performance to ensure that my jobs complete within the allotted time? Any help or suggestions would be appreciated. Thank you.

1 Solution
Rahila_T_Intel
Moderator

Hi,


Intel DevCloud for oneAPI nodes have a CPU time (cput) limit of 75 hours (270000 seconds). The limit is listed under Resources in the job header that DevCloud prints, as below:


########################################################################
#   Date:      Wed 11 May 2023 03:20:37 AM PDT
#  Job ID:      ****359.v-qsvr-1.aidevcloud
#   User:      ******
# Resources:      cput=75:00:00,neednodes=1:batch:ppn=2,nodes=1:batch:ppn=2,walltime=06:00:00
########################################################################


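This is why you are getting the "PBS: job killed: cput 270056 exceeded limit 270000" error: the job used 270056 seconds of CPU time, which exceeds the 270000-second limit, so PBS killed it and removed it from the node. The cput limit is a separate resource from the walltime limit.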
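To see why a job can reach the cput limit well before its walltime runs out: in PBS/Torque, cput is CPU time summed over all processes and threads in the job, so a job that keeps several threads busy accumulates cput several times faster than wall-clock time. A rough sketch of that arithmetic is below. The thread counts (2 and 24) are illustrative assumptions, not values measured from this job.

```shell
# cput limit quoted in this thread: 75 hours = 270000 seconds.
CPUT_LIMIT=270000

# Print the approximate wall-clock seconds until the cput limit is reached,
# assuming the job keeps $1 threads fully busy the whole time (an idealization).
walltime_until_cput_limit() {
    echo $(( CPUT_LIMIT / $1 ))
}

walltime_until_cput_limit 2    # prints 135000 (37.5 hours)
walltime_until_cput_limit 24   # prints 11250 (about 3.1 hours)
```

Under these assumptions, a job that keeps 24 threads busy would be killed for cput after roughly 3 hours, regardless of the walltime requested.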

 

If this resolves your issue, please accept it as the solution. This will help others with a similar issue.


Thank you



6 Replies
Rahila_T_Intel
Moderator

Hi,

 

Thank you for posting in Intel communities.

 

By default, jobs are terminated automatically at the 6-hour mark. If your job needs more than 6 hours to complete, use the following syntax:

 

qsub […] -l walltime=hh:mm:ss

 

You can extend the walltime limit up to 24 hours.

To request the full 24 hours, add

#PBS -l walltime=24:00:00

to your job file.
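For illustration, a minimal job file using this directive might look like the sketch below. The file name job.sh, the ppn=2 request, and the workload my_script.py are hypothetical placeholders; only the walltime line comes from the advice above.

```shell
# Write a hypothetical job file, job.sh, that requests 24 hours of walltime.
cat > job.sh <<'EOF'
#!/bin/bash
#PBS -l walltime=24:00:00
#PBS -l nodes=1:ppn=2
# Move to the directory qsub was invoked from (PBS sets PBS_O_WORKDIR).
cd "${PBS_O_WORKDIR:-.}"
python my_script.py   # hypothetical workload
EOF

# Submit it from a DevCloud login node (left commented so the sketch runs anywhere):
# qsub job.sh
```

Note that this raises only the walltime limit. The cput limit is a separate resource and is not changed by this directive.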

 

Regarding your second question, optimization will depend on your model and use case.

 

A software AI accelerator can make platforms 10-100X faster across a variety of applications, models, and use cases. Please refer to https://www.intel.com/content/www/us/en/developer/articles/technical/software-ai-accelerators-ai-performance-boost-for-free.html#gs.wdip66

 

If you are using TensorFlow*, you can also try the recommendations in the article "Maximize TensorFlow* Performance on CPU: Considerations and Recommendations for Inference Workloads". To fully utilize the power of Intel® architecture (IA) for high performance, you can enable TensorFlow* to be powered by Intel's highly optimized math routines in the Intel® oneAPI Deep Neural Network Library (oneDNN). oneDNN includes convolution, normalization, activation, inner product, and other primitives.

For reference:

https://www.intel.com/content/www/us/en/developer/articles/technical/maximize-tensorflow-performance-on-cpu-considerations-and-recommendations-for-inference.html
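As a rough sketch of the kind of runtime settings involved, the lines below could be placed in a job file before launching a TensorFlow script. The values are illustrative assumptions (2 threads, to match a ppn=2 request), not tuned recommendations, and TF_ENABLE_ONEDNN_OPTS only has an effect on TensorFlow builds that support oneDNN optimizations.

```shell
# Illustrative starting values only; tune them for your own model and node.
export TF_ENABLE_ONEDNN_OPTS=1   # ask TensorFlow to use oneDNN-optimized kernels
export OMP_NUM_THREADS=2         # cap OpenMP threads (assumed to match ppn=2)
export KMP_BLOCKTIME=1           # ms an idle OpenMP thread spins before sleeping

echo "oneDNN=$TF_ENABLE_ONEDNN_OPTS threads=$OMP_NUM_THREADS blocktime=$KMP_BLOCKTIME"
```

Capping the thread count also slows the rate at which a job accumulates cput, since CPU time is summed over all busy threads.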

 

If this resolves your issue, please accept it as the solution. This will help others with a similar issue.

 

Otherwise, please provide the details below so that we can assist you better.

1. Which DevCloud are you using (oneAPI/Edge/FPGA)?

2. What kind of application are you trying to run in Intel DevCloud?

 

Thanks

 

insaf
Beginner

I used this command: qsub […] -l walltime=hh:mm:ss

but got this error:

pyswarms.single.global_best: 5%|▌ |1/20, best_cost=4.25=>> PBS: job killed: cput 270056 exceeded limit 270000

Rahila_T_Intel
Moderator

Hi,


We have not heard back from you. Could you please give an update?


Thanks




Rahila_T_Intel
Moderator

Hi,


We haven't heard back from you. Could you please give us an update?


Thanks


Rahila_T_Intel
Moderator

Hi,


Thanks for accepting our solution. 

If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.


Thanks

