Intel® DevCloud
Help for those needing help starting or connecting to the Intel® DevCloud

s001-n084 terminates early/randomly

PSath2
Beginner
1,096 Views

s001-n084 terminates interactive sessions randomly. Sometimes I can run for a few minutes, sometimes it terminates the connection right after login.

I don't have any other jobs in any other queues, but keep landing on n084 when I request an arria10 in the default queue.

7 Replies
PSath2
Beginner

Interactive jobs also occasionally get "apparently deleted" while waiting in queue "v-qsvr-1.aidevcloud"

@login-1:~$ qsub -V -l nodes=ppn=2:arria10  -I
qsub: waiting for job 435172.v-qsvr-1.aidevcloud to start
qsub: job 435172.v-qsvr-1.aidevcloud apparently deleted

 

PSath2
Beginner

This seems to be specific to node 84, not the whole queue. If I manually select node 91, which is in the same queue and also has the arria10 property, my interactive build job still hasn't been terminated after ~40 minutes.

JEYANTHKRI_N_Intel

Hi,

Thanks for reaching out to us.

We will contact the DevCloud admin team and get back to you.

 

JEYANTHKRI_N_Intel

Hi,

Apologies for the delayed response. Our DevCloud admin team needs some more time to debug the problem. We will get back to you as soon as we get a response from them.

 

Andrey_Vladimirov
New Contributor III

Sathre, Paul wrote:

Interactive jobs also occasionally get "apparently deleted" while waiting in queue "v-qsvr-1.aidevcloud"

@login-1:~$ qsub -V -l nodes=ppn=2:arria10  -I
qsub: waiting for job 435172.v-qsvr-1.aidevcloud to start
qsub: job 435172.v-qsvr-1.aidevcloud apparently deleted

 

 

Hi Paul, you need to change the -l argument to "-l nodes=1:ppn=2:arria10". Without the "1:", the job will reserve only half a node and will get terminated by the cloud's automation. You always need the "1:" when requesting a node by its property. However, when you are requesting a node by its hostname, you should not use the "1:", e.g., "-l nodes=s001-n084:ppn=2".
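To make the two forms of the -l argument harder to mix up, here is a minimal shell sketch. The helper name nodes_spec and its "property"/"hostname" keywords are hypothetical, made up for illustration only; the strings it prints are the two forms described above.

```shell
# Hypothetical helper (not part of DevCloud or qsub): prints the value
# to pass to qsub's -l option.
# Usage: nodes_spec property <property> [ppn]
#        nodes_spec hostname <hostname> [ppn]
nodes_spec() {
    kind=$1
    name=$2
    ppn=${3:-2}   # default to ppn=2, as in the commands in this thread
    case "$kind" in
        # By property: the leading "1:" node count is required.
        property) printf 'nodes=1:ppn=%s:%s\n' "$ppn" "$name" ;;
        # By hostname: no "1:" prefix.
        hostname) printf 'nodes=%s:ppn=%s\n' "$name" "$ppn" ;;
        *) printf 'nodes_spec: unknown kind: %s\n' "$kind" >&2; return 1 ;;
    esac
}
```

On the login node it could then be used as: qsub -I -l "$(nodes_spec property arria10)" or qsub -I -l "$(nodes_spec hostname s001-n084)".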

JEYANTHKRI_N_Intel

Hi,

Could you please let us know if the solution provided helped?

STyur
New Contributor I

For simple use, the easiest way is to add the following aliases to ~/.bashrc:

alias cifpga='qsub -I -l nodes=1:fpga_compile:ppn=2 -d .'
alias cbfpga='qsub -l nodes=1:fpga_compile:ppn=2,walltime=24:00:00 -d .' 
alias rifpga='qsub -I -l nodes=1:fpga_runtime:ppn=2 -d .'
alias rbfpga='qsub -l nodes=1:fpga_runtime:ppn=2,walltime=24:00:00 -d .'

After that, you can connect interactively or submit a job with one simple command.

Interactive login to a compilation server:

cifpga

Submit a compilation job:

cbfpga <your script file>

Interactive login to an FPGA server:

rifpga

Submit an FPGA job:

rbfpga <your script file>
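For reference, a script file passed to cbfpga or rbfpga could look like the minimal sketch below. The job name fpga_job and the function job_info are placeholders for illustration, not anything DevCloud prescribes; a real script would replace the function body with the actual compile or run commands.

```shell
#!/bin/bash
# Hypothetical job script, e.g. saved as job.sh and submitted with:
#   cbfpga job.sh
#PBS -N fpga_job

# Placeholder for the real work: report which node the job landed on
# and which directory it runs in (the aliases pass "-d ." so this
# should be the directory the job was submitted from).
job_info() {
    printf 'host=%s dir=%s\n' "$(uname -n)" "$PWD"
}

job_info
```

Printing the host name first is a cheap way to see which node a job landed on when only one node misbehaves.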

 
