Intel® DevCloud
Help for those needing help starting or connecting to the Intel® DevCloud

Cannot get interactive node

Robert_C_Intel
Employee

I cannot get any interactive nodes. It has been like this for a few days. I think it is related to having 2 active jobs that cannot be killed:

 

XXXXXX@login-2:~$ qstat
Job ID Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1840347.v-qsvr-1 STDIN XXXXXX 00:00:04 R batch
1841484.v-qsvr-1 STDIN XXXXXX 00:00:05 R batch
XXXXXX@login-2:~$ !844
qsub -I -l nodes=1:ppn=2 -d .
qsub: waiting for job 1843214.v-qsvr-1.aidevcloud to start
^CDo you wish to terminate the job and exit (y|[n])? y
Job 1843214.v-qsvr-1.aidevcloud is being deleted
XXXXXX@login-2:~$ qselect | xargs qdel
qdel: Server could not connect to MOM 1840347.v-qsvr-1.aidevcloud
qdel: Server could not connect to MOM 1841484.v-qsvr-1.aidevcloud
XXXXXX@login-2:~$ qselect | xargs qdel -p
qdel: Unauthorized Request 1840347.v-qsvr-1.aidevcloud
qdel: Unauthorized Request 1841484.v-qsvr-1.aidevcloud
XXXXXX@login-2:~$ Connection to ssh.devcloud.intel.com closed by remote host.
Connection to devcloud closed by remote host.
Connection to devcloud closed.
rscohn1@rscohn1-mobl1:~$

 

GRN2
Employee

Yes, you can run only two jobs at once. All other jobs will wait in the queue.

Robert_C_Intel
Employee

How do I kill the jobs? qdel won't let me do it.

 

 

GRN2
Employee

Close terminals with interactive jobs.

Robert_C_Intel
Employee

There aren't any open terminals on my side. Does someone have administrative privileges and can manually delete the jobs?

JananiC_Intel
Moderator

Hi,


Thank you for posting in Intel Communities. Could you share the job id which needs to be killed?


Regards,

Janani Chandran


Robert_C_Intel
Employee
XXXXXX@login-2:~$ qstat
Job ID Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1840347.v-qsvr-1 STDIN XXXXXX 00:00:04 R batch
1841484.v-qsvr-1 STDIN XXXXXX 00:00:05 R batch
XXXXXX@login-2:~$ !
JananiC_Intel
Moderator

Hi,


Thanks for the quick response. To help us understand the problem, could you tell us what type of workload you were running on DevCloud when this issue occurred?


Regards,

Janani Chandran


Robert_C_Intel
Employee

Thanks for clearing the jobs.

 

I am not sure exactly what the cause is. I was having a problem where MPI jobs would hang in MPI_Init, and 'kill -9' would not clean up the processes. If it happens again I will let you know.

Robert_C_Intel
Employee

@JananiC_Intel: I can reproduce the actions that cause a job to hang in the queue. MPI hangs on quad_gpu systems. Normally, I can abort with control-c. If I run the program with:

I_MPI_DEBUG=3 ./a.out

 

Then I cannot abort with control-c.  Killing the qsub process on the login node closes the connection, but leaves a job in the queue.

 

More info about the MPI problem is here: https://community.intel.com/t5/Intel-DevCloud/MPI-Init-hangs-on-s012-n004/m-p/1357160/emcs_t/S2h8ZW1...

 

Can you kill 1845256.v-qsvr-1 for me? I cannot kill it with qdel.

 

JananiC_Intel
Moderator

Hi,


Thanks for the update. We will address your issue as soon as possible. Meanwhile, could you try one of the commands below to delete the job and let us know the result?


qdel <job-ID>


or


qdel all


Regards,

Janani Chandran
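
Before running qdel, it can help to see exactly which job IDs are still marked as running. The following is a minimal sketch, not an official DevCloud tool: list_running_jobs is a hypothetical helper name, and it assumes the six-column qstat layout shown earlier in this thread (two header lines, state in the fifth field). The sample input is copied from that qstat output.

```shell
# Hypothetical helper: reads `qstat` output on stdin and prints the ID
# of every job whose state column (S) is R (running).
list_running_jobs() {
  # Skip the two header lines; field 5 is the state column in the
  # six-column layout shown in this thread (an assumption for other setups).
  awk 'NR > 2 && $5 == "R" { print $1 }'
}

# Sample input copied from the qstat output posted earlier in the thread.
sample='Job ID Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1840347.v-qsvr-1 STDIN XXXXXX 00:00:04 R batch
1841484.v-qsvr-1 STDIN XXXXXX 00:00:05 R batch'

printf '%s\n' "$sample" | list_running_jobs
```

On the login node the same function could be fed live data with `qstat | list_running_jobs`, and each ID it prints can then be passed to qdel.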


JananiC_Intel
Moderator

Hi,


Is your issue resolved? Did you try that command?


Regards,

Janani Chandran


Robert_C_Intel
Employee

Hi Janani,

 

I know about the qdel command. When I report a job here, it is because the job cannot be killed with qdel. At that point the solution is to post it here so that someone can ask the engineering team to kill the job.

 

The unwanted jobs are all gone now. Thanks.

 

Robert
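
When qdel refuses to remove jobs, the IDs to report can be pulled out of its error messages. This is a rough sketch under assumptions: failed_job_ids is a hypothetical helper name, and it relies on the message format seen in this thread, where each "qdel: ..." line ends with the job ID. Other qdel versions or error messages may be formatted differently.

```shell
# Hypothetical helper: reads qdel error messages on stdin and prints the
# job ID found at the end of each "qdel: ..." line, without duplicates.
failed_job_ids() {
  # Keep only lines whose last field looks like "<number>.<server>".
  awk '/^qdel: / && $NF ~ /^[0-9]+\./ { print $NF }' | sort -u
}

# Sample messages copied from the first post in this thread.
errors='qdel: Server could not connect to MOM 1840347.v-qsvr-1.aidevcloud
qdel: Server could not connect to MOM 1841484.v-qsvr-1.aidevcloud
qdel: Unauthorized Request 1840347.v-qsvr-1.aidevcloud
qdel: Unauthorized Request 1841484.v-qsvr-1.aidevcloud'

printf '%s\n' "$errors" | failed_job_ids
```

On the login node, something like `qselect | xargs qdel 2>&1 | failed_job_ids` would give the list of jobs to include in a post asking for them to be cleared.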

JananiC_Intel
Moderator

Hi,


Sorry for the inconvenience.


Our DevCloud team was working on your issue. Glad to know that it is resolved. If you need any additional information, please post a new question, as this thread will no longer be monitored by Intel.


Regards,

Janani Chandran

