Intel® DevCloud
Help for those needing help starting or connecting to the Intel® DevCloud

Cannot get interactive node

Robert_C_Intel
Employee
2,351 Views

I cannot get any interactive nodes. It has been like this for a few days. I think it is related to having 2 active jobs that cannot be killed:

 

XXXXXX@login-2:~$ qstat
Job ID Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1840347.v-qsvr-1 STDIN XXXXXX 00:00:04 R batch
1841484.v-qsvr-1 STDIN XXXXXX 00:00:05 R batch
XXXXXX@login-2:~$ !844
qsub -I -l nodes=1:ppn=2 -d .
qsub: waiting for job 1843214.v-qsvr-1.aidevcloud to start
^CDo you wish to terminate the job and exit (y|[n])? y
Job 1843214.v-qsvr-1.aidevcloud is being deleted
XXXXXX@login-2:~$ qselect | xargs qdel
qdel: Server could not connect to MOM 1840347.v-qsvr-1.aidevcloud
qdel: Server could not connect to MOM 1841484.v-qsvr-1.aidevcloud
XXXXXX@login-2:~$ qselect | xargs qdel -p
qdel: Unauthorized Request 1840347.v-qsvr-1.aidevcloud
qdel: Unauthorized Request 1841484.v-qsvr-1.aidevcloud
XXXXXX@login-2:~$ Connection to ssh.devcloud.intel.com closed by remote host.
Connection to devcloud closed by remote host.
Connection to devcloud closed.
rscohn1@rscohn1-mobl1:~$
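For readers in the same situation, here is a minimal sketch for pulling the IDs of running jobs out of `qstat` output so they can be quoted in a support request. The function name `running_job_ids` is hypothetical, and the parsing assumes the six-column layout shown in the session above, with no spaces in job names.

```shell
# Print the IDs of jobs in the R (running) state from `qstat` output on stdin.
# Assumption: two header lines, then columns ID, name, user, time, state, queue.
running_job_ids() {
    awk 'NR > 2 && $5 == "R" { print $1 }'
}

# Hypothetical usage on a login node:
#   qstat | running_job_ids
```

This only reads the listing; it does not try to delete anything.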

 

GRN2
Employee
2,339 Views

Yes, you can run only two jobs at a time. Any other jobs will wait in the queue.

Robert_C_Intel
Employee
2,336 Views

How do I kill the jobs? qdel won't let me do it.

 

 

GRN2
Employee
2,330 Views

Close any terminals that have interactive jobs running.

Robert_C_Intel
Employee
2,327 Views

There aren't any open terminals on my side. Does someone have administrative privileges and can manually delete the jobs?

JananiC_Intel
Moderator
2,314 Views

Hi,


Thank you for posting in Intel Communities. Could you share the IDs of the jobs that need to be killed?


Regards,

Janani Chandran


Robert_C_Intel
Employee
2,307 Views
XXXXXX@login-2:~$ qstat
Job ID Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1840347.v-qsvr-1 STDIN XXXXXX 00:00:04 R batch
1841484.v-qsvr-1 STDIN XXXXXX 00:00:05 R batch
XXXXXX@login-2:~$ !
JananiC_Intel
Moderator
2,289 Views

Hi,


Thanks for the quick response. To understand the problem better, could you tell us what workload, or what type of workload, you were running on DevCloud when these issues occurred?


Regards,

Janani Chandran


Robert_C_Intel
Employee
2,281 Views

Thanks for clearing the jobs.

 

I am not sure exactly what the cause is. I was having a problem where MPI jobs would hang in MPI_Init, and 'kill -9' would not clean up the processes. If it happens again I will let you know.

Robert_C_Intel
Employee
2,270 Views

@JananiC_Intel: I can reproduce the actions that cause a job to hang in the queue. MPI hangs on quad_gpu systems. Normally, I can abort with control-c. If I run the program with:

I_MPI_DEBUG=3 ./a.out

 

Then I cannot abort with control-c.  Killing the qsub process on the login node closes the connection, but leaves a job in the queue.
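One possible precaution, sketched here as an assumption rather than a confirmed fix, is to run the program under the coreutils `timeout` command so that it is signalled automatically after a time limit. The wrapper name `run_with_limit` and the limits are hypothetical. An earlier post in this thread reports that 'kill -9' did not clean up the hung MPI processes, so this may not help in that case.

```shell
# Run a command with a time limit: send SIGTERM after the given number of
# seconds, then SIGKILL 5 seconds later if the command is still alive.
run_with_limit() {
    limit_seconds=$1
    shift
    timeout -k 5 "$limit_seconds" "$@"
}

# Hypothetical usage inside an interactive job (env sets the variable for a.out):
#   run_with_limit 120 env I_MPI_DEBUG=3 ./a.out
```

When the limit is reached and SIGTERM ends the command, `timeout` exits with status 124, which a script can check to tell a timeout from a normal failure.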

 

More info about the MPI problem is here: https://community.intel.com/t5/Intel-DevCloud/MPI-Init-hangs-on-s012-n004/m-p/1357160/emcs_t/S2h8ZW1haWx8dG9waWNfc3Vic2NyaXB0aW9ufEtaN0RNWkRWUVhPQzUzfDEzNTcxNjB8U1VCU0NSSVBUSU9OU3xoSw#M4259

 

Can you kill job 1845256.v-qsvr-1 for me? I cannot kill it with qdel.

 

JananiC_Intel
Moderator
2,248 Views

Hi,


Thanks for the update. We will address your issue as soon as possible. Meanwhile, could you try one of the commands below to delete the job and let us know the result?


qdel <job-ID>


or


qdel all


Regards,

Janani Chandran


JananiC_Intel
Moderator
2,221 Views

Hi,


Is your issue resolved? Did you try that command?


Regards,

Janani Chandran


Robert_C_Intel
Employee
2,215 Views

Hi Janani,

 

I know about the qdel command. When I report a job here, it is because the job cannot be killed with qdel. At that point the solution is to post it here so that someone can ask the engineering team to kill the job.
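As a rough sketch of that workflow, the function below tries `qdel` on each job ID and prints only the IDs that could not be deleted, which are the ones worth posting. The function name `report_undeletable_jobs` is hypothetical, and the sketch assumes that `qdel` exits with a non-zero status when a deletion fails.

```shell
# Read job IDs from stdin, try to delete each one, and print the IDs that
# qdel rejects. Assumption: qdel exits non-zero when a deletion fails.
report_undeletable_jobs() {
    while IFS= read -r job_id; do
        # Skip empty lines.
        [ -n "$job_id" ] || continue
        if ! qdel "$job_id" </dev/null >/dev/null 2>&1; then
            printf '%s\n' "$job_id"
        fi
    done
}

# Hypothetical usage on a login node:
#   qselect | report_undeletable_jobs
```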

 

The unwanted jobs are all gone now. Thanks.

 

Robert

JananiC_Intel
Moderator
2,192 Views

Hi,


Sorry for the inconvenience.


Our DevCloud team was working on your issue, and we are glad to know that it is resolved. If you need any additional information, please post a new question, as this thread will no longer be monitored by Intel.


Regards,

Janani Chandran

