Intel® DevCloud
Help for those needing help starting or connecting to the Intel® DevCloud

Cannot get interactive node

Robert_C_Intel
Employee

I cannot get any interactive nodes. It has been like this for a few days. I think it is related to having 2 active jobs that cannot be killed:

 

XXXXXX@login-2:~$ qstat
Job ID Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1840347.v-qsvr-1 STDIN XXXXXX 00:00:04 R batch
1841484.v-qsvr-1 STDIN XXXXXX 00:00:05 R batch
XXXXXX@login-2:~$ !844
qsub -I -l nodes=1:ppn=2 -d .
qsub: waiting for job 1843214.v-qsvr-1.aidevcloud to start
^CDo you wish to terminate the job and exit (y|[n])? y
Job 1843214.v-qsvr-1.aidevcloud is being deleted
XXXXXX@login-2:~$ qselect | xargs qdel
qdel: Server could not connect to MOM 1840347.v-qsvr-1.aidevcloud
qdel: Server could not connect to MOM 1841484.v-qsvr-1.aidevcloud
XXXXXX@login-2:~$ qselect | xargs qdel -p
qdel: Unauthorized Request 1840347.v-qsvr-1.aidevcloud
qdel: Unauthorized Request 1841484.v-qsvr-1.aidevcloud
XXXXXX@login-2:~$ Connection to ssh.devcloud.intel.com closed by remote host.
Connection to devcloud closed by remote host.
Connection to devcloud closed.
rscohn1@rscohn1-mobl1:~$

 

GRN2
Employee

Yes, you can run only two jobs at a time. Any additional jobs wait in the queue.
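
One way to check whether the two-job limit is the reason a new interactive request keeps waiting is to count the jobs that qstat already reports as running. The sketch below is only an illustration: the function name count_jobs_in_state is hypothetical, and it assumes the default qstat layout shown earlier in this thread, where the state letter is the fifth whitespace-separated field of each job row.

```shell
# Count jobs in a given state from `qstat` text read on stdin.
# Assumption: job rows start with a numeric job ID followed by a dot,
# and the state letter (R, Q, ...) is the fifth field, as in this thread.
count_jobs_in_state() {
    # $1 is the state letter to match, for example R for running
    awk -v state="$1" '$1 ~ /^[0-9]+\./ && $5 == state { n++ } END { print n + 0 }'
}
```

For example, `qstat | count_jobs_in_state R` should print 2 for the listing in the first post, which would mean the limit has already been reached.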

Robert_C_Intel
Employee

How do I kill the jobs? qdel won't let me do it.

 

 

GRN2
Employee

Close any terminals that have interactive jobs running.

Robert_C_Intel
Employee

There aren't any open terminals on my side. Does someone have administrative privileges and can manually delete the jobs?

JananiC_Intel
Moderator

Hi,


Thank you for posting in Intel Communities. Could you share the IDs of the jobs that need to be killed?


Regards,

Janani Chandran


Robert_C_Intel
Employee

XXXXXX@login-2:~$ qstat
Job ID Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1840347.v-qsvr-1 STDIN XXXXXX 00:00:04 R batch
1841484.v-qsvr-1 STDIN XXXXXX 00:00:05 R batch
XXXXXX@login-2:~$ !

JananiC_Intel
Moderator

Hi,


Thanks for the quick response. To help us understand the problem, could you tell us what workload, or what type of workload, you were running on DevCloud when these issues occurred?


Regards,

Janani Chandran


Robert_C_Intel
Employee

Thanks for clearing the jobs.

 

I am not sure exactly what the cause is. I was having a problem where MPI jobs would hang in MPI_Init, and 'kill -9' would not clean up the processes. If it happens again, I will let you know.

Robert_C_Intel
Employee

@JananiC_Intel: I can reproduce the actions that cause a job to hang in the queue. MPI hangs on quad_gpu systems. Normally, I can abort with control-c. If I run the program with:

I_MPI_DEBUG=3 ./a.out

 

Then I cannot abort with control-c. Killing the qsub process on the login node closes the connection but leaves a job in the queue.

 

More info about the MPI problem is here: https://community.intel.com/t5/Intel-DevCloud/MPI-Init-hangs-on-s012-n004/m-p/1357160/emcs_t/S2h8ZW1...

 

Can you kill:

1845256.v-qsvr-1

For me? I cannot kill it with qdel.

 

JananiC_Intel
Moderator

Hi,


Thanks for the update. We will address your issue as soon as possible. Meanwhile, could you try one of the commands below to delete the job and let us know the result?


qdel <job-ID>


or


qdel all


Regards,

Janani Chandran
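
To see which jobs the suggested qdel command cannot remove, a sketch like the following may help. It assumes that qdel returns a non-zero exit status when it fails to delete a job, which matches the documented behavior of Torque's qdel but has not been verified on DevCloud. The function name report_stuck_jobs and the QDEL_CMD override are hypothetical and exist only for this illustration.

```shell
# Try to delete each job ID passed as an argument and print the IDs
# that could not be deleted, so they can be reported to support.
# Assumption: the delete command exits non-zero on failure.
# QDEL_CMD defaults to qdel; overriding it is meant for dry runs only.
report_stuck_jobs() {
    for job in "$@"; do
        if ! "${QDEL_CMD:-qdel}" "$job" >/dev/null 2>&1; then
            echo "$job"
        fi
    done
}
```

For example, `report_stuck_jobs $(qselect)` would attempt to delete every job that qselect lists and print only the IDs that remain stuck.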


JananiC_Intel
Moderator

Hi,


Is your issue resolved? Did you try that command?


Regards,

Janani Chandran


Robert_C_Intel
Employee

Hi Janani,

 

I know about the qdel command. When I report a job here, it is because the job cannot be killed with qdel. At that point, the solution is to post it here so that someone can ask the engineering team to kill the job.

 

The unwanted jobs are all gone now. Thanks.

 

Robert

JananiC_Intel
Moderator

Hi,


Sorry for the inconvenience.


Our DevCloud team has been working on your issue. We are glad to know that it is resolved. If you need any additional information, please post a new question, as this thread will no longer be monitored by Intel.


Regards,

Janani Chandran

