Intel® DevCloud

Cannot get interactive node

Robert_C_Intel
Employee

I cannot get any interactive nodes, and it has been like this for a few days. I think it is related to having two active jobs that cannot be killed:

XXXXXX@login-2:~$ qstat
Job ID Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1840347.v-qsvr-1 STDIN XXXXXX 00:00:04 R batch
1841484.v-qsvr-1 STDIN XXXXXX 00:00:05 R batch
XXXXXX@login-2:~$ !844
qsub -I -l nodes=1:ppn=2 -d .
qsub: waiting for job 1843214.v-qsvr-1.aidevcloud to start
^CDo you wish to terminate the job and exit (y|[n])? y
Job 1843214.v-qsvr-1.aidevcloud is being deleted
XXXXXX@login-2:~$ qselect | xargs qdel
qdel: Server could not connect to MOM 1840347.v-qsvr-1.aidevcloud
qdel: Server could not connect to MOM 1841484.v-qsvr-1.aidevcloud
XXXXXX@login-2:~$ qselect | xargs qdel -p
qdel: Unauthorized Request 1840347.v-qsvr-1.aidevcloud
qdel: Unauthorized Request 1841484.v-qsvr-1.aidevcloud
XXXXXX@login-2:~$ Connection to ssh.devcloud.intel.com closed by remote host.
Connection to devcloud closed by remote host.
Connection to devcloud closed.
rscohn1@rscohn1-mobl1:~$
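
When reporting stuck jobs like the two above, it helps to quote their IDs exactly. The following is a minimal sketch, assuming the default qstat column layout shown in the session (job ID in the first field, state in the second-to-last field). The name running_job_ids is a hypothetical helper, not part of the scheduler's tooling:

```shell
#!/bin/sh
# Hypothetical helper: print the IDs of jobs whose state column is "R".
# Assumes the default qstat layout from the session above, where the job ID
# is the first field and the state is the second-to-last field.
# The NF >= 6 guard skips blank or short lines; the header and separator
# lines are skipped because their second-to-last field is not "R".
running_job_ids() {
    awk 'NF >= 6 && $(NF-1) == "R" { print $1 }'
}

# Usage on a login node (not run here):
#   qstat | running_job_ids
```

The output is one job ID per line, which can be pasted into a post so that someone with the right privileges can remove the jobs.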

 

GRN2
Employee

Yes, you can run only two jobs at a time. All other jobs will wait in the queue.

Robert_C_Intel
Employee

How do I kill the jobs? qdel won't let me do it.

GRN2
Employee

Close the terminals that are running interactive jobs.

Robert_C_Intel
Employee

There aren't any open terminals on my side. Can someone with administrative privileges delete the jobs manually?

JananiC_Intel
Moderator

Hi,

Thank you for posting in Intel Communities. Could you share the IDs of the jobs that need to be killed?

Regards,
Janani Chandran

Robert_C_Intel
Employee
XXXXXX@login-2:~$ qstat
Job ID Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1840347.v-qsvr-1 STDIN XXXXXX 00:00:04 R batch
1841484.v-qsvr-1 STDIN XXXXXX 00:00:05 R batch
XXXXXX@login-2:~$ !

JananiC_Intel
Moderator

Hi,

Thanks for the quick response. To help us understand the problem, could you tell us what workload, or what type of workload, you were running on DevCloud when these issues occurred?

Regards,
Janani Chandran

Robert_C_Intel
Employee

Thanks for clearing the jobs.

I am not sure exactly what the cause is. I was having a problem where MPI jobs would hang in MPI_Init, and 'kill -9' would not clean up the processes. If it happens again, I will let you know.

Robert_C_Intel
Employee

@JananiC_Intel: I can reproduce the actions that cause a job to hang in the queue. MPI hangs on quad_gpu systems. Normally, I can abort with control-c. If I run the program with:

I_MPI_DEBUG=3 ./a.out

 

then I cannot abort with control-c. Killing the qsub process on the login node closes the connection but leaves a job in the queue.
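
One way to keep a hung run from blocking an interactive session indefinitely is to start it under a deadline. This is only a sketch, assuming GNU coreutils timeout is available on the node. The name run_with_deadline is a hypothetical wrapper, and it may not help if the MPI processes cannot be killed at all, as was the case with 'kill -9' earlier in this thread:

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, send SIGTERM after LIMIT seconds,
# and follow up with SIGKILL 10 seconds later if it is still alive.
# Assumes GNU coreutils `timeout`, which exits with status 124 when the
# deadline is reached and SIGTERM ends the command.
run_with_deadline() {
    limit=$1
    shift
    timeout --kill-after=10 "$limit" "$@"
}

# Usage with the command from the post (not run here):
#   run_with_deadline 120 env I_MPI_DEBUG=3 ./a.out
```

An exit status of 124 indicates that the deadline was reached, which is a useful detail to include when reporting the hang.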

 

More info about the MPI problem is here: https://community.intel.com/t5/Intel-DevCloud/MPI-Init-hangs-on-s012-n004/m-p/1357160/emcs_t/S2h8ZW1haWx8dG9waWNfc3Vic2NyaXB0aW9ufEtaN0RNWkRWUVhPQzUzfDEzNTcxNjB8U1VCU0NSSVBUSU9OU3xoSw#M4259

 

Can you kill job 1845256.v-qsvr-1 for me? I cannot kill it with qdel.

JananiC_Intel
Moderator

Hi,

Thanks for the update. We will address your issue as soon as possible. In the meantime, could you try one of the commands below to delete the job and let us know the result?

qdel <job-ID>

or

qdel all

Regards,
Janani Chandran

JananiC_Intel
Moderator

Hi,


Is your issue resolved? Did you try that command?


Regards,

Janani Chandran


Robert_C_Intel
Employee

Hi Janani,

I know about the qdel command. When I report a job here, it is because qdel cannot kill it, and at that point the solution is to post here so that someone can ask the engineering team to kill the job.

The unwanted jobs are all gone now. Thanks.

Robert

JananiC_Intel
Moderator

Hi,

Sorry for the inconvenience.

Our DevCloud team was working on your issue. Glad to know that it is resolved. If you need any additional information, please post a new question, as this thread will no longer be monitored by Intel.

Regards,
Janani Chandran