I cannot get any interactive nodes. It has been like this for a few days. I think it is related to having 2 active jobs that cannot be killed:
XXXXXX@login-2:~$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1840347.v-qsvr-1          STDIN            XXXXXX          00:00:04 R batch
1841484.v-qsvr-1          STDIN            XXXXXX          00:00:05 R batch
XXXXXX@login-2:~$ !844
qsub -I -l nodes=1:ppn=2 -d .
qsub: waiting for job 1843214.v-qsvr-1.aidevcloud to start
^CDo you wish to terminate the job and exit (y|[n])? y
Job 1843214.v-qsvr-1.aidevcloud is being deleted
XXXXXX@login-2:~$ qselect | xargs qdel
qdel: Server could not connect to MOM 1840347.v-qsvr-1.aidevcloud
qdel: Server could not connect to MOM 1841484.v-qsvr-1.aidevcloud
XXXXXX@login-2:~$ qselect | xargs qdel -p
qdel: Unauthorized Request 1840347.v-qsvr-1.aidevcloud
qdel: Unauthorized Request 1841484.v-qsvr-1.aidevcloud
XXXXXX@login-2:~$ Connection to ssh.devcloud.intel.com closed by remote host.
Connection to devcloud closed by remote host.
Connection to devcloud closed.
rscohn1@rscohn1-mobl1:~$
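When `qdel` fails the way it does in the session above, the job IDs can be pulled out of its error messages and listed in a request to the administrators. This is a minimal sketch. It assumes bash, and it assumes the messages keep the exact `qdel: <reason> <job-id>` form shown above. The function name `undeletable_jobs` is hypothetical.

```bash
# Hypothetical helper: read qdel's error output on stdin and print one
# job ID per line (deduplicated) for every job qdel refused to delete.
# Assumption: each error line looks like "qdel: <reason> <job-id>", as in
# the session above, so the job ID is the last whitespace-separated field.
undeletable_jobs() {
  awk '/^qdel: (Server could not connect to MOM|Unauthorized Request) / { print $NF }' | sort -u
}
```

Usage (qdel writes its errors to stderr, so they are merged into the pipe): `qselect -u "$USER" | xargs qdel 2>&1 | undeletable_jobs`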
Yes, you can run only two jobs at a time. Any other jobs will wait in the queue.
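Given that limit, one quick check is to count the `qstat` rows whose state column is `R`. This is a minimal sketch. It assumes the default column layout shown in this thread (Job ID, Name, User, Time Use, S, Queue) after two header lines. It also assumes that plain `qstat` lists only your own jobs, as it does in the session above. The function names are hypothetical, and the default limit of 2 comes from the reply above.

```bash
# Count jobs in state R from default qstat output read on stdin.
# Assumption: two header lines, then whitespace-separated columns
# Job ID, Name, User, Time Use, S, Queue (so the state S is field 5).
running_job_count() {
  awk 'NR > 2 && $5 == "R" { n++ } END { print n + 0 }'
}

# Succeeds (exit status 0) if fewer than LIMIT jobs are running.
# LIMIT defaults to 2, the per-user limit described above.
below_job_limit() {
  local limit=${1:-2}
  local running
  running=$(running_job_count)
  [ "$running" -lt "$limit" ]
}
```

Usage: `qstat | below_job_limit || echo "already at the running-job limit"`. Note that `qstat -u` prints a different, wider layout, so the field numbers above would not apply to it.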
How do I kill the jobs? qdel won't let me do it.
Close any terminals that are running interactive jobs.
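In the listings in this thread, the interactive jobs started with `qsub -I` appear under the name `STDIN`. Filtering on that name is one way to see which jobs this advice applies to. This is a minimal sketch. It assumes the same default `qstat` layout (Name is field 2). It also assumes every `STDIN` job is interactive, which is not guaranteed, because a job whose script is piped into `qsub` gets the same name. The function name `interactive_job_ids` is hypothetical.

```bash
# Print the IDs of jobs named STDIN from default qstat output on stdin.
# Assumption: Name is field 2. STDIN jobs may be interactive (qsub -I)
# or batch jobs whose script was piped to qsub, so check before acting.
interactive_job_ids() {
  awk 'NR > 2 && $2 == "STDIN" { print $1 }'
}
```

Usage: `qstat | interactive_job_ids`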
There aren't any open terminals on my side. Could someone with administrative privileges delete the jobs manually?
Hi,
Thank you for posting in Intel Communities. Could you share the job id which needs to be killed?
Regards,
Janani Chandran
XXXXXX@login-2:~$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1840347.v-qsvr-1          STDIN            XXXXXX          00:00:04 R batch
1841484.v-qsvr-1          STDIN            XXXXXX          00:00:05 R batch
XXXXXX@login-2:~$ !
Hi,
Thanks for the quick response. To help us understand the issue, could you tell us what workload, or what type of workload, you were running on DevCloud that may have caused these problems?
Regards,
Janani Chandran
Thanks for clearing the jobs.
I am not sure exactly what the cause is. I was having a problem where MPI jobs would hang in MPI_Init, and 'kill -9' would not clean up the processes. If it happens again, I will let you know.
@JananiC_Intel: I can reproduce the steps that cause a job to hang in the queue. MPI hangs on the quad_gpu systems. Normally I can abort with Ctrl-C. If I run the program with:
I_MPI_DEBUG=3 ./a.out
then I cannot abort with Ctrl-C. Killing the qsub process on the login node closes the connection, but it leaves the job in the queue.
More info about the MPI problem is here: https://community.intel.com/t5/Intel-DevCloud/MPI-Init-hangs-on-s012-n004/m-p/1357160/emcs_t/S2h8ZW1...
Can you kill 1845256.v-qsvr-1 for me? I cannot kill it with qdel.
Hi,
Thanks for the update. We will address your issue as soon as possible. Meanwhile, could you try the commands below to delete the job and let us know the result?
qdel <job-ID>
or
qdel all
Regards,
Janani Chandran
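A useful follow-up is to run `qdel` once per job ID and then check whether each job has actually left the queue. That shows which IDs still need an administrator. This is a minimal sketch. It assumes bash, and it assumes that `qstat <job-ID>` prints the default two-line header plus one row while the job exists, and nothing on stdout once the server has forgotten it. The function name `delete_and_check` and the `WAIT_SECONDS` variable are hypothetical.

```bash
# Hypothetical helper: run qdel on each job ID given as an argument,
# wait, then report whether each job is still known to the server.
# Assumptions: "qstat <id>" prints two header lines and one row (state in
# field 5) while the job exists, and no rows once it is gone.
# WAIT_SECONDS is a made-up knob (default 5 seconds).
delete_and_check() {
  local delay=${WAIT_SECONDS:-5}
  local id state status=0
  for id in "$@"; do
    qdel "$id" || echo "qdel failed for $id" >&2
  done
  sleep "$delay"
  for id in "$@"; do
    state=$(qstat "$id" 2>/dev/null | awk 'NR > 2 { print $5 }')
    # Completed jobs (state C) may linger in the listing; treat them as gone.
    if [ -n "$state" ] && [ "$state" != "C" ]; then
      echo "still present ($state): $id"
      status=1
    else
      echo "deleted: $id"
    fi
  done
  return "$status"
}
```

Usage: `delete_and_check $(qselect -u "$USER")`. A non-zero exit status means at least one job survived `qdel`. Those IDs are the ones to post here for the engineering team.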
Hi,
Is your issue resolved? Did you try those commands?
Regards,
Janani Chandran
Hi Janani,
I know about the qdel command. When I report a job here, it is because the job cannot be killed with qdel, and at that point the solution is to post here so that someone can ask the engineering team to kill it.
The unwanted jobs are all gone now. Thanks.
Robert
Hi,
Sorry for the inconvenience.
Our DevCloud team worked on your issue, and we are glad to know it is resolved. If you need any additional information, please post a new question, as Intel will no longer monitor this thread.
Regards,
Janani Chandran
