I have a job in DevCloud that has been frozen for 81 hours.
Apparently it was submitted to node s005-n005, but this node does not appear in the output of the pbsnodes command.
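A minimal sketch of that check, for anyone who wants to reproduce it. The helper name `node_is_listed` is hypothetical, and it assumes the usual `pbsnodes -a` layout in which each node name sits alone on a line starting in column 1:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the node name given as $1 appears on a
# line of its own in pbsnodes-style output read from stdin.
# Assumption: node names are printed unindented, one per line, and their
# attributes (state, np, ...) are printed on indented lines below them.
node_is_listed() {
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -qxF -- "$1"
}

# Example usage on a login node where pbsnodes is available:
#   if pbsnodes -a | node_is_listed s005-n005; then
#       echo "node is still registered with the server"
#   else
#       echo "node is missing from the pbsnodes output"
#   fi
```

If the node is missing, the server has no MOM to contact for the job, which would be consistent with the qdel error.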
I try to kill the job with:
but I get this response:
qdel: Server could not connect to MOM 18216.v-qsvr-fpga.aidevcloud
What can I do?
My job is still frozen after 100 hours (4 days).
I've seen others having similar problems in the past.
As you can see (https://helpful.knobs-dials.com/index.php/PBS_notes#qdel:_Server_could_not_connect_to_MOM), this is a known PBS problem. This makes sense, since the node assigned to my job no longer appears in the pbsnodes output.
It looks like it could be solved, with admin privileges, simply by executing:
qdel -p 18216.v-qsvr-fpga.aidevcloud
Is there a sysadmin who can execute this command for me?
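For whoever has the rights to do this, the idea can be sketched as a small wrapper. The function name `purge_job` is hypothetical, and the sketch assumes a Torque-style `qdel` whose `-p` option purges the job record from the server and is restricted to batch operators and administrators:

```shell
#!/bin/sh
# Hypothetical wrapper: try a normal delete first, and purge the job from
# the server only if that fails (for example because the MOM on the
# assigned node is unreachable).
# Assumption: Torque-style qdel with the -p (purge) option, run by a
# batch operator or administrator.
purge_job() {
    job_id="$1"
    if [ -z "$job_id" ]; then
        echo "usage: purge_job <job_id>" >&2
        return 2
    fi
    if ! command -v qdel >/dev/null 2>&1; then
        echo "qdel not found in PATH" >&2
        return 127
    fi
    # A purge skips the normal cleanup on the compute node, so it is
    # only the fallback.
    qdel "$job_id" 2>/dev/null || qdel -p "$job_id"
}

# Example usage (as an administrator):
#   purge_job 18216.v-qsvr-fpga.aidevcloud
```

Since a purge removes the job from the server without contacting the node, any processes left on that node would still need to be cleaned up if it comes back.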
Thank you for posting your query in Intel DevCloud.
Sorry for the inconvenience. From your log we understand that you are working in the FPGA DevCloud. We have a dedicated team that handles FPGA-related issues, and we are forwarding this query to that team for a faster response. This forum handles queries and issues related to the oneAPI DevCloud.
I appreciate your answer very much.
In fact, the only relation between this issue and FPGAs is that the PBS queue is targeting a node with FPGA support.
The problem is with the PBS software.
I also posted to "Application Acceleration With FPGAs" group, and already in this group "
I have forwarded your issue to the owner of this Dev Cloud platform and am waiting to hear back. I will ask them to answer your post directly. Please give us a couple of days on this.
I'm sorry to insist, but do you have any news @Hazlina_R_Intel ?
231 hours, and going...
                                                                                      Req'd     Req'd        Elap
Job ID                  Username    Queue    Jobname          SessID   NDS    TSK    Memory      Time S      Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
18216.v-qsvr-fpga.aide  u57927      batch    build_stratix10. 116536     1      2        --  06:00:00 R 231:41:49
The platform owner has been responding to you on your other thread. He also asked that you send an email directly to this address in case you did not get a response: firstname.lastname@example.org
I know you mentioned that you have sent a couple of emails to that inbox; please follow up from there. It is the direct inbox for FPGA Dev Cloud platform support. Sorry for the delay here. The platform owner is trying their best to resolve this issue.