Intel® High Level Design
Support for Intel® High Level Synthesis Compiler, DSP Builder, OneAPI for Intel® FPGAs, Intel® FPGA SDK for OpenCL™

Frozen Job in DevCloud

davidcastells
New Contributor I

I have a job in DevCloud that has been frozen for 81 hours.

Apparently it was submitted to node s005-n005, but this node does not appear in the output of the pbsnodes command.
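
As an illustration of the check described above, here is a minimal Python sketch that tests whether a node name is present in pbsnodes output. It assumes the usual TORQUE layout, where each node name sits on an unindented line followed by indented attribute lines. The function name `listed_nodes` and the sample text are hypothetical and are not real DevCloud output.

```python
def listed_nodes(pbsnodes_text):
    """Collect node names: non-empty lines that start in column 0 (assumed layout)."""
    return {
        line.strip()
        for line in pbsnodes_text.splitlines()
        if line and not line[0].isspace()
    }


# Hypothetical sample in the assumed layout; not real DevCloud output.
sample = """\
s001-n001
     state = free
     np = 2

s001-n002
     state = job-exclusive
     np = 2
"""

nodes = listed_nodes(sample)
# The node the job was assigned to is missing from this sample listing.
print("s005-n005" in nodes)
```

On a real system, the captured text output of pbsnodes would replace `sample`.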

I tried to kill the job with:

qdel 18216.v-qsvr-fpga.aidevcloud

but I got this response:

qdel: Server could not connect to MOM 18216.v-qsvr-fpga.aidevcloud

What can I do?

davidcastells
New Contributor I

My job is still frozen after 100 hours (4 days).

I've seen others having similar problems in the past.

As you can see (https://helpful.knobs-dials.com/index.php/PBS_notes#qdel:_Server_could_not_connect_to_MOM), this is a known problem with PBS. That makes sense, since the node assigned to my job no longer appears in the pbsnodes output.

It looks like someone with admin privileges could solve it simply by executing

  qdel -p 18216.v-qsvr-fpga.aidevcloud

Is there a sysadmin who can execute this command for me?

Can anyone help me? @Gopika_Intel? @RaeesaM_Intel? @AnilErinch_A_Intel?

Gopika_Intel
Moderator

Hi,

Thank you for posting your query in Intel DevCloud.

Sorry for the inconvenience. From your log, we understand that you are working in the FPGA DevCloud. We have a dedicated team that handles FPGA-related issues, and we are forwarding this query to that team for a faster response. This forum handles queries and issues related to the oneAPI DevCloud.

Regards

Gopika


davidcastells
New Contributor I

I very much appreciate your answer.

In fact, the only relation between this issue and FPGAs is that the PBS queue is targeting a node with FPGA support.

The problem is with the PBS software.

I also posted to the "Application Acceleration With FPGAs" group, and in this group, "Intel High Level Design", some days ago, but no one answered.

There is an answer from @AnilErinch_A_Intel to the same question from @Ziaul some time ago, but the link he refers to is broken.

Hazlina_R_Intel
Moderator

Hi,

I have forwarded your issue to the owner of this DevCloud platform and am waiting to hear back. I will ask them to reply to your post directly. Please give us a couple of days on this.


-Hazlina


davidcastells
New Contributor I

I'm sorry to insist, but do you have any news, @Hazlina_R_Intel?

231 hours, and counting...

v-qsvr-fpga.aidevcloud:
                                                                                  Req'd     Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID NDS   TSK    Memory    Time      S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
18216.v-qsvr-fpga.aide  u57927      batch    build_stratix10. 116536     1      2        --  06:00:00 R 231:41:49
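
To make the overrun in the listing above concrete, the following is a small Python sketch that compares the elapsed time with the requested walltime for that job row. It assumes the last three whitespace-separated fields are requested time, state, and elapsed time, as in the listing. The helper names `to_seconds` and `overrun` are hypothetical.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def overrun(qstat_row):
    """Return elapsed time divided by requested walltime for one job row."""
    fields = qstat_row.split()
    # Assumed column order: ..., requested time, state, elapsed time.
    requested, elapsed = fields[-3], fields[-1]
    return to_seconds(elapsed) / to_seconds(requested)


# Job row copied from the listing above.
row = ("18216.v-qsvr-fpga.aide u57927 batch build_stratix10. "
       "116536 1 2 -- 06:00:00 R 231:41:49")
ratio = overrun(row)
print(f"elapsed / requested = {ratio:.1f}")
```

A ratio well above 1 for a job still in state R means the scheduler is not enforcing the walltime limit, which is consistent with the server being unable to reach the node.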

 

Hazlina_R_Intel
Moderator

Hi David,

The platform owner has been responding to you on your other thread. He also asked that you send an email directly to this address in case you did not get a response: fpgauniversity@intel.com


I know you mentioned that you have sent a couple of emails to that inbox; please follow up from there. It is the direct inbox for FPGA DevCloud platform support. Sorry for the delay here. The platform owner is doing their best to resolve this issue.


-Hazlina

