Application Acceleration With FPGAs
Programmable Acceleration Cards (PACs), DCP, DLA, Software Stack, and Reference Designs
Frozen Job in DevCloud

davidcastells
New Contributor I

I have a frozen job in DevCloud. The time quota was 6 hours, but the job has been running for more than 62 hours.

I tried to kill it with

qdel <job id>

but I get:

qdel: Server could not connect to MOM <job id>


Any idea what to do?

Accepted solution
davidcastells
New Contributor I

Let me add (for others having the same problem) that the DevCloud team finally cancelled my pending job.

A good general practice is to always include a deadline (walltime) in your batch jobs, to avoid issues with the queueing system in case something strange happens.
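For reference, a minimal sketch of what such a deadline can look like, assuming the Torque/PBS-style scheduler that the qsub/qdel commands in this thread point to. The helper name make_job_script, the job name and the command are hypothetical placeholders, and the helper only accepts walltime values written as HH:MM:SS.

```shell
# Hypothetical helper (not part of DevCloud): print a Torque/PBS job script
# whose "#PBS -l walltime=" directive asks the scheduler to stop the job
# once WALLTIME (HH:MM:SS) has elapsed.
make_job_script() {
  local name="$1" walltime="$2" command="$3"
  # Refuse values that are not HH:MM:SS so a typo cannot silently drop the limit.
  if ! printf '%s\n' "$walltime" | grep -Eq '^[0-9]+:[0-5][0-9]:[0-5][0-9]$'; then
    echo "make_job_script: walltime must be HH:MM:SS, got '$walltime'" >&2
    return 1
  fi
  # Unquoted EOF: ${name}, ${walltime} and ${command} are expanded now,
  # while \$PBS_O_WORKDIR is written literally and expanded on the node.
  cat <<EOF
#!/bin/bash
#PBS -N ${name}
#PBS -l walltime=${walltime}
cd "\$PBS_O_WORKDIR"
${command}
EOF
}
```

Usage: make_job_script fpga_build 06:00:00 './build.sh' > job.sh and then qsub job.sh. The same limit can also be passed on the command line, e.g. qsub -l walltime=06:00:00 job.sh. In Torque the limit is normally enforced by the MOM daemon on the compute node, so if that node goes down, as happened here, an administrator may still have to purge the job.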


6 Replies
Hazlina_R_Intel
Moderator

Hi,

I have forwarded your issue to the owner of this DevCloud platform and am awaiting their reply. I have asked them to answer your post directly. Please give us a couple of days.


-Hazlina


Lawrence_L_Intel
Employee

Do you know which server you launched the job from? If so, log back into that server, run ps -auxw to find the job's processes, and kill -9 their process IDs (PIDs). Sometimes that kills the job. Make sure you use the walltime option in batch mode so your jobs are stopped at their time limit in the future.

Thanks,

Larry
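A minimal sketch of this suggestion, assuming you can still log in to the host where the job's processes run (not possible in this thread, since the node was down). kill -9 needs process IDs rather than the scheduler's job ID, so the hypothetical helper pids_of below reads ps -auxw output on stdin and prints the PID (column 2) of your processes whose executable name matches exactly; aoc in the usage line is only an example. Matching the executable name exactly, rather than any substring of the command line, keeps the awk process itself out of the results. Note that ps may abbreviate user names longer than 8 characters in this output format.

```shell
# Hypothetical helper: read `ps -auxw` output on stdin and print the PID
# (column 2) of each process owned by USER_NAME whose executable
# (column 11, directory stripped) is exactly PROG.
pids_of() {
  local user_name="$1" prog="$2"
  awk -v u="$user_name" -v p="$prog" '
    NR > 1 && $1 == u {         # skip the header line; keep only this user
      cmd = $11                  # first word of the COMMAND column
      sub(/.*\//, "", cmd)       # /opt/tool/bin/aoc -> aoc
      if (cmd == p) print $2     # column 2 is the PID
    }'
}
```

Usage, after reviewing the list it prints: ps -auxw | pids_of "$(id -un)" aoc | xargs -r kill -9 (the -r option of GNU xargs skips kill when nothing matches).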

Lawrence_L_Intel
Employee

Let me add: if you post here and don't see a response, try fpgauniversity@intel.com. We have a fairly small team moderating technical inquiries on the FPGA DevCloud, and we don't check the forum frequently.

Thanks,

Larry


davidcastells
New Contributor I

Thanks, Lawrence.
I have already sent them two emails (last Saturday and yesterday), but I haven't received a response.

davidcastells
New Contributor I

The problem is that node s005-n005, which was running the job, went down (I don't know why), and the queueing system has lost control of the job.

I cannot log in to s005-n005 because it is not running.

Apparently, someone with admin privileges could solve the problem simply by running:

qdel -p 18216.v-qsvr-fpga.aidevcloud
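For anyone hitting the same "could not connect to MOM" error, a sketch of how to confirm which node a stuck job was assigned to, assuming Torque's qstat -f output, where the exec_host attribute lists node/slot entries joined by "+". The helper name exec_nodes is hypothetical, and long attribute values that qstat wraps onto continuation lines are not handled.

```shell
# Hypothetical helper: read `qstat -f <job id>` output on stdin and print
# each distinct node named in the exec_host attribute, one per line.
exec_nodes() {
  awk -F' = ' '$1 ~ /^[ \t]*exec_host$/ {
    n = split($2, slots, "+")       # e.g. s005-n005/0+s005-n005/1
    for (i = 1; i <= n; i++) {
      host = slots[i]
      sub(/\/.*/, "", host)         # drop the "/slot" suffix
      if (!(host in seen)) { seen[host] = 1; print host }
    }
  }'
}
```

Usage: qstat -f 18216.v-qsvr-fpga.aidevcloud | exec_nodes, then pbsnodes -l, which on Torque lists nodes in the down, offline or unknown state. If the job's node appears there, an administrator's qdel -p (purge) is likely the only way to remove the job.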
