Application Acceleration With FPGAs
Programmable Acceleration Cards (PACs), DCP, FPGA AI Suite, Software Stack, and Reference Designs

Frozen Job in DevCloud

davidcastells
New Contributor I
2,238 Views

I have a frozen job in DevCloud. The time quota was 6 hours, but the job has been running for more than 62 hours.

I tried to kill it with

qdel <job id>

but I get

qdel: Server could not connect to MOM <job id>

Any idea what to do?

1 Solution
davidcastells
New Contributor I
2,086 Views

Let me add (for others having the same problem) that the DevCloud team finally cancelled my pending job.

As general advice, always include a deadline in your batch jobs to avoid issues with the queueing system in case something strange happens.
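
One common way to set such a deadline on a PBS/Torque-style queue is a walltime request inside the job script. The sketch below assumes the DevCloud queue accepts the standard `#PBS -l walltime=` directive. The file name job.sh, the 6-hour value, and ./my_workload are hypothetical placeholders, and the inner `timeout` call is an extra guard that is not mentioned in the thread.

```shell
#!/bin/bash
# Write a minimal job script with an explicit walltime limit.
# job.sh, the 6-hour limit, and ./my_workload are illustrative assumptions.
cat > job.sh <<'EOF'
#!/bin/bash
#PBS -l walltime=06:00:00
# Second guard: stop the workload itself a few minutes before the scheduler limit.
timeout 355m ./my_workload
EOF

# Submit only where a PBS client is installed (for example on a login node).
if command -v qsub >/dev/null 2>&1; then
    qsub job.sh
else
    echo "qsub not found; job.sh written but not submitted"
fi
```

A walltime limit is enforced by the scheduler, so it may not help when the compute node itself goes down, but it bounds the job in the ordinary case and gives the support team a clear limit to refer to.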

6 Replies
Hazlina_R_Intel
Moderator
2,223 Views

Hi,

I have forwarded your issue to the owner of the DevCloud platform and am waiting to hear back. I will ask them to answer your post directly. Please give us a couple of days on this.


-Hazlina


Lawrence_L_Intel
Employee
2,216 Views

Do you know which server you launched the job from? If so, log back in to that server, run ps -auxw to find the job's process, and try kill -9 on its process ID. Sometimes that kills the job. Make sure you use the walltime construct in batch mode so you don't time out in the future.

Thanks,

Larry
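
The find-and-kill step suggested above can be sketched as follows. This is only an illustration: a background `sleep 300` stands in for the stuck job process, the variable name stuck_pid is hypothetical, and the approach works only if you can still log in to the machine where the process runs.

```shell
#!/bin/bash
# Stand-in for a stuck job process; on a real node you would skip this step.
sleep 300 &
stuck_pid=$!

# List your own processes to locate the stuck one and read its PID.
# (|| true keeps the script going if ps is unavailable or nothing matches.)
ps -u "$(id -un)" -o pid,etime,args | grep '[s]leep 300' || true

# Ask the process to terminate first; force-kill only if it is still alive.
kill -TERM "$stuck_pid"
sleep 1
if kill -0 "$stuck_pid" 2>/dev/null; then
    kill -KILL "$stuck_pid"
fi

# Reap the background process so no zombie is left behind.
wait "$stuck_pid" 2>/dev/null || true
```

Note that `kill` takes a process ID from the `ps` output, not the queue's job ID. The job ID belongs to `qdel`.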

Lawrence_L_Intel
Employee
2,213 Views

Let me add: if you post here and don't see a response, try fpgauniversity@intel.com. We have a fairly small team moderating technical inquiries on the FPGA DevCloud, and we don't check the forum frequently.

Thanks

Larry

davidcastells
New Contributor I
2,207 Views

Thanks Lawrence,
I already sent them 2 emails (last Saturday and yesterday), but I have had no response.

davidcastells
New Contributor I
2,208 Views

The problem is that the node s005-n005, which was running the job, went down (I don't know why), and the queue system has lost control of the job.

I cannot log in to s005-n005 because it is not running.

Apparently, with admin privileges, the problem could be solved simply by running

qdel -p 18216.v-qsvr-fpga.aidevcloud
