Application Acceleration With FPGAs
Programmable Acceleration Cards (PACs), DCP, FPGA AI Suite, Software Stack, and Reference Designs

Frozen Job in Devcloud

davidcastells
New Contributor I
2,244 Views

I have a frozen job in DevCloud. The time quota was 6 hours, but it has been running for more than 62 hours.

I tried to kill it with

qdel <job id>

but I get

qdel: Server could not connect to MOM <job id>

Any idea what to do?

0 Kudos
1 Solution
davidcastells
New Contributor I
2,092 Views

Let me add (for others having the same problem) that the DevCloud team finally cancelled my pending job.

As general advice, always include a deadline in your batch jobs, to avoid problems with the queueing system if something unexpected happens.
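One way this advice might look in practice is sketched below, assuming a Torque/PBS style scheduler (which the qdel error message suggests) and GNU coreutils timeout on the compute node. The walltime value, the helper name run_with_deadline, and the sleep placeholder are illustrative assumptions, not something taken from the DevCloud documentation.

```shell
#!/bin/bash
#PBS -l walltime=06:00:00
# The directive above asks the scheduler to end the job after 6 hours.

# run_with_deadline is a hypothetical helper: a second, job-side limit that
# does not depend on the scheduler being able to reach the node.
# Usage: run_with_deadline SECONDS COMMAND [ARGS...]
run_with_deadline() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    status=$?
    # GNU timeout exits with 124 when it had to stop the command.
    if [ "$status" -eq 124 ]; then
        echo "deadline of ${limit}s exceeded, command stopped" >&2
    fi
    return "$status"
}

# Placeholder workload: replace 'sleep 1' with the real command and choose a
# limit slightly below the requested walltime.
run_with_deadline 5 sleep 1
```

The same limit can also be given at submission time with qsub -l walltime=06:00:00. Neither limit helps if the compute node itself goes down, but together they reduce the chance of a job silently running far past its quota.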


0 Kudos
6 Replies
Hazlina_R_Intel
Moderator
2,229 Views

Hi,

I have forwarded your issue to the owner of the DevCloud platform and am waiting to hear back. I will ask them to reply to your post directly. Please give us a couple of days on this.


-Hazlina


0 Kudos
Lawrence_L_Intel
2,222 Views

Do you know which server you launched the job from? If so, log back into that server, find the job's process with ps -auxw, and kill -9 its process ID. Sometimes that kills the job. Make sure you use the walltime construct in batch mode so you don't overrun your time limit in the future.
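As a rough sketch of that suggestion: the helper below (stop_process, a hypothetical name) first asks a process to exit and only falls back to the equivalent of kill -9 if it is still there a few seconds later. It assumes you are logged in to the machine where the process is running and that you own it.

```shell
#!/bin/bash
# stop_process is a hypothetical helper: ask a process to exit, then force it.
# Usage: stop_process PID
stop_process() {
    pid="$1"
    # Polite request first (SIGTERM). If the signal cannot be delivered,
    # for example because the process is already gone, stop here.
    kill -TERM "$pid" 2>/dev/null || return 0
    # Give the process up to 5 seconds to exit on its own.
    i=0
    while [ "$i" -lt 5 ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done
    # Last resort, equivalent to 'kill -9 PID'.
    kill -KILL "$pid" 2>/dev/null
}

# Find the PID first, for example with:
#   ps -auxw | grep "$USER"
# and then call:
#   stop_process <pid>
```

Note that kill expects a process ID from the ps listing, not the scheduler's job ID.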

Thanks,

Larry

0 Kudos
Lawrence_L_Intel
2,219 Views

Let me add: if you post here and don't see a response, try fpgauniversity@intel.com. We have a fairly small team moderating technical inquiries on the FPGA DevCloud, and we don't check the forum frequently.

Thanks

Larry

 

0 Kudos
davidcastells
New Contributor I
2,213 Views

Thanks Lawrence,
I already sent them 2 mails (last Saturday and yesterday), but I have had no response.

0 Kudos
davidcastells
New Contributor I
2,214 Views

The problem is that the node s005-n005, which was running the job, went down (I don't know why), and the queue system has lost control of the job.

I cannot log in to s005-n005 because it is not running.

Apparently, with admin privileges, the problem could be solved simply by running

qdel -p 18216.v-qsvr-fpga.aidevcloud
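For anyone wanting to confirm the same diagnosis before contacting the admins, a read-only check might look like the sketch below. It assumes a Torque/PBS style installation where qstat -f and pbsnodes -l are available on the login node; the helper name diagnose_job is hypothetical, and the exact field names in the qstat output may differ between scheduler versions.

```shell
#!/bin/bash
# diagnose_job is a hypothetical helper; it only reads state, changes nothing.
# Usage: diagnose_job JOB_ID
diagnose_job() {
    job_id="$1"
    if ! command -v qstat >/dev/null 2>&1; then
        echo "qstat not found: run this on a login node of the cluster" >&2
        return 127
    fi
    # Full job record, reduced to state, execution host and walltime lines.
    qstat -f "$job_id" | grep -E 'job_state|exec_host|walltime'
    # Nodes the server currently reports as down, offline or unknown.
    pbsnodes -l
}
```

For example, diagnose_job 18216.v-qsvr-fpga.aidevcloud. If the job's execution host also appears in the pbsnodes -l listing, that is consistent with the situation described above: the node is unreachable, a normal qdel cannot get through to it, and an administrator has to purge the job.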

 

0 Kudos