Intel® DevCloud
Help for those needing help starting or connecting to the Intel® DevCloud

number of running jobs

W_E_Intel
Employee
4,443 Views

Hello,

 

Two days ago, I was able to run four jobs at the same time on different nodes, but now only one job runs while the others are pending in the queue, even though the nodes are free and powered on. Please help with this issue.

 

Thanks,

WE  

1 Solution
RahulU_Intel
Moderator
3,793 Views

Hi,


Thank you for your patience. We will contact you privately. We will close this inquiry now. If you need further assistance, please post a new question.


Thanks and Regards,

Rahul



23 Replies
ThiagoFilipe
Novice
3,773 Views

I'm having the same issue. I submitted 4 jobs two days ago, and the following day they were simply gone. When I tried to submit the 4 jobs again yesterday, only 1 actually ran and the other 3 got stuck in the pending queue.

W_E_Intel
Employee
3,718 Views
ThiagoFilipe
Novice
3,671 Views

Hi, no change yet.

 

My jobs usually go like this:
- 4 jobs training some neural networks (NNs);

- after all 4 jobs are completed, 4 new jobs evaluating the NNs trained;

 

I've waited for each job to finish sequentially and tried to run the 4 jobs for evaluation to check if the issue was solved and they'd run in parallel ... Unfortunately this was not the case, and only 1 job is running while the other 3 are in the waiting queue.

RahulU_Intel
Moderator
3,775 Views

Hi,

 

Thanks for posting in Intel communities. Could you please share the command you used?

 

Thanks

Rahul

 

W_E_Intel
Employee
3,768 Views

qsub -l nodes=1:[node properties]:ppn=2 -d . -l walltime=24:00:00 shscript.sh -F "5"

 

 

Replace [node properties] with any of the available node properties.
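For illustration, here is a minimal sketch of filling in the placeholder. The helper name `build_qsub_cmd` is hypothetical, and it only prints the command rather than submitting it; `gpu` is used as the example property because it is mentioned elsewhere in this thread, and whether it is valid depends on the properties your cluster actually reports.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DevCloud or PBS): prints the qsub command
# from the post above with the [node properties] placeholder filled in.
# Nothing is submitted; copy the printed line to run it.
build_qsub_cmd() {
  local property="$1"   # a node property available on the cluster, e.g. gpu
  local script_arg="$2" # argument passed to shscript.sh via -F
  printf 'qsub -l nodes=1:%s:ppn=2 -d . -l walltime=24:00:00 shscript.sh -F "%s"\n' \
    "$property" "$script_arg"
}

build_qsub_cmd gpu 5
```

The printed line has the same option order as the original command, with only the node property and the script argument substituted.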

W_E_Intel
Employee
3,719 Views

Hello @RahulU_Intel 

 

This is an urgent request. Could you please help me resolve the issue as soon as possible?

 

thanks

 

WE

GRN2
Employee
3,713 Views

I see; this restriction applies to all types of PBS jobs, including interactive ones. The next job starts only after the previous one has completed.

RahulU_Intel
Moderator
3,696 Views

Hi,


We are sorry for the delay. We are checking with the admin team on this issue and will get back to you.


Thanks



W_E_Intel
Employee
3,648 Views

Hello @RahulU_Intel 

 

Can you please update me on the current status? What is the time frame for resolving this issue?

 

thanks,

 

WE

 

ThiagoFilipe
Novice
3,602 Views

Any updates?

W_E_Intel
Employee
3,577 Views

No, nothing has changed, even after the recent maintenance. I hope the admins can get back to us with clear feedback on the current state, at least to clarify whether this is a permanent or a temporary situation.

@RahulU_Intel  

Thanks,

 

WE

RahulU_Intel
Moderator
3,537 Views

Hi,


Thank you for your patience. We are working with the DevCloud admins to get clarification on your query. We are really sorry for the inconvenience caused.


Thanks and Regards


RahulU_Intel
Moderator
3,450 Views

Hi,


Thank you for your patience. We reproduced your case on our side. Nodes that carry the jupyter node property can host up to two Jupyter sessions simultaneously, and such a node still shows up as free when it is hosting only one session. Each compute node has two available slots. When a Jupyter session starts, a single slot (ppn=1) is allocated to it on a jupyter node. All other (non-Jupyter) jobs, however, require two open slots (ppn=2). This is due to the way PBS is configured on the DevCloud. In your specific case, the nodes your queued jobs are waiting for are half occupied. So instead of requesting a particular node name, you can try requesting general node properties such as gpu or cpu.
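The slot arithmetic described above can be sketched as a toy model. This does not query PBS; the variable and function names are made up for illustration, and the numbers simply restate the description (two slots per node, one slot per Jupyter session, two slots per non-Jupyter job).

```shell
#!/usr/bin/env bash
# Toy model of the slot accounting described above; all names are hypothetical.
SLOTS_PER_NODE=2    # each compute node has two slots
JUPYTER_SLOTS=1     # a Jupyter session takes one slot (ppn=1)
BATCH_JOB_SLOTS=2   # any other job needs two open slots (ppn=2)

# Succeeds (exit status 0) if a node that currently hosts the given number
# of Jupyter sessions still has enough free slots for a non-Jupyter job.
can_host_batch_job() {
  local jupyter_sessions="$1"
  local free_slots=$(( SLOTS_PER_NODE - jupyter_sessions * JUPYTER_SLOTS ))
  [ "$free_slots" -ge "$BATCH_JOB_SLOTS" ]
}

for sessions in 0 1 2; do
  if can_host_batch_job "$sessions"; then
    echo "jupyter sessions=$sessions: a ppn=2 job can start"
  else
    echo "jupyter sessions=$sessions: a ppn=2 job stays queued"
  fi
done
```

Under this model, a node with a single Jupyter session has one slot left, which is why it can look free while a ppn=2 job targeting it keeps waiting.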

Hope this helps.


Regards,

Rahul


ThiagoFilipe
Novice
3,432 Views

I don't understand, could you please provide an example?

 

@W_E_Intel does this solution work for you?

 

I run my jobs like this:

qsub -l nodes=1:gpu:ppn=2 -d . shscript.sh -l walltime=23:59:55 -F "${ARG_1} ${ARG_2}"

 

But I don't see what I should change here for it to work.

esal_04
New Contributor I
3,407 Views

Hello,

I've been having the same issue. I noticed it a few weeks ago.

I also didn't understand from @RahulU_Intel's answer what I need to change. I've always used qsub's default node configurations:

qsub shscript.sh -F "${ARG_1} ${ARG_2}"

When I typed "qstat", I used to see 4 jobs running simultaneously, but now it's only 1 at a time. Why?
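To report exactly how many jobs are running versus queued, the qstat listing can be summarized. This is a minimal sketch, assuming the default qstat table layout in which the job state (R for running, Q for queued) is the fifth whitespace-separated column; the function name `count_job_states` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a qstat listing on stdin and counts jobs by state.
# Assumes the state letter is in the fifth column, as in the default layout;
# header and separator lines are ignored because their fifth field is not R or Q.
count_job_states() {
  awk '$5 == "R" { running++ }
       $5 == "Q" { queued++ }
       END { printf "running=%d queued=%d\n", running, queued }'
}

# Only call qstat if it is available on this machine.
if command -v qstat >/dev/null 2>&1; then
  qstat | count_job_states
fi
```

Posting this summary together with the exact qsub command makes it easier for others to compare what they see on their own accounts.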

W_E_Intel
Employee
3,367 Views

Hello @RahulU_Intel,

 

I have been using the qsub command with ppn=2 and general node properties all along. I will check it again and update you soon.

 

Thank you for your great help and support.

 

WE

W_E_Intel
Employee
3,342 Views

Hello @RahulU_Intel 

 

It does not work. I have submitted some jobs with the qsub command so that you can see the issue on my account.

 

Thanks,

 

WE 

GRN2
Employee
3,339 Views

Also, we can create several accounts to run several jobs.

W_E_Intel
Employee
3,333 Views

Can you please explain?

GRN2
Employee
3,249 Views

You can create several accounts (user names) for DevCloud and switch between them in your ~/.ssh/config file to connect to DevCloud with different logins.

That way, you can run one job per user name.
