Intel® DevCloud

number of running jobs

W_E_Intel
Employee
1,921 Views

Hello,

 

Two days ago, I was able to run four jobs at the same time on different nodes, but now only one job runs while the others stay pending in the queue, even though the nodes are free and powered on. Please help with this issue.

 

Thanks,

WE  

1 Solution
RahulU_Intel
Moderator
1,271 Views

Hi,


Thank you for your patience. We will contact you privately. We will close this inquiry now. If you need further assistance, please post a new question.


Thanks and Regards,

Rahul



23 Replies
ThiagoFilipe
Novice
1,686 Views

I'm having the same issue. I submitted 4 jobs two days ago, and the following day they were simply gone. When I tried to submit the 4 jobs again yesterday, only 1 actually ran and the other 3 got stuck in the pending queue.

W_E_Intel
Employee
1,631 Views
ThiagoFilipe
Novice
1,584 Views

Hi, no change yet.

 

My jobs usually go like this:
- 4 jobs training some neural networks (NNs);

- after all 4 jobs are completed, 4 new jobs evaluating the NNs trained;

 

I've waited for each job to finish sequentially and tried to run the 4 jobs for evaluation to check if the issue was solved and they'd run in parallel ... Unfortunately this was not the case, and only 1 job is running while the other 3 are in the waiting queue.

RahulU_Intel
Moderator
1,684 Views

Hi,

 

Thanks for posting in Intel communities. Could you please share the command you used?

 

Thanks

Rahul

 

W_E_Intel
Employee
1,678 Views

qsub -l nodes=1:[node properties]:ppn=2 -d . -l walltime=24:00:00 shscript.sh -F "5"

 

 

Replace [node properties] with any of the available node properties.

W_E_Intel
Employee
1,628 Views

Hello @RahulU_Intel 

 

This is an urgent request. Please help me resolve the issue as soon as possible.

 

thanks

 

WE

GRN2
Employee
1,622 Views

I see, this restriction applies to all types of PBS jobs, including interactive ones. The next task starts only after the previous one has completed.

RahulU_Intel
Moderator
1,605 Views

Hi,


We are sorry for the delay. We are checking with the admin team on this issue and will get back to you.


Thanks



W_E_Intel
Employee
1,557 Views

Hello @RahulU_Intel 

 

Can you please update me on the current status? What is the time frame for resolving this issue?

 

thanks,

 

WE

 

ThiagoFilipe
Novice
1,511 Views

Any updates?

W_E_Intel
Employee
1,486 Views

No, nothing has changed, even after the recent maintenance. I hope the admins can get back to us with clear feedback on the current state, at least to clarify whether this is a permanent or temporary situation.

@RahulU_Intel  

Thanks,

 

WE

RahulU_Intel
Moderator
1,446 Views

Hi,


Thank you for your patience. We are working with the DevCloud admins to clarify your query. We are really sorry for the inconvenience caused.


Thanks and Regards


RahulU_Intel
Moderator
1,359 Views

Hi,


Thank you for your patience. We reproduced your case on our side. Nodes marked as jupyter (in the node properties) can host up to two Jupyter sessions simultaneously, and a jupyter node shows up as free when it is hosting only one Jupyter session. Each compute node has 2 available slots. When a Jupyter session starts, a single slot (ppn=1) is allocated to host it on a jupyter node. However, all other (non-Jupyter) jobs require two open slots (ppn=2). This is due to the way PBS is configured on the DevCloud. In your specific case, the nodes your queued jobs are waiting for are half occupied: they appear free but have only one open slot. So instead of giving a particular node name, you can try giving general node properties such as gpu or cpu.
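
The slot rule described above can be sketched as a toy model. This is only an illustration under the stated assumptions (2 slots per node, 1 slot per Jupyter session, 2 slots for any other job); the function names `can_start_batch_job` and `node_verdict` are hypothetical and are not DevCloud or PBS commands.

```shell
# Toy model of the slot rule described in this post; not a DevCloud or PBS tool.
SLOTS_PER_NODE=2     # each compute node has 2 slots (per the post)
BATCH_JOB_SLOTS=2    # non-Jupyter jobs need 2 open slots (ppn=2)

# can_start_batch_job <slots_in_use>
# Succeeds (exit status 0) only if enough slots are free for a ppn=2 job.
can_start_batch_job() {
    free_slots=$(( SLOTS_PER_NODE - $1 ))
    [ "$free_slots" -ge "$BATCH_JOB_SLOTS" ]
}

# node_verdict <slots_in_use>
# Prints whether a ppn=2 job would start or wait on such a node.
node_verdict() {
    if can_start_batch_job "$1"; then
        echo "starts"
    else
        echo "waits"
    fi
}

node_verdict 0   # empty node
node_verdict 1   # node hosting one Jupyter session (still listed as free)
```

Under these assumptions, a node that hosts a single Jupyter session is reported as free yet cannot accept a ppn=2 job, which would match the pending jobs described in this thread.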

Hope this helps.


Regards,

Rahul


ThiagoFilipe
Novice
1,341 Views

I don't understand, could you please provide an example?

 

@W_E_Intel does this solution work for you?

 

I run my jobs like this:

qsub -l nodes=1:gpu:ppn=2 -d . shscript.sh -l walltime=23:59:55 -F "${ARG_1} ${ARG_2}"

 

But I don't see what I should change here for it to work.

Alcorta__Erika_Susan
New Contributor I
1,316 Views

Hello,

I've been having the same issue. I noticed it a few weeks ago.

I also didn't understand from @RahulU_Intel's answer what I need to change. I've always used qsub's default node configurations:

qsub shscript.sh -F "${ARG_1} ${ARG_2}"

When I typed "qstat", I used to see 4 jobs running simultaneously, but now it's only 1 at a time. Why?
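
To compare runs, the number of running versus queued jobs can be counted from the `qstat` listing. The following is a minimal sketch, assuming the default `qstat` column layout in which the job state (R for running, Q for queued) is the fifth field; the helper name `count_jobs_in_state` is hypothetical, and the sample listing uses made-up job IDs purely for illustration.

```shell
# count_jobs_in_state <state>
# Reads qstat-style text on stdin and counts the jobs whose state column
# (assumed to be the 5th field of the default listing) equals <state>.
count_jobs_in_state() {
    awk -v want="$1" 'NR > 2 && $5 == want { n++ } END { print n + 0 }'
}

# Illustrative sample only (made-up IDs), shaped like default qstat output.
sample='Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1001.example              shscript.sh      u00001          00:12:03 R batch
1002.example              shscript.sh      u00001                 0 Q batch
1003.example              shscript.sh      u00001                 0 Q batch
1004.example              shscript.sh      u00001                 0 Q batch'

printf '%s\n' "$sample" | count_jobs_in_state R   # running jobs in the sample
printf '%s\n' "$sample" | count_jobs_in_state Q   # queued jobs in the sample
```

On a login node the same helper could be fed live output, for example `qstat | count_jobs_in_state R`, provided the listing there follows the assumed layout.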

W_E_Intel
Employee
1,276 Views

Hello @RahulU_Intel,

 

I have used the qsub command with ppn=2 and general node properties all along. I will check again and update you soon.

 

Thank you for your great help and support.

 

WE

W_E_Intel
Employee
1,251 Views

Hello @RahulU_Intel 

 

It does not work. I submitted some jobs using the qsub command so that you can see the issue on my account.

 

Thanks,

 

WE 

GRN2
Employee
1,248 Views

Also, we can create several accounts to run several jobs.

W_E_Intel
Employee
1,242 Views

Can you please explain?

GRN2
Employee
1,158 Views

You can create several accounts (user names) for DevCloud and switch between them in your ~/.ssh/config file to connect to DevCloud with different logins.

That way, you will be able to submit one job per user name.
