Intel® DevCloud

DevCloud node allocation limits

Gregosh
Beginner
1,243 Views

Hello.

I'm trying to learn the DevCloud infrastructure and the qsub utility. The official web page about PBS gives this command:

 

[<user>@login-2 ~]$ echo cat \$PBS_NODEFILE | qsub -l nodes=4:ppn=2

 

I'm trying to run this command from the login node and from a compute node obtained with the command:

qsub -I 

 

In both cases I always get this message:

qsub: submit error (Job exceeds queue resource limits MSG=job violates queue/server max resource limits)

 

Is there a limit on the number of nodes?

Since this command is given as an example, shouldn't it run correctly?

How can I check why this command is not executed properly?

Thanks in advance.
Regards,
Grzegorz

1 Solution
Gopika_Intel
Moderator
902 Views

Hi,

Sorry for the delayed response. To answer your question:

> However, since the workaround is much less convenient than using qsub alone with a single command, I would like to ask: are you going to fix this allocation method in qsub? And if so, when do you expect native allocation of more than one node with qsub to work again?

A: Our configuration allowed only 2 jobs, each running on 1 node (2x1); multi-node jobs were not allowed. We have temporarily updated that to 2 jobs, each with up to 2 nodes (2x2).

Hope this helps

Regards

Gopika
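
For readers who want to check such limits themselves, here is a minimal sketch. It assumes a Torque-style `qstat` on the login node, and the attribute names it filters for (`resources_max.*`, `max_user_run`, `max_user_queuable`, `max_running`) are assumptions about what the DevCloud queues expose. The helper name `show_queue_limits` is hypothetical and not part of PBS.

```shell
#!/bin/sh
# show_queue_limits is a hypothetical helper, not a PBS/Torque command.
# It reads `qstat -Q -f`-style output on stdin and keeps only the queue
# names and the attributes that usually describe resource or job limits.
show_queue_limits() {
    grep -E '^Queue:|resources_max\.|max_user_run|max_user_queuable|max_running'
}

# Usage sketch (assumes qstat is available on the login node):
#   qstat -Q -f | show_queue_limits
```

If a request such as nodes=4:ppn=2 exceeds one of the listed values, the "Job exceeds queue resource limits" error from qsub would be the expected result.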



10 Replies
Gopika_Intel
Moderator
1,204 Views

Hi,

Thank you for posting in Intel Communities and pointing this issue out. We were able to reproduce the issue and we are sorry for the inconvenience caused. As a workaround, you can request multiple nodes using the command:

echo cat \$PBS_NODEFILE | qsub -l nodes=s00X-nXXX+s00X-nXXX+s00X-nXXX:ppn=2

provided that the nodes s00X-nXXX are free.

To see which nodes are free, run:

pbsnodes

 

Hope this helps

Regards

Gopika
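
The workaround above can be scripted. The following is a minimal sketch, not an official DevCloud tool: `build_nodes_spec` is a hypothetical helper that joins node names into the `name+name:ppn=N` form used in the workaround command. It assumes that `pbsnodes -l free` prints one free node per line with the node name in the first column, as Torque does; if the DevCloud output differs, the filter needs adjusting.

```shell
#!/bin/sh
# build_nodes_spec is a hypothetical helper, not part of PBS/Torque.
# Usage: <list of node names on stdin> | build_nodes_spec COUNT PPN
# It prints "node1+node2+...:ppn=PPN", copying the form of the workaround command.
build_nodes_spec() {
    count=$1   # number of nodes to request
    ppn=$2     # processors per node
    # Take the first column of each non-empty line, keep COUNT names, join with "+".
    spec=$(awk 'NF { print $1 }' | head -n "$count" | paste -s -d + -)
    if [ -n "$spec" ]; then
        printf '%s:ppn=%s\n' "$spec" "$ppn"
    else
        return 1   # no node names were supplied
    fi
}

# Usage sketch (assumes `pbsnodes -l free` lists free nodes, one per line):
#   spec=$(pbsnodes -l free | build_nodes_spec 2 2)
#   echo cat \$PBS_NODEFILE | qsub -l nodes="$spec"
```

A node that is free when listed may be taken before the job is submitted, so the request can still be rejected or queued.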

 

ghnunes
Beginner
1,150 Views

Hi Gopika,

I'm running the experiments for my master's degree in Python; they are convolutional neural networks for image classification. However, I keep getting disconnected during the runs, as shown in the screenshot below. Would running with more nodes solve my problem? I get disconnected in 9 out of every 10 runs, and the 30 repetitions take about 3 hours on average.

To run I'm using these commands:

ssh devcloud

qsub -I

ssh s001-n0XX.aidevcloud

 

[Screenshot: problemaConexao.jpeg]

I tried to run the commands you mentioned to request more nodes, but I got this error: "error: unable to send message to qmaster using port 6444 on host "gustavo": can't resolve host name"

Gregosh
Beginner
1,052 Views

Hello.

Thank you for your kind response.

The workaround you suggested works, and I will be able to use it.

However, since the workaround is much less convenient than using qsub alone with a single command, I would like to ask:
are you going to fix this allocation method in qsub? And if so,
when do you expect native allocation of more than one node with qsub to work again?

 

Regards.
Grzegorz

Gopika_Intel
Moderator
1,124 Views

Hi,

 

Thank you for the update. We saw that you raised this question here: https://community.intel.com/t5/Intel-DevCloud/Problem-with-disconnect/m-p/1310806/emcs_t/S2h8ZW1haWx... and that thread is active. As the query is being answered in the thread link mentioned, can we discontinue monitoring this thread?

Regards

Gopika



Gopika_Intel
Moderator
844 Views

Hi,

We have not heard from you. Is your query resolved? If so, can we discontinue monitoring this thread?

Regards

Gopika


Gregosh
Beginner
779 Views

Hello.

Thank you for your kind response.

For now I accept the restriction of submitting jobs in the 2x2 configuration.
Thank you for your help; we can close this topic.

Regards.

Grzegorz

ghnunes
Beginner
829 Views

Hi @Gopika_Intel , sorry, forgot to answer, everything worked out, thank you so much for your help!

Gopika_Intel
Moderator
719 Views

Hi,

 

Thank you for the confirmation and for accepting our response as the solution. If you need any additional information, please post a new question, as this thread will no longer be monitored by Intel.

Regards

Gopika

 
