Intel® DevCloud

DevCloud node allocation limits

Gregosh
Beginner

Hello.

I'm trying to learn the DevCloud infrastructure and the qsub utility. On the official web page about PBS, this instruction is given:

[<user>@login-2 ~]$ echo cat \$PBS_NODEFILE | qsub -l nodes=4:ppn=2
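
As a side note on how this one-liner works: the backslash in \$PBS_NODEFILE keeps the submitting shell from expanding the variable, so qsub reads the literal line "cat $PBS_NODEFILE" from standard input and runs it as the job script. Inside the job, PBS sets PBS_NODEFILE to a file listing the allocated hosts. The sketch below does not call qsub; it only shows what gets piped to it, and it simulates a shell outside a job by unsetting PBS_NODEFILE (an assumption about the login-node environment).

```shell
# Reproduce only the left half of: echo cat \$PBS_NODEFILE | qsub -l nodes=4:ppn=2
# The backslash defers expansion, so qsub would receive the literal text
# "cat $PBS_NODEFILE" as the job script, and the job expands it on the node.
job_script=$(echo cat \$PBS_NODEFILE)
printf '%s\n' "$job_script"

# Without the backslash, the submitting shell expands the variable itself.
# Outside a job (simulated here by unsetting it) it is empty, leaving just "cat".
unescaped=$(unset PBS_NODEFILE; echo cat $PBS_NODEFILE)
printf '%s\n' "$unescaped"
```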

I'm trying to run this command from the login node and from a compute node obtained with:

qsub -I

Every time, I get this message:

qsub: submit error (Job exceeds queue resource limits MSG=job violates queue/server max resource limits)

I want to ask: is there a limit on the number of nodes?

Since this command is given as an example, shouldn't it work as written?

How can I check why this command is rejected?

Thanks in advance.
Regards,
Grzegorz

Gopika_Intel
Moderator

Hi,

Thank you for posting in Intel Communities and pointing this issue out. We were able to reproduce the issue and we are sorry for the inconvenience caused. As a workaround, you can request multiple nodes using the command:

echo cat \$PBS_NODEFILE | qsub -l nodes=s00X-nXXX+s00X-nXXX+s00X-nXXX:ppn=2

Replace each s00X-nXXX with the name of a node that is currently free.
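
When the workaround is used repeatedly, a small shell function can assemble the host list. This is a hypothetical convenience helper (build_nodespec is not a DevCloud or PBS tool). It assumes Torque-style node specs, in which a property such as :ppn=2 applies only to the node spec it follows, so it appends :ppn=N to every host. The hostnames used below are placeholders.

```shell
# build_nodespec PPN HOST...  (hypothetical helper, not part of PBS/Torque)
# Prints a resource string such as nodes=hostA:ppn=2+hostB:ppn=2,
# attaching the same ppn value to every host in the list.
build_nodespec() {
  local ppn="$1"
  shift
  local spec=""
  local host
  for host in "$@"; do
    # Join node specs with "+", adding the separator only after the first one.
    spec="${spec:+$spec+}${host}:ppn=${ppn}"
  done
  printf 'nodes=%s\n' "$spec"
}
```

Usage, with placeholder hostnames: echo cat \$PBS_NODEFILE | qsub -l "$(build_nodespec 2 s001-n001 s001-n002)"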

To find out which nodes are free, run:

pbsnodes
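
The pbsnodes output contains one block per node. Assuming the usual Torque layout (the node name at the start of a line, followed by indented lines such as "state = free"), a short awk filter can reduce it to the names of free nodes. The sample text in this sketch is illustrative, not captured DevCloud output, and free_nodes is a hypothetical helper name.

```shell
# free_nodes: read pbsnodes output on stdin and print the names of nodes
# whose state is exactly "free". Assumes each block starts with the node
# name in column 1 and contains an indented "state = <value>" line.
free_nodes() {
  awk '
    /^[^ \t]/                     { node = $1 }   # unindented line: new node block
    $1 == "state" && $3 == "free" { print node }
  '
}

# Illustrative input only; the hostnames are placeholders.
sample='s001-n001
     state = free
     np = 2

s001-n002
     state = job-exclusive
     np = 2

s001-n003
     state = free
     np = 2'

printf '%s\n' "$sample" | free_nodes
```

On the cluster itself, something like pbsnodes | free_nodes | head -n 2 would give up to two free hosts to paste into the workaround command.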

Hope this helps

Regards

Gopika

ghnunes
Beginner

Hi Gopika,

I'm running my master's experiments in Python: convolutional neural networks for image classification. However, I keep getting disconnected during the runs, as shown in the screenshot below. Would running on more nodes solve my problem? In 9 out of every 10 runs I get disconnected; running the 30 repetitions takes about 3 hours on average.

To run them, I use these commands:

ssh devcloud

qsub -I

ssh s001-n0XX.aidevcloud

[Screenshot: problemaConexao.jpeg]

I tried running the commands you mentioned to request more nodes, but I got this error: "error: unable to send message to qmaster using port 6444 on host "gustavo": can't resolve host name"

Gregosh
Beginner

Hello.

Thank you for your helpful response.

The workaround you suggested works, and I will be able to use it.

However, the workaround is much less convenient than using qsub alone with a single command, so I would like to ask: are you going to work on this allocation method in qsub? If so, when do you expect that allocating more than one node natively with qsub will work again?

Regards.
Grzegorz

Gopika_Intel
Moderator

Hi,

Thank you for the update. We saw that you raised this question here: https://community.intel.com/t5/Intel-DevCloud/Problem-with-disconnect/m-p/1310806/emcs_t/S2h8ZW1haWx8Ym9hcmRfc3Vic2NyaXB0aW9ufEtTWUZDQ0FJMzEzOEs2fDEzMTA4MDZ8U1VCU0NSSVBUSU9OU3xoSw#M2831 and that thread is active. As the query is being answered in that thread, can we discontinue monitoring this one?

Regards

Gopika


Gopika_Intel
Moderator

Hi,

Sorry for the delayed response. To answer your question:

> However, the workaround is much less convenient than using qsub alone with a single command, so I would like to ask: are you going to work on this allocation method in qsub? If so, when do you expect that allocating more than one node natively with qsub will work again?

A: Our configuration previously allowed only 2 jobs, each running on 1 node (2x1); multi-node jobs were not allowed. We have temporarily updated it to allow 2 jobs, each with up to 2 nodes (2x2).
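
The 2x2 setting also explains the original error: nodes=4:ppn=2 asks for 4 nodes in a single job, which was over the earlier 1-node limit and is presumably still over the 2-node per-job cap. Before submitting, a small helper can total the nodes in a request. This is a hypothetical sketch (count_nodes is not a PBS/Torque command), and its counting rule assumes Torque-style specs: a part starting with an integer requests that many nodes, and a part starting with a hostname requests one.

```shell
# count_nodes SPEC  (hypothetical pre-submit check, not a PBS/Torque tool)
# Totals the nodes requested by a Torque-style spec, for example
#   nodes=4:ppn=2                           -> 4
#   nodes=s001-n001:ppn=2+s001-n002:ppn=2   -> 2
count_nodes() {
  local spec="${1#nodes=}"     # drop the leading "nodes=" if present
  local total=0
  local part first
  local IFS='+'                # split the spec on "+" inside this function only
  for part in $spec; do
    first="${part%%:*}"        # text before the first ":" (count or hostname)
    case "$first" in
      '') ;;                                 # skip empty parts
      *[!0-9]*) total=$((total + 1)) ;;      # hostname: one node
      *) total=$((total + first)) ;;         # leading integer: that many nodes
    esac
  done
  printf '%s\n' "$total"
}
```

For example, count_nodes nodes=4:ppn=2 prints 4, which is above the 2-node per-job cap, while a two-host spec in the workaround format prints 2.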

Hope this helps

Regards

Gopika


Gopika_Intel
Moderator

Hi,

We have not heard from you. Is your query resolved? If so, can we discontinue monitoring this thread?

Regards

Gopika


Gregosh
Beginner

Hello.

Thank you for the kind response.

For now, I accept the restriction of submitting jobs in the 2x2 configuration.
Thank you for your help; we can close this topic.

Regards.

Grzegorz

ghnunes
Beginner

Hi @Gopika_Intel, sorry, I forgot to answer. Everything worked out; thank you so much for your help!

Gopika_Intel
Moderator

Hi,

Thank you for the confirmation and for accepting our response as the solution. If you need any additional information, please post a new question, as this thread will no longer be monitored by Intel.

Regards

Gopika