Intel® DevCloud
Help for those needing help starting or connecting to the Intel® DevCloud

Training on multiple nodes

Singh__Jagraj
Beginner
879 Views

How can I train a neural network on multiple nodes using different parameters?

ChithraJ_Intel
Moderator
Hi Jagraj, thanks for reaching out to us. Could you please elaborate on your issue? Are you trying to submit multiple single-node jobs with different arguments using the -F flag, or are you looking for distributed multi-node training of a neural network using a distributed training framework such as Horovod?
Singh__Jagraj
Beginner

I want to submit multiple single-node jobs with different arguments.

ChithraJ_Intel
Moderator

Hi Jagraj,


We ran a sample Keras script that takes the learning rate as an argument and submitted it as multiple jobs with different learning-rate values. It ran without any errors. Could you please follow the steps below, which are the ones we performed:
Step 1:

source activate tensorflow

[Note: tensorflow is the preinstalled conda environment in DevCloud that comes with oneAPI]
Step 2: Install the required packages in the environment:

pip install keras --user

Step 3: Submit the job script, passing a different learning-rate value to each job:

qsub myjob -F "0.1"
qsub myjob -F "0.01"
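
If there are many values to try, the submissions can be scripted instead of typed one by one. The following is a minimal sketch in Python, assuming the job script is named myjob as above. The helper names build_qsub_command and submit_all are hypothetical, and dry_run defaults to True so that the commands are only printed, not submitted.

```python
import shlex
import shutil
import subprocess


def build_qsub_command(job_script, learning_rate):
    """Return the argument list for one qsub submission."""
    # -F hands the given string to the job script as its arguments.
    return ["qsub", job_script, "-F", str(learning_rate)]


def submit_all(job_script, learning_rates, dry_run=True):
    """Build one qsub command per learning rate and optionally submit it."""
    commands = [build_qsub_command(job_script, lr) for lr in learning_rates]
    for cmd in commands:
        if dry_run or shutil.which("qsub") is None:
            # Print instead of submitting when dry_run is set
            # or qsub is not available on PATH.
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Same two values as in the commands above; nothing is submitted.
    submit_all("myjob", [0.1, 0.01], dry_run=True)
```

Calling submit_all("myjob", [0.1, 0.01], dry_run=False) on a login node would run the same two qsub commands shown in Step 3.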

[Note: The job file and Python script are attached]
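
The attachments are not reproduced in this thread. As a rough sketch only, a Python script that accepts the learning rate on the command line might look like the following. It assumes the job script forwards its first argument (the string given to -F) to the Python script. The function names and the default value of 0.01 are hypothetical, and the Keras model code is left out.

```python
import argparse
import sys


def parse_learning_rate(argv):
    """Read the learning rate from a list of command-line arguments."""
    parser = argparse.ArgumentParser(
        description="Train a model with a given learning rate."
    )
    # Optional positional argument; 0.01 is an assumed default for this sketch.
    parser.add_argument(
        "learning_rate",
        nargs="?",
        type=float,
        default=0.01,
        help="learning rate passed in by the job script",
    )
    args = parser.parse_args(argv)
    if args.learning_rate <= 0:
        parser.error("learning_rate must be positive")
    return args.learning_rate


def main(argv):
    lr = parse_learning_rate(argv)
    print(f"Training with learning rate {lr}")
    # The model would be built and trained here, with lr handed to the
    # optimizer. That part is omitted because the original script is not shown.
    return lr


if __name__ == "__main__":
    main(sys.argv[1:])
```

Each submitted job then runs the same script with its own learning rate, so the runs stay independent of each other.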

Hope this helps.

ChithraJ_Intel
Moderator

Hi Jagraj,

Could you please let us know whether the solution provided answers your query?

Singh__Jagraj
Beginner

Thanks for the solution.

ChithraJ_Intel
Moderator

Hi Jagraj,

Can we close this case if your issue has been resolved?

ChithraJ_Intel
Moderator

Hi Jagraj,

We are closing this case since we were able to provide a solution. Please feel free to open a new thread if you have further issues.
