Singh__Jagraj
Beginner
161 Views

Training on multiple nodes

How can I train a neural network on multiple nodes using different parameters?

ChithraJ_Intel
Moderator

Hi Jagraj,

Thanks for reaching out to us. Could you please elaborate on your issue? Are you trying to submit multiple single-node jobs with different arguments using the -F flag, or are you looking for distributed multi-node training of a neural network with a framework such as Horovod?
Singh__Jagraj
Beginner

I want to submit multiple single-node jobs with different arguments.

ChithraJ_Intel
Moderator

Hi Jagraj,


We ran a sample Keras script that takes the learning rate as an argument and submitted it as multiple jobs, each with a different learning rate. The jobs ran without any errors. Please follow the steps we used:
Step 1: Activate the tensorflow conda environment:

source activate tensorflow

[Note: tensorflow is the preinstalled conda environment on DevCloud that ships with oneAPI.]
Step 2: Install the required packages in the environment:

 pip install keras --user

Step 3: Submit the job script, passing a different learning rate to each job with -F:

qsub myjob -F "0.1"
qsub myjob -F "0.01"

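Typing one qsub command per value gets tedious for larger sweeps. As a minimal sketch, assuming Python 3.8+ is available where you run qsub and the same myjob file is used, the submissions can be generated in a loop. The list of learning rates is hypothetical, and dry_run=True only prints the commands so they can be checked before anything is queued.

```python
import shlex
import subprocess

# Hypothetical sweep values; replace with the learning rates you want to try.
LEARNING_RATES = [0.1, 0.01, 0.001]


def build_commands(job_script, learning_rates):
    """Build one qsub command per learning rate, in the form `qsub myjob -F "0.1"`."""
    return [["qsub", job_script, "-F", str(lr)] for lr in learning_rates]


def submit(commands, dry_run=True):
    """Print the commands (dry run) or actually submit them with qsub."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError if qsub rejects a job.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    submit(build_commands("myjob", LEARNING_RATES), dry_run=True)
```

Since -F takes a single quoted string, several values can be passed at once, e.g. -F "0.1 64", which the job script receives as $1 and $2; the job file would then need to forward both to the Python script.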
[Note: The job file and Python script are attached.]
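The attached files are not reproduced in this thread. As a minimal sketch only, assuming the job file runs something like python train.py $1 (the name train.py is hypothetical), the Python script could read the learning rate from its first command-line argument as below. To keep the sketch runnable without TensorFlow, a toy gradient-descent fit of y = 2x stands in for the Keras model; in a real Keras script the parsed value would go to the optimizer instead, e.g. SGD(learning_rate=args.learning_rate) in recent Keras versions.

```python
import argparse


def parse_args(argv=None):
    """Read the learning rate passed in by the job file (e.g. via qsub -F "0.1")."""
    parser = argparse.ArgumentParser(description="Train with a given learning rate.")
    parser.add_argument("learning_rate", type=float, nargs="?", default=0.01,
                        help="learning rate, e.g. 0.1 or 0.01 (default: 0.01)")
    # Hypothetical extra option; not part of the original thread.
    parser.add_argument("--epochs", type=int, default=50)
    return parser.parse_args(argv)


def train(learning_rate, epochs):
    """Toy stand-in for Keras model.fit(): fit y = 2x with plain gradient descent."""
    data = [(x, 2.0 * x) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
    w = 0.0
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad
    loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
    return w, loss


def main(argv=None):
    args = parse_args(argv)
    w, loss = train(args.learning_rate, args.epochs)
    # Tag the output with the learning rate so results from different jobs can be told apart.
    print(f"learning_rate={args.learning_rate} w={w:.4f} loss={loss:.6f}")


if __name__ == "__main__":
    main()
```

With the same number of epochs, a run with 0.01 ends farther from the optimum than a run with 0.1, which is the kind of difference the per-job sweep is meant to expose.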

Hope this helps.

ChithraJ_Intel
Moderator

Hi Jagraj,

Could you please let us know whether the solution provided answers your query?

Singh__Jagraj
Beginner

Thanks for the solution.

ChithraJ_Intel
Moderator

Hi Jagraj,

If your issue has been resolved, can we close this case?

ChithraJ_Intel
Moderator

Hi Jagraj,

We are closing this case since we were able to provide a solution. Please feel free to raise a new thread if you have further issues.
