103 Views

Unable to train my model for more than 26 epochs

Please help: I am unable to train my model for the number of epochs I need to get the results I want. How can I optimize my jobs so that the model trains to the accuracy and loss values I want to reach?

7 Replies
AthiraM_Intel
Moderator

Hi,

Thanks for reaching out to us. Could you please share the following details:

1. More details about your workload. 

2. Which model are you using?

3. Are you getting any errors while training? If so, please share a screenshot.

103 Views

The following shows how the model hangs while processing in the job.

I submitted the job with a wall time of 24:00:00.

I would like to increase the wall time, or get help optimizing my model so that training fits within that time constraint.

JananiC_Intel
Moderator

Hi,

Thanks for the update.

As we know, 24 hours is the maximum wall time; after that, the job will be killed. You can use checkpoints instead. A checkpoint saves your model after every epoch so that you can resume training in a later job.
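
As a rough illustration of saving a checkpoint after every epoch, here is a minimal sketch using the Keras `ModelCheckpoint` callback. It assumes a Keras model trained through TensorFlow; the `checkpoints` directory, the `epoch_NNN.h5` file naming, and the helper name `make_checkpoint_callback` are hypothetical choices for this example, not something taken from your job.

```python
import os

# Hypothetical location and naming scheme for the checkpoint files.
CHECKPOINT_DIR = "checkpoints"
CHECKPOINT_PATTERN = os.path.join(CHECKPOINT_DIR, "epoch_{epoch:03d}.h5")


def make_checkpoint_callback():
    """Build a callback that saves the full model at the end of every epoch."""
    # Imported lazily so the path constants above work without TensorFlow.
    from tensorflow import keras

    os.makedirs(CHECKPOINT_DIR, exist_ok=True)
    return keras.callbacks.ModelCheckpoint(
        filepath=CHECKPOINT_PATTERN,  # Keras fills in {epoch}, counting from 1
        save_weights_only=False,      # save the whole model, including optimizer state
        save_freq="epoch",            # write one file per completed epoch
    )


# Usage, assuming `model`, `x_train` and `y_train` already exist in your script:
# model.fit(x_train, y_train, epochs=50, callbacks=[make_checkpoint_callback()])
```

With this in place, a job that is killed at the 24-hour limit should leave behind a saved model for the last epoch that finished.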

103 Views

Please, how do we continue training a model from a checkpoint?

JananiC_Intel
Moderator

Hi,

From the attached screenshot, we understand that you are using a Keras model. To save the model and resume training, you can refer to the link below.

https://keras.rstudio.com/articles/tutorial_save_and_restore.html

You can also refer to the case below.

https://software.intel.com/en-us/forums/intel-devcloud/topic/849464#

Hope this answers your question.
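
In case it helps, here is a minimal sketch of resuming training from the most recent checkpoint in a new job. It assumes the Python Keras API through TensorFlow (the link above covers the R interface) and that full-model checkpoints are written once per epoch as `checkpoints/epoch_NNN.h5`. That naming scheme, and the helper names `find_latest_checkpoint`, `resume_training`, and `build_model`, are hypothetical and only for illustration.

```python
import os
import re

# Hypothetical naming scheme: one file per epoch, e.g. checkpoints/epoch_007.h5
CHECKPOINT_RE = re.compile(r"epoch_(\d+)\.h5$")


def find_latest_checkpoint(directory):
    """Return (path, epoch) of the newest checkpoint, or (None, 0) if none exist."""
    best_path, best_epoch = None, 0
    if not os.path.isdir(directory):
        return best_path, best_epoch
    for name in os.listdir(directory):
        match = CHECKPOINT_RE.search(name)
        if match and int(match.group(1)) > best_epoch:
            best_path, best_epoch = os.path.join(directory, name), int(match.group(1))
    return best_path, best_epoch


def resume_training(build_model, x_train, y_train, total_epochs, directory="checkpoints"):
    """Continue from the latest checkpoint, or start fresh if there is none.

    `build_model` is a function you supply that returns a compiled Keras model.
    """
    from tensorflow import keras  # imported lazily; requires TensorFlow

    path, done = find_latest_checkpoint(directory)
    model = keras.models.load_model(path) if path else build_model()
    os.makedirs(directory, exist_ok=True)
    saver = keras.callbacks.ModelCheckpoint(
        filepath=os.path.join(directory, "epoch_{epoch:03d}.h5"), save_freq="epoch"
    )
    # initial_epoch makes Keras continue counting from where the last job stopped.
    model.fit(x_train, y_train, epochs=total_epochs, initial_epoch=done, callbacks=[saver])
    return model
```

Submitting the same script again after a job is killed would then pick up at the next epoch instead of starting over, so a long run can be spread across several 24-hour jobs.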

JananiC_Intel
Moderator

Hi,

Could you please let us know whether the issue has been resolved?

JananiC_Intel
Moderator

Hi,

We are closing this case on the assumption that your issue is resolved. Please feel free to start a new thread if you run into further issues.
