Intel® DevCloud
Help for those needing help starting or connecting to the Intel® DevCloud

Unable to train my model for more than 26 epochs

Tchouanga__Franck
686 Views

Please, I am unable to train my model for the number of epochs I need to get the results I want. How can I optimize my jobs so that the model trains long enough to reach the accuracy and loss values I am aiming for?

7 Replies
AthiraM_Intel
Moderator

Hi,

Thanks for reaching out to us. Could you please share the following details:

1. More details about your workload. 

2. Which model are you using?

3. Are you getting any errors while training? If so, please share a screenshot.

Tchouanga__Franck

The following shows how the model hangs while processing in the job.

I submitted the job with a wall time of 24:00:00.

I would like to increase the wall time, or get help optimizing my model so that training fits within that time constraint.

JananiC_Intel
Moderator

Hi,

Thanks for the update.

The maximum wall time is 24 hours; after that, the job is killed. You can use checkpoints instead: a checkpoint saves your model after every epoch so that you can resume your training.
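A minimal sketch of per-epoch checkpointing, assuming the Python Keras API bundled with TensorFlow (the thread does not say which Keras interface or version is in use). The helper names, the directory layout, and the `model-epoch-NNN.keras` file naming are illustrative assumptions; `.keras` is the single-file format of recent Keras versions, and `.h5` is the older one.

```python
from pathlib import Path


def checkpoint_path_template(ckpt_dir: str) -> str:
    """Create ckpt_dir if needed and return a filename template.

    Keras fills in {epoch} itself when it writes each checkpoint.
    The naming scheme is an assumption made for this sketch.
    """
    Path(ckpt_dir).mkdir(parents=True, exist_ok=True)
    return str(Path(ckpt_dir) / "model-epoch-{epoch:03d}.keras")


def make_checkpoint_callback(ckpt_dir: str):
    """Build a callback that saves the full model at the end of every epoch."""
    # Imported lazily so the path helper above works without TensorFlow.
    from tensorflow import keras

    return keras.callbacks.ModelCheckpoint(
        filepath=checkpoint_path_template(ckpt_dir),
        save_weights_only=False,  # save the whole model, not just the weights
        save_freq="epoch",        # write one file after each epoch
    )
```

The callback would then be passed to training, for example `model.fit(x, y, epochs=100, callbacks=[make_checkpoint_callback("checkpoints")])`, so that a job killed at the wall-time limit still leaves the last completed epoch on disk.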

Tchouanga__Franck

Please, how do we continue training a model from a checkpoint?

JananiC_Intel
Moderator

Hi,

From the attached screenshot, we understand that you are using a Keras model. To save the model and resume training, you can refer to the link below.

https://keras.rstudio.com/articles/tutorial_save_and_restore.html

You can also refer to the case below.

https://software.intel.com/en-us/forums/intel-devcloud/topic/849464#

Hope this answers your question.
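Resuming from a checkpoint could look roughly like the sketch below. It assumes the Python Keras API from TensorFlow (the linked tutorial covers the R interface) and checkpoints named `model-epoch-NNN.keras`. `find_latest_checkpoint`, `resume_training`, and `build_model` are hypothetical names introduced for illustration, not part of Keras.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed naming scheme for per-epoch checkpoints, e.g. model-epoch-026.keras.
CKPT_PATTERN = re.compile(r"model-epoch-(\d+)\.keras$")


def find_latest_checkpoint(ckpt_dir: str) -> Optional[Tuple[str, int]]:
    """Return (path, epoch) of the highest-numbered checkpoint, or None."""
    directory = Path(ckpt_dir)
    if not directory.is_dir():
        return None
    best = None
    for entry in directory.iterdir():
        match = CKPT_PATTERN.search(entry.name)
        if match:
            epoch = int(match.group(1))
            if best is None or epoch > best[1]:
                best = (str(entry), epoch)
    return best


def resume_training(build_model, x_train, y_train, ckpt_dir, total_epochs):
    """Continue training from the latest checkpoint, or start from scratch.

    build_model is a caller-supplied function returning a compiled model.
    """
    # Imported lazily so the helper above works without TensorFlow.
    from tensorflow import keras

    Path(ckpt_dir).mkdir(parents=True, exist_ok=True)
    latest = find_latest_checkpoint(ckpt_dir)
    if latest is None:
        model, epochs_done = build_model(), 0
    else:
        path, epochs_done = latest
        # Loads architecture, weights, and (for a compiled model) optimizer state.
        model = keras.models.load_model(path)

    saver = keras.callbacks.ModelCheckpoint(
        filepath=str(Path(ckpt_dir) / "model-epoch-{epoch:03d}.keras")
    )
    # initial_epoch makes Keras continue counting from the last finished epoch.
    return model.fit(
        x_train,
        y_train,
        epochs=total_epochs,
        initial_epoch=epochs_done,
        callbacks=[saver],
    )
```

Submitting the same script again after a job is killed would then pick up at the next epoch instead of epoch 1. Models with custom layers or losses may also need the `custom_objects` argument of `load_model`.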

JananiC_Intel
Moderator

Hi,

Could you please let us know whether the issue has been resolved?

JananiC_Intel
Moderator

Hi,

We are closing this case assuming your issue is resolved. Please feel free to raise a new thread in case of further issues.
