Please help: I am unable to train my model for the number of epochs I need to get the results I want. How can I optimize my jobs to better train the model and reach the accuracy and loss values I am aiming for?
Thanks for reaching out to us. Could you please share the following details:
1. More details about your workload.
2. Which model are you using?
3. Are you getting any errors while training? If so, please share a screenshot.
Thanks for the update.
The maximum wall time for a job is 24 hours, after which the job is killed. To work around this limit, you can use checkpointing: a checkpoint saves your model after every epoch, so that you can resume training in a new job from where the previous one stopped.
From the attached screenshot, we understand that you are using a Keras model. To save the model and resume training, you can refer to the link below.
You can also refer to the case below.
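As a rough illustration of the checkpoint-and-resume approach described above, here is a minimal sketch. It assumes a recent TensorFlow/Keras version that supports the `.keras` file format (older versions may need `.h5` instead). The names `latest_checkpoint`, `train_with_resume`, `build_model`, the `checkpoints` directory and the `epoch_NNN.keras` file naming are all hypothetical and chosen only for this example; adapt them to your own script.

```python
import re
from pathlib import Path

# Hypothetical naming scheme for this sketch: checkpoints/epoch_001.keras, ...
CKPT_PATTERN = re.compile(r"epoch_(\d+)\.keras$")


def latest_checkpoint(ckpt_dir):
    """Return (path, epoch) of the newest checkpoint, or (None, 0) if none exist."""
    best_path, best_epoch = None, 0
    for path in Path(ckpt_dir).glob("epoch_*.keras"):
        match = CKPT_PATTERN.search(path.name)
        if match and int(match.group(1)) > best_epoch:
            best_path, best_epoch = path, int(match.group(1))
    return best_path, best_epoch


def train_with_resume(build_model, x, y, total_epochs, ckpt_dir="checkpoints"):
    """Train up to total_epochs, continuing from the last saved epoch if any.

    build_model is a hypothetical user-supplied function that returns a
    compiled Keras model; it is only called when no checkpoint exists yet.
    """
    # Imported here so latest_checkpoint() also works without TensorFlow.
    from tensorflow import keras

    Path(ckpt_dir).mkdir(parents=True, exist_ok=True)
    path, epochs_done = latest_checkpoint(ckpt_dir)

    # A full-model checkpoint also stores the optimizer state, so training
    # continues rather than restarting from scratch.
    model = keras.models.load_model(str(path)) if path else build_model()

    # Keras fills in {epoch} with the 1-based number of the epoch just finished.
    save_each_epoch = keras.callbacks.ModelCheckpoint(
        filepath=str(Path(ckpt_dir) / "epoch_{epoch:03d}.keras"),
        save_freq="epoch",
    )

    # initial_epoch makes fit() skip the epochs that were already completed.
    model.fit(
        x,
        y,
        epochs=total_epochs,
        initial_epoch=epochs_done,
        callbacks=[save_each_epoch],
    )
    return model
```

With this pattern, each job submission can run the same script: the first job starts from a fresh model, and every later job picks up the newest checkpoint and continues until `total_epochs` is reached or the wall time runs out again.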
Hope this answers your question.