How will I get results from neural network model training in batch processing mode and analyze the training? I have created a neural network model and used qsub to submit it for execution in my DevCloud account, but I don't know how to store the training results for analyzing what the model has learned. I need help in this regard.
Hi,
Thank you for posting in Intel Communities.
To submit jobs to Intel DevCloud for oneAPI in batch mode, you need to create a job script containing the following lines, which will be run on the requested node via qsub:
source /opt/intel/oneapi/setvars.sh
source activate <conda_env_name>
python <program_file>.py
The first line sources the Intel oneAPI environment variables, the second activates the conda environment (default or custom) in which the code should run, and the third runs the Python code.
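Since a batch job only captures what your script prints, it helps if the training script also writes its metrics to a file so they can be analyzed after the job ends. Below is a minimal sketch; the loop, loss values, and file name `training_metrics.csv` are illustrative placeholders (in a real script the loss would come from your TensorFlow or PyTorch training loop):

```python
# Sketch of a training script that persists per-epoch metrics so they
# survive the batch job. The loss values below are synthetic stand-ins
# for the values a real training loop would produce.
import csv

def train_and_log(metrics_path="training_metrics.csv", epochs=5):
    history = []
    loss = 1.0
    for epoch in range(1, epochs + 1):
        loss *= 0.5  # placeholder for one real training epoch
        history.append((epoch, loss))
        # This line is captured in the batch output file run.sh.o<job_id>
        print(f"epoch {epoch} loss {loss:.4f}")
    # Also save the metrics to a CSV file for later offline analysis
    with open(metrics_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["epoch", "loss"])
        writer.writerows(history)
    return history

history = train_and_log()
```

The CSV file lands in the job's working directory, so after the job completes you can load it (e.g. with pandas or matplotlib) to inspect how the loss evolved.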
Reference for job script files:
Reference for the official neural network training samples using TensorFlow and PyTorch can be obtained from the below links:
The jobs can be submitted using the qsub command as shown below:
$ qsub -l nodes=1:gpu:ppn=2 -d . run.sh
Once the above command is run, a Job ID will be created. The status of the submitted jobs can be monitored using the qstat command.
After the execution is complete, two files will be generated: run.sh.o<job_id> (output file) and run.sh.e<job_id> (error file). All standard output generated by the script/program is saved to the output file, and all standard error output is saved to the error file. These files can be used to analyze the status of the neural network training or inference submitted in batch mode.
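If your training script prints its metrics in a regular format, you can extract them from the run.sh.o<job_id> output file with a short script. A hedged sketch, assuming the script printed lines of the form "epoch <n> loss <value>" (the sample log text below is synthetic, not from an actual job):

```python
# Sketch: extract per-epoch loss values from a batch output file such as
# run.sh.o<job_id>, assuming lines of the form "epoch <n> loss <value>".
import re

def parse_losses(log_text):
    """Return [(epoch, loss), ...] for every matching line in the log."""
    pattern = re.compile(r"epoch\s+(\d+)\s+loss\s+([0-9.eE+-]+)")
    return [(int(e), float(l)) for e, l in pattern.findall(log_text)]

# Synthetic example of what a captured output file might contain
sample_log = """\
Job started
epoch 1 loss 0.9123
epoch 2 loss 0.6541
epoch 3 loss 0.5010
Job finished
"""

losses = parse_losses(sample_log)
best_epoch, best_loss = min(losses, key=lambda p: p[1])
print(f"best epoch: {best_epoch}, loss: {best_loss}")
```

In practice you would read the real file, e.g. `parse_losses(open("run.sh.o12345").read())`, and then plot or tabulate the extracted values to see what the model has learned.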
Alternatively, if you prefer an interactive environment, you can use the JupyterLab feature provided by Intel DevCloud for oneAPI to perform model training and inference and analyze the results visually. The JupyterLab environment in Intel DevCloud for oneAPI can be accessed using the link:
https://jupyter.oneapi.devcloud.intel.com/
Reference for JupyterLab sample: https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Getting-Started-Samples/Intel_Extension_For_TensorFlow_GettingStarted
If this resolves your issue, please accept this as a solution; this will help others with a similar issue. Do get back to us with more information if we misunderstood your query or if you are still facing issues with batch submission.
Thanks.
Regards,
Sreedevi
Hi,
We have not heard back from you. Could you please give us an update? Is your issue resolved?
Regards,
Sreedevi
Thanks for providing the help. Yes, I got it.
Hi,
Glad to know that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.
Regards,
Sreedevi