I'm training a model on the MNIST dataset with TF and Keras on the DevCloud, and I'm noticing that vanilla TF is much faster than Intel-optimized TF.
1. Installed vanilla TF using 'pip install tensorflow'. This installed TF 2.3.1. Selected the Python 3.7 kernel. Training takes ~2s/epoch.
2. Installed Intel-optimized TF with 'pip install intel-tensorflow-avx512'. This installed TF 2.3.0. Selected the Python 3.7 kernel. Training takes ~10-15s/epoch.
Tried this multiple times. Similar results. Any idea why?
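To compare the two builds on equal terms, it can help to time several epochs the same way in both environments and skip the first one, which usually includes one-off setup cost (graph tracing, thread-pool startup). The following is a minimal, framework-agnostic sketch; the helpers `time_epochs` and `summarize` are hypothetical and not part of TF or Keras.

```python
import statistics
import time
from typing import Callable, List


def time_epochs(train_one_epoch: Callable[[], object], epochs: int = 5,
                warmup: int = 1) -> List[float]:
    """Call train_one_epoch `warmup + epochs` times and return the
    wall-clock seconds of the timed (non-warmup) calls."""
    for _ in range(warmup):
        train_one_epoch()  # includes one-off setup cost, so it is not timed
    timings = []
    for _ in range(epochs):
        start = time.perf_counter()
        train_one_epoch()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings: List[float]) -> str:
    """Format median/min/max seconds per epoch."""
    return (f"median {statistics.median(timings):.2f}s/epoch, "
            f"min {min(timings):.2f}s, max {max(timings):.2f}s")
```

With Keras, `train_one_epoch` could be `lambda: model.fit(x_train, y_train, epochs=1, verbose=0)`, where `model`, `x_train` and `y_train` stand for your own model and data; running the same call in both environments keeps the comparison like-for-like.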
Hi Girin,
The out-of-the-box performance of Intel TensorFlow is not always better than that of stock TensorFlow. To get the performance benefit of Intel TensorFlow, you should set a few environment variables and make a few minor code changes.
Environment variables
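Set KMP_BLOCKTIME=1 and set OMP_NUM_THREADS to the number of physical CPU cores on your compute node (you can get this from the lscpu command: Core(s) per socket × Socket(s)). Also set KMP_AFFINITY as shown below.
e.g.:
export KMP_BLOCKTIME=1
# physical cores = unique (core, socket) pairs reported by lscpu
export OMP_NUM_THREADS=$(lscpu --parse=CORE,SOCKET | grep -v '^#' | sort -u | wc -l)
export KMP_AFFINITY=granularity=fine,verbose,compact,1,0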
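If training runs in a Jupyter notebook (as with the Python 3.7 kernel on the DevCloud), the same variables can be set from Python instead of a shell. This is a sketch under the assumption that the variables only take effect if they are set before `tensorflow` is imported, so in a notebook that has already imported TF, restart the kernel first. The helper names are hypothetical; the core count comes from parsing `lscpu --parse=CORE,SOCKET` (Linux only).

```python
import os
import subprocess


def count_physical_cores(lscpu_parse_output: str) -> int:
    """Count unique (core, socket) pairs in `lscpu --parse=CORE,SOCKET` output.

    Lines starting with '#' are lscpu's header comments. Every other line is
    one logical CPU, so hyperthread siblings collapse to one (core, socket) pair.
    """
    pairs = set()
    for line in lscpu_parse_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        core, socket = line.split(",")[:2]
        pairs.add((core, socket))
    return len(pairs)


def detect_physical_cores() -> int:
    """Run lscpu and count the physical cores it reports."""
    result = subprocess.run(["lscpu", "--parse=CORE,SOCKET"],
                            capture_output=True, text=True, check=True)
    return count_physical_cores(result.stdout)


def set_intel_tf_env(physical_cores: int) -> None:
    """Export the suggested OpenMP settings; call before importing tensorflow."""
    os.environ["KMP_BLOCKTIME"] = "1"
    os.environ["OMP_NUM_THREADS"] = str(physical_cores)
    os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"
```

A first notebook cell could then run `set_intel_tf_env(detect_physical_cores())`, with `import tensorflow as tf` only in a later cell.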
Code
Also set inter_op_parallelism_threads to 2 by adding the lines below to your code, before the model is built:
import tensorflow as tf
tf.config.threading.set_inter_op_parallelism_threads(2)
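Both TF thread pools can be configured in one place. The sketch below sets inter-op to 2 as suggested above; setting intra-op to the physical-core count (reused from OMP_NUM_THREADS) is an assumption, not something stated in this thread. The helper name `configure_tf_threading` is hypothetical; the two `tf.config.threading` setters it calls are part of the TF 2.x API and must be called before TensorFlow runs any op (afterwards TF raises a RuntimeError).

```python
import os


def configure_tf_threading(tf, inter_op: int = 2, intra_op=None):
    """Apply thread-pool sizes to an already imported tensorflow module `tf`.

    Returns the (inter_op, intra_op) values that were applied.
    """
    if intra_op is None:
        # Assumption: reuse OMP_NUM_THREADS (physical cores). If it is unset
        # or empty (e.g. "export OMP_NUM_THREADS=" with no value), fall back
        # to 0, which lets TensorFlow pick the number of threads itself.
        value = os.environ.get("OMP_NUM_THREADS", "").strip()
        intra_op = int(value) if value.isdigit() else 0
    tf.config.threading.set_inter_op_parallelism_threads(inter_op)
    tf.config.threading.set_intra_op_parallelism_threads(intra_op)
    return inter_op, intra_op
```

Call it right after `import tensorflow as tf` and before building the model, e.g. `configure_tf_threading(tf)`; `tf.config.threading.get_inter_op_parallelism_threads()` can confirm the value took effect.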
Please refer to the link below for more BKMs (best-known methods) and recommended settings for Intel TensorFlow.
Please try out the above steps and let us know if you are still having issues.
Thanks
Arun Jose
Great, thanks for this, Arun. With these settings, I now get a training speed of ~1s/epoch with the MNIST dataset and Intel-optimized TensorFlow on the DevCloud.
Hey Girin,
Glad to know the solution provided helped. We will no longer be monitoring this thread. Please feel free to raise a new thread if you run into further issues.
Thanks
Arun
