Employee
126 Views

Vanilla tensorflow vs Intel optimized tensorflow


I am training a model on the MNIST dataset with TF and Keras on the DevCloud, and I am noticing that vanilla TF is much faster than Intel optimized TF.

1. Installed vanilla TF using 'pip install tensorflow'. This installed TF 2.3.1. Selected Python 3.7 kernel. Training takes ~2s/epoch.

2. Installed Intel optimized TF with 'pip install intel-tensorflow-avx512'. Installed TF 2.3.0. Selected Python 3.7 kernel. Training takes ~10-15s/epoch. 

Tried this multiple times. Similar results. Any idea why?

 

0 Kudos

Accepted Solutions
Moderator
98 Views

Hi Girin,

 

The out-of-the-box performance of Intel TensorFlow is not always better than that of stock TensorFlow. To get the performance improvement from Intel TensorFlow, you need to set a few environment variables and make a few minor code changes.

Environment variables

Set KMP_BLOCKTIME=1 and set OMP_NUM_THREADS to the number of physical CPU cores in your compute node (you can get this with the lscpu command).

e.g.:

 

export KMP_BLOCKTIME=1

export OMP_NUM_THREADS=<N>  # replace <N> with the number of physical cores

export KMP_AFFINITY=granularity=fine,verbose,compact,1,0
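
If you are working in a notebook kernel rather than a shell, the same settings can be applied from Python. The following is only a minimal sketch: the helper names physical_cores_from_lscpu and apply_openmp_settings are hypothetical, the lscpu excerpt is made up for illustration, and it assumes the variables are set before TensorFlow is imported so that the OpenMP runtime picks them up.

```python
import os
import re


def physical_cores_from_lscpu(lscpu_text):
    """Return sockets * cores-per-socket parsed from `lscpu` output."""
    def field(name):
        match = re.search(rf"^{re.escape(name)}:\s*(\d+)", lscpu_text, re.MULTILINE)
        if match is None:
            raise ValueError(f"field {name!r} not found in lscpu output")
        return int(match.group(1))

    return field("Socket(s)") * field("Core(s) per socket")


def apply_openmp_settings(physical_cores, env=os.environ):
    """Set the variables from the post; call this before importing TensorFlow."""
    env["KMP_BLOCKTIME"] = "1"
    env["OMP_NUM_THREADS"] = str(physical_cores)
    env["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"
    return env


# Hypothetical lscpu excerpt for a 2-socket node with 12 cores per socket.
sample = """\
CPU(s):              48
Thread(s) per core:  2
Core(s) per socket:  12
Socket(s):           2
"""
cores = physical_cores_from_lscpu(sample)
# A plain dict is passed here so the demo does not touch the real environment.
settings = apply_openmp_settings(cores, env={})
print(cores, settings["OMP_NUM_THREADS"])  # prints: 24 24
```

In a real session you would feed it the actual output of lscpu (for example via subprocess.run(["lscpu"], capture_output=True, text=True).stdout) and call apply_openmp_settings(cores) with the default env.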

 

 

Code

Also set inter_op_parallelism_threads to 2. You can do this by adding the line below to your code.

 

 

import tensorflow as tf

# Must run before TensorFlow executes any operation.
tf.config.threading.set_inter_op_parallelism_threads(2)
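
As a rough sketch of how the inter_op value of 2 named above might be applied alongside an intra-op setting: the helper apply_thread_settings is hypothetical, and setting the intra-op pool to the number of physical cores is an assumption that is not stated in this post. The helper takes the threading module as an argument (expected to be tf.config.threading) so the sketch can run without TensorFlow installed.

```python
from types import SimpleNamespace


def apply_thread_settings(threading_api, physical_cores, inter_op=2):
    """Configure TensorFlow thread pools; must run before any op executes.

    threading_api is expected to be `tf.config.threading`.
    """
    # Value from the post: number of ops that may run concurrently.
    threading_api.set_inter_op_parallelism_threads(inter_op)
    # Assumption (not in the post): one intra-op thread per physical core.
    threading_api.set_intra_op_parallelism_threads(physical_cores)


# Stand-in for tf.config.threading so the sketch runs without TensorFlow.
calls = {}
fake_api = SimpleNamespace(
    set_inter_op_parallelism_threads=lambda n: calls.update(inter_op=n),
    set_intra_op_parallelism_threads=lambda n: calls.update(intra_op=n),
)
apply_thread_settings(fake_api, physical_cores=24)
print(calls)  # prints: {'inter_op': 2, 'intra_op': 24}
```

With TensorFlow installed, the call would be apply_thread_settings(tf.config.threading, physical_cores=...) right after import tensorflow as tf and before building or training the model.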

 

 

For more BKMs (best known methods) and recommended settings for Intel TensorFlow, please refer to the link below:

 

https://software.intel.com/content/www/us/en/develop/articles/maximize-tensorflow-performance-on-cpu...

 

 

 

Please try out the above steps and let us know if you are still having issues.

 

Thanks

Arun Jose

 


0 Kudos
3 Replies
Employee
89 Views

Great, thanks for this, Arun. With these settings, I now get a training speed of ~1s/epoch with the MNIST dataset and Intel optimized TensorFlow on the DevCloud.

0 Kudos
Moderator
79 Views

Hey Girin,


Glad to know the solution helped. We will no longer be monitoring this thread. Please feel free to open a new thread if you run into further issues.


Thanks

Arun


0 Kudos