Intel® DevCloud
Help for those needing help starting or connecting to the Intel® DevCloud

Vanilla tensorflow vs Intel optimized tensorflow

Girin
Employee

I'm training a model on the MNIST dataset with TF and Keras on the DevCloud, and I'm noticing that training with vanilla TF is much faster than with Intel-optimized TF.

1. Installed vanilla TF using 'pip install tensorflow'. This installed TF 2.3.1. Selected the Python 3.7 kernel. Training takes ~2 s/epoch.

2. Installed Intel-optimized TF with 'pip install intel-tensorflow-avx512'. This installed TF 2.3.0. Selected the Python 3.7 kernel. Training takes ~10-15 s/epoch.

Tried this multiple times. Similar results. Any idea why?
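
To compare the two builds fairly, it helps to time every epoch the same way and to leave the first epoch out, since it typically includes one-time setup (for example, Keras tracing its training function). The sketch below is framework-agnostic and assumes a caller-supplied `train_one_epoch` callable; that name and the helper functions are hypothetical, not part of TensorFlow.

```python
import statistics
import time
from typing import Callable, List


def time_epochs(train_one_epoch: Callable[[], object], epochs: int = 5, warmup: int = 1) -> List[float]:
    """Run `warmup` untimed epochs, then return wall-clock seconds for each timed epoch.

    `train_one_epoch` is a hypothetical callable that runs exactly one training epoch.
    """
    for _ in range(warmup):
        train_one_epoch()  # one-time setup cost lands here and is not measured
    durations = []
    for _ in range(epochs):
        start = time.perf_counter()
        train_one_epoch()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> str:
    """Report the median seconds per epoch, which is less sensitive to outliers than the mean."""
    return f"~{statistics.median(durations):.2f} s/epoch over {len(durations)} epochs"


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without TensorFlow installed
    times = time_epochs(lambda: sum(i * i for i in range(200_000)), epochs=3)
    print(summarize(times))
```

With Keras, `train_one_epoch` could be `lambda: model.fit(x_train, y_train, epochs=1, verbose=0)`, where `model`, `x_train` and `y_train` are your own objects; running the same script under each TensorFlow install gives directly comparable numbers.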

1 Solution
ArunJ_Intel
Moderator

Hi Girin,

The out-of-the-box performance of Intel-optimized TensorFlow is not always better than stock TensorFlow. To benefit from the Intel optimizations, you should set a few environment variables and make a few minor code changes.

Environment variables

Set KMP_BLOCKTIME=1, and set OMP_NUM_THREADS to the number of physical CPU cores on your compute node (you can get this with the lscpu command).

For example:

export KMP_BLOCKTIME=1
export OMP_NUM_THREADS=N  # replace N with the number of physical cores
export KMP_AFFINITY=granularity=fine,verbose,compact,1,0
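
If you prefer not to read the core count off lscpu by hand, a small script can derive it. This is a sketch that assumes the parseable output of `lscpu --parse=CORE,SOCKET`: one `core,socket` line per logical CPU, with comment lines starting with `#`. Hyperthreads of the same physical core share a (core, socket) pair, so counting unique pairs gives the physical core count. The helper names are hypothetical.

```python
import shutil
import subprocess
from typing import Dict


def count_physical_cores(lscpu_parse_output: str) -> int:
    """Count physical cores from `lscpu --parse=CORE,SOCKET` output (assumed format)."""
    pairs = set()
    for line in lscpu_parse_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip the comment/header lines lscpu prints first
        core, socket = line.split(",")[:2]
        pairs.add((core, socket))  # hyperthreads of one core collapse into one pair
    return len(pairs)


def intel_tf_env(physical_cores: int) -> Dict[str, str]:
    """The environment settings from this thread, with OMP_NUM_THREADS filled in."""
    return {
        "KMP_BLOCKTIME": "1",
        "OMP_NUM_THREADS": str(physical_cores),
        "KMP_AFFINITY": "granularity=fine,verbose,compact,1,0",
    }


if __name__ == "__main__" and shutil.which("lscpu"):
    output = subprocess.run(
        ["lscpu", "--parse=CORE,SOCKET"], capture_output=True, text=True, check=True
    ).stdout
    # Print export lines to paste into the shell before starting Python or Jupyter
    for name, value in intel_tf_env(count_physical_cores(output)).items():
        print(f"export {name}={value}")
```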

Code

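Also set inter_op_parallelism_threads to 2 by adding the following to your code, before any model is built or run:

import tensorflow as tf

# Must be called before TensorFlow executes any op; it cannot be changed afterwards
tf.config.threading.set_inter_op_parallelism_threads(2)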
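
On the DevCloud you may be running inside a Jupyter kernel, and shell exports only reach a process that is started after them. A possible alternative, sketched below, is to set the same variables through `os.environ` at the top of the notebook, before TensorFlow is first imported; in our understanding the OpenMP runtime reads them once when it initializes. `apply_intel_settings` and `configure_tensorflow` are hypothetical helper names, and the core count of 4 is only a placeholder.

```python
import os
from typing import MutableMapping


def apply_intel_settings(physical_cores: int, env: MutableMapping[str, str] = os.environ) -> None:
    """Set the environment variables from this thread; call before importing TensorFlow."""
    env["KMP_BLOCKTIME"] = "1"
    env["OMP_NUM_THREADS"] = str(physical_cores)
    env["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"


def configure_tensorflow(inter_op_threads: int = 2):
    """Import TensorFlow only after the environment is set, then apply the threading setting."""
    import tensorflow as tf  # deferred so the OpenMP variables above are already in place

    # Must run before TensorFlow executes any op
    tf.config.threading.set_inter_op_parallelism_threads(inter_op_threads)
    return tf


if __name__ == "__main__":
    apply_intel_settings(physical_cores=4)  # placeholder: use your node's physical core count
    try:
        tf = configure_tensorflow()
        print("inter-op threads:", tf.config.threading.get_inter_op_parallelism_threads())
    except ImportError:
        print("TensorFlow is not installed; only the environment variables were set")
```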

For more best-known methods (BKMs) and recommended settings for Intel-optimized TensorFlow, see:

https://software.intel.com/content/www/us/en/develop/articles/maximize-tensorflow-performance-on-cpu...

Please try the steps above and let us know if you are still having issues.

Thanks

Arun Jose


Girin
Employee

Great, thanks for this, Arun. With these settings, I now get a training speed of ~1 s/epoch with the MNIST dataset and Intel-optimized TensorFlow on the DevCloud.

ArunJ_Intel
Moderator

Hey Girin,


Glad to know the solution helped. We will no longer be monitoring this thread. Please feel free to open a new thread if you run into further issues.


Thanks

Arun

