Intel® DevCloud
Help for those needing help starting or connecting to the Intel® DevCloud

Vanilla tensorflow vs Intel optimized tensorflow

Girin
Employee

I am training a model on the MNIST dataset with TF and Keras on the DevCloud, and I notice that vanilla TF is much faster than Intel optimized TF.

1. Installed vanilla TF using 'pip install tensorflow'. This installed TF 2.3.1. Selected the Python 3.7 kernel. Training takes ~2s/epoch.

2. Installed Intel optimized TF using 'pip install intel-tensorflow-avx512'. This installed TF 2.3.0. Selected the Python 3.7 kernel. Training takes ~10-15s/epoch.

I tried this multiple times with similar results. Any idea why?

 

ArunJ_Intel
Moderator

Hi Girin,

 

The out-of-the-box performance of Intel TensorFlow may not always be better than that of stock TensorFlow. To get the performance improvement from Intel TensorFlow, you need to set a few environment variables and make a few minor code changes.

Environment variables

Set KMP_BLOCKTIME=1 and OMP_NUM_THREADS={number of physical CPU cores in your compute node} (you can get this with the lscpu command).

For example:

 

export KMP_BLOCKTIME=1

export OMP_NUM_THREADS=<physical cores>  # replace with the physical core count reported by lscpu

export KMP_AFFINITY=granularity=fine,verbose,compact,1,0
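The same settings can also be applied from inside a notebook. The sketch below is only an illustration, not an official Intel recipe: the helper names `physical_cores_from_lscpu` and `intel_tf_env` and the sample lscpu excerpt are hypothetical, and it assumes that the physical core count equals sockets times cores per socket and that the variables need to be in place before TensorFlow is imported.

```python
import os
import re


def physical_cores_from_lscpu(lscpu_text: str) -> int:
    """Return sockets * cores-per-socket parsed from `lscpu` output."""
    def field(name: str) -> int:
        # Match a line such as "Socket(s):   2", allowing leading indentation.
        pattern = rf"^\s*{re.escape(name)}:\s*(\d+)\s*$"
        match = re.search(pattern, lscpu_text, re.MULTILINE)
        if match is None:
            raise ValueError(f"lscpu output has no '{name}' field")
        return int(match.group(1))

    return field("Socket(s)") * field("Core(s) per socket")


def intel_tf_env(physical_cores: int) -> dict:
    """Build the environment variables suggested in this reply."""
    return {
        "KMP_BLOCKTIME": "1",
        "OMP_NUM_THREADS": str(physical_cores),
        "KMP_AFFINITY": "granularity=fine,verbose,compact,1,0",
    }


# Hypothetical lscpu excerpt; real output has many more lines.
SAMPLE_LSCPU = """\
CPU(s):              24
Thread(s) per core:  2
Core(s) per socket:  6
Socket(s):           2
"""

settings = intel_tf_env(physical_cores_from_lscpu(SAMPLE_LSCPU))
# Assumption: set these before `import tensorflow` so the OpenMP runtime sees them.
os.environ.update(settings)
```

On a real node you would pass the actual output of the lscpu command instead of the sample text, for example by capturing it with `subprocess.run(["lscpu"], capture_output=True, text=True).stdout`.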

 

 

Code

Also set inter_op_parallelism_threads to 2. You can do this by adding the line below to your code, before TensorFlow runs any operations.

tf.config.threading.set_inter_op_parallelism_threads(2)
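As a rough sketch of how both thread pools could be configured together: the helpers `recommended_threads` and `apply_threads` are hypothetical names, the core count of 12 is a made-up example, and sizing the intra-op pool to the physical core count is an assumption that is not stated in this reply. Only the inter-op value of 2 comes from the text above.

```python
def recommended_threads(physical_cores: int, inter_op: int = 2) -> dict:
    """Thread-pool sizes: inter-op from this reply, intra-op is an assumption."""
    if physical_cores < 1 or inter_op < 1:
        raise ValueError("thread counts must be positive")
    return {"inter_op": inter_op, "intra_op": physical_cores}


def apply_threads(settings: dict) -> None:
    """Apply the settings; call this before TensorFlow executes any op."""
    # Imported lazily so recommended_threads() can be used without TensorFlow.
    import tensorflow as tf

    tf.config.threading.set_inter_op_parallelism_threads(settings["inter_op"])
    tf.config.threading.set_intra_op_parallelism_threads(settings["intra_op"])


# Hypothetical node with 12 physical cores.
plan = recommended_threads(physical_cores=12)
print(plan)
```

Calling `apply_threads(plan)` at the top of the training script, before building or running the model, would then apply both values.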

 

 

Please refer to the link below for more best-known methods (BKMs) and recommended settings when using Intel TensorFlow.

 

https://software.intel.com/content/www/us/en/develop/articles/maximize-tensorflow-performance-on-cpu-considerations-and-recommendations-for-inference.html

 

 

 

Please try out the above steps and let us know if you are still having issues.

 

Thanks

Arun Jose

 

Girin
Employee

Great, thanks for this, Arun. With these settings, I now get a training speed of ~1s/epoch with the MNIST dataset and Intel optimized TensorFlow on the DevCloud.

ArunJ_Intel
Moderator

Hey Girin,


Glad to know the solution helped. We will no longer be monitoring this thread. Please feel free to open a new thread if you run into further issues.


Thanks

Arun

