davideps
Novice

How to optimize TensorFlow 2/Keras on a machine with two XEON Gold 6230 CPUs?

I'm running on a Windows 10 Enterprise 64bit machine with two XEON Gold 6230 CPUs (20 physical cores each) and Anaconda Python 3.8.8 64bit. I installed the packages with

conda install tensorflow-mkl keras -c anaconda

I'm using the Keras example mnist_convnet.py to experiment with different configurations, with the goal of maximizing usage of both CPUs.

By default, the code uses all cores on a single CPU. I then added "config" to the imports and these lines to the code:

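from tensorflow import config

# Thread settings must be applied before TensorFlow runs any op.
config.threading.set_inter_op_parallelism_threads(0)  # 0 = let TensorFlow choose
config.threading.set_intra_op_parallelism_threads(0)  # 0 = let TensorFlow choose
config.set_soft_device_placement(True)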

This had no impact. Changing "set_inter_op_parallelism_threads" to 2 (the value I expected to trigger usage of both CPUs) had no effect either. All other settings I tried greatly reduced performance. I have several interrelated questions:

1. How can I get tensorflow/keras to use both CPUs?
2. Did I choose a poor example for multi-CPU execution? If so, what is a better example?
3. Despite specifying `tensorflow-mkl` on install, the sanity check fails (result is False). Does that explain this problem? If so, how can I fix it?
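Question 3 refers to Intel's sanity check for whether the MKL-enabled build is active. A minimal sketch of such a check follows. It assumes the private module `_pywrap_util_port` exposes `IsMklEnabled()`; because that private module has moved between TensorFlow releases, the sketch tries two locations and returns None if neither can be read. The helper name `mkl_enabled` is hypothetical, and since this relies on a non-public API it may break in other versions.

```python
import importlib

# Private, version-dependent module paths (an assumption; not a public API).
_CANDIDATE_MODULES = (
    "tensorflow.python._pywrap_util_port",       # older TF 2.x releases
    "tensorflow.python.util._pywrap_util_port",  # newer TF 2.x releases
)


def mkl_enabled():
    """Return True/False from TensorFlow's MKL flag, or None if it cannot be read."""
    for module_name in _CANDIDATE_MODULES:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # TensorFlow missing, or this module path does not exist in this version.
            continue
        check = getattr(module, "IsMklEnabled", None)
        if check is not None:
            return bool(check())
    return None
```

Running `print(mkl_enabled())` in the same environment as mnist_convnet.py shows the flag the sanity check is based on.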

JoseH_Intel
Moderator

Hello davideps,


Thank you for joining the Intel community.

Please allow us some time to research your question. We will get back to you as soon as we have updates.

Regards,


Jose A.

Intel Customer Support Technician

For firmware updates and troubleshooting tips, visit:

https://intel.com/support/serverbios


davideps
Novice

Hi Jose, thank you for your response. Can you tell me whether anyone else has reported this issue on machines with two chips (any model), and whether you can reproduce the problem with the code I supplied?

AthiraM_Intel
Moderator

Hi,


To maximize TensorFlow performance on CPU, you can tune parameters such as intra_op_parallelism_threads/inter_op_parallelism_threads, data layout, KMP_AFFINITY, KMP_BLOCKTIME, and OMP_NUM_THREADS. The recommended settings are available at the link below:

https://software.intel.com/content/www/us/en/develop/articles/maximize-tensorflow-performance-on-cpu...

Please follow this article for the OpenMP settings.
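The article's guidance can be expressed as a small helper. This is a sketch under assumptions: the rules of thumb encoded here (inter-op threads = number of sockets; intra-op threads and OMP_NUM_THREADS = total physical cores; KMP_AFFINITY=granularity=fine,verbose,compact,1,0; KMP_BLOCKTIME of 0 or 1) are commonly cited starting points that should be checked against the linked article, not guaranteed optima. The helper names `recommended_cpu_settings` and `apply_settings` are hypothetical. For this machine, two sockets with 20 physical cores each give 2 inter-op and 40 intra-op threads.

```python
import os


def recommended_cpu_settings(sockets, physical_cores_per_socket, kmp_blocktime=1):
    """Starting-point thread settings for a multi-socket CPU host (assumed rules of thumb)."""
    total_cores = sockets * physical_cores_per_socket
    return {
        # Roughly one inter-op thread per socket.
        "inter_op_parallelism_threads": sockets,
        # Intra-op threads cover every physical core across all sockets.
        "intra_op_parallelism_threads": total_cores,
        # OpenMP/MKL environment variables (values are strings).
        "env": {
            "OMP_NUM_THREADS": str(total_cores),
            "KMP_BLOCKTIME": str(kmp_blocktime),
            "KMP_AFFINITY": "granularity=fine,verbose,compact,1,0",
        },
    }


def apply_settings(settings):
    """Apply the settings; call before TensorFlow is imported or runs any op."""
    # The OpenMP runtime reads these variables when it starts up.
    os.environ.update(settings["env"])
    import tensorflow as tf  # imported late on purpose, after the env vars are set

    # Thread pools cannot be reconfigured once TensorFlow has been initialized.
    tf.config.threading.set_inter_op_parallelism_threads(
        settings["inter_op_parallelism_threads"])
    tf.config.threading.set_intra_op_parallelism_threads(
        settings["intra_op_parallelism_threads"])
```

A script would call `apply_settings(recommended_cpu_settings(2, 20))` at the very top, before building any Keras model. Comparing several combinations requires a fresh Python process for each one, because these settings are fixed once TensorFlow and the OpenMP runtime have started.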


For installation, you can use one of the options described at the link below:

https://software.intel.com/content/www/us/en/develop/articles/intel-optimization-for-tensorflow-inst...


On Windows, you can use either of the commands below, or you can build TensorFlow from source:


conda install tensorflow-mkl

conda install tensorflow-mkl -c anaconda


The steps to build TensorFlow from source are available in the documentation linked above.


We are checking your other questions internally and will get back to you soon with an update.


Thanks.


AthiraM_Intel
Moderator

Hi,


Regarding your second question, "Did I choose a poor example for multi-CPU execution? If so, what is a better example?":


You can use the same sample (mnist_convnet.py); it works fine with multi-threading.


Regarding the sanity check, we are investigating on our end and will let you know of any updates soon.


Could you please let us know which version of TensorFlow you are using?


Thanks.


davideps
Novice

Hi Athira,

"conda list" shows:

tensorflow            2.3.0  mkl_py38h37f7ee5_0
tensorflow-base       2.3.0  eigen_py38h75a453f_0
tensorflow-estimator  2.3.0  pyheb71bc4_0          anaconda
tensorflow-mkl        2.3.0  h93d2e19_0
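This listing makes it possible to check which build variant of each package is installed. In Anaconda builds, the build-string prefix distinguishes the MKL (`mkl_*`) and Eigen (`eigen_*`) variants. Here `tensorflow` (a metapackage) is an `mkl_` build, while `tensorflow-base`, which holds the actual library, is an `eigen_` build. That mismatch may be relevant to the failed sanity check, although this thread does not confirm it. A minimal sketch that flags the variants from `conda list` output (the helper names are hypothetical):

```python
def parse_conda_list(text):
    """Map package name -> (version, build string) from `conda list` output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "#" header lines conda prints.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 3:
            name, version, build = fields[:3]  # a 4th field, if any, is the channel
            packages[name] = (version, build)
    return packages


def tensorflow_build_variants(packages):
    """Return {package: 'mkl' | 'eigen'} for TensorFlow packages with such a build prefix."""
    variants = {}
    for name, (_version, build) in packages.items():
        if not name.startswith("tensorflow"):
            continue
        for variant in ("mkl", "eigen"):
            if build.startswith(variant + "_"):
                variants[name] = variant
    return variants
```

Fed the four lines above, `tensorflow_build_variants(parse_conda_list(...))` returns `{'tensorflow': 'mkl', 'tensorflow-base': 'eigen'}`; packages whose build string carries neither prefix are left out.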

AthiraM_Intel
Moderator

Hi,


Regarding the sanity check, we were able to reproduce the issue. We are investigating it internally and will let you know of any updates soon.


Thanks.


davideps
Novice

Thanks, Athira. Good to know that it wasn't just me failing the sanity check.

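Could this problem cause config settings like the ones below to misbehave?

from tensorflow import config

config.threading.set_inter_op_parallelism_threads(2)  # 2: intended to cover both CPU sockets
config.threading.set_intra_op_parallelism_threads(0)  # 0 = let TensorFlow choose
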
One of my initial questions was whether I need distributed workers to get both XEON CPUs on a single machine to share the load or whether the XEON platform should do this without distributed workers. Of course, I'm hoping distributed workers aren't necessary on a single machine since I believe that approach is designed for multiple machines in a network and will be slower than two CPUs that already share memory.

davideps
Novice

Hi Athira. Is there any update on this issue or an estimate of when it might be resolved?
