I'm running on a Windows 10 Enterprise 64-bit machine with two Xeon Gold 6230 CPUs (20 physical cores each) and Anaconda Python 3.8.8 64-bit. I installed the packages with:
conda install tensorflow-mkl keras -c anaconda
I'm using mnist_convnet.py to experiment with configurations with the goal of maximizing usage of both CPUs.
By default, the code uses all cores on a single CPU. I then added "config" to the imports and these lines to the code:
config.threading.set_inter_op_parallelism_threads(0)
config.threading.set_intra_op_parallelism_threads(0)
config.set_soft_device_placement(True)
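For reference, a minimal sketch of the same three calls, wrapped so they can run before TensorFlow does any work. It illustrates two points, hedged. First, the TensorFlow docs say a value of 0 lets the runtime pick the thread count itself, and 0 is also the default, so setting both to 0 would not be expected to change anything. Second, these setters only take effect if they run before the first op executes; afterwards TensorFlow raises RuntimeError. The helper names thread_plan and apply_thread_plan, and the heuristic of intra-op threads equal to all physical cores and one inter-op thread per socket, are assumptions for illustration, not settings confirmed in this thread.

```python
# Sketch: pick thread counts for a multi-socket host and apply them to TensorFlow.
# Assumed heuristic (not from this thread): intra-op = all physical cores,
# inter-op = one per socket. Both helpers are hypothetical.


def thread_plan(sockets: int, cores_per_socket: int) -> dict:
    """Return hypothetical starting-point thread counts for TensorFlow."""
    if sockets < 1 or cores_per_socket < 1:
        raise ValueError("sockets and cores_per_socket must be positive")
    return {
        # threads used inside a single op (e.g. one convolution)
        "intra_op": sockets * cores_per_socket,
        # independent ops that may run concurrently
        "inter_op": sockets,
    }


def apply_thread_plan(plan: dict) -> None:
    """Apply the plan; must be called before any TensorFlow op runs."""
    # Imported here so thread_plan() can be used without TensorFlow installed.
    import tensorflow as tf

    # These setters raise RuntimeError once the TensorFlow runtime has started.
    tf.config.threading.set_intra_op_parallelism_threads(plan["intra_op"])
    tf.config.threading.set_inter_op_parallelism_threads(plan["inter_op"])
    tf.config.set_soft_device_placement(True)
```

For the machine described above (2 sockets, 20 cores each), thread_plan(2, 20) returns {"intra_op": 40, "inter_op": 2}. Calling apply_thread_plan with that result right after the imports in mnist_convnet.py, before the model is built, would apply it.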
This had no impact. Changing "set_inter_op_parallelism_threads" to 2 (the value I expected to trigger usage of both CPUs) had no effect either. All other settings I tried greatly reduced performance. I have several interrelated questions:
1. How can I get tensorflow/keras to use both CPUs?
2. Did I choose a poor example for multi-CPU execution? If so, what is a better example?
3. Despite specifying `tensorflow-mkl` on install, the sanity check fails (the result is False). Does that explain this problem? If so, how can I fix it?
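The sanity check in Intel's installation guide for TensorFlow 2.x calls IsMklEnabled from TensorFlow's private _pywrap_util_port module. Per that guide, the module lives under tensorflow.python in releases before 2.5 and under tensorflow.python.util afterwards, so the sketch below tries both paths. Because this is a private API, treat its location and behavior as assumptions that may not hold for every build; the hypothetical helper mkl_enabled returns None when the check cannot be run at all.

```python
import importlib
from typing import Optional

# Candidate locations of TensorFlow's private _pywrap_util_port module
# (assumed: older 2.x releases first, newer releases second).
_CANDIDATE_MODULES = (
    "tensorflow.python._pywrap_util_port",
    "tensorflow.python.util._pywrap_util_port",
)


def mkl_enabled() -> Optional[bool]:
    """Return True/False from TensorFlow's MKL check, or None if it cannot run."""
    for name in _CANDIDATE_MODULES:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # TensorFlow missing, or the module is not at this path
        check = getattr(module, "IsMklEnabled", None)
        if check is not None:
            return bool(check())
    return None


if __name__ == "__main__":
    print("MKL enabled:", mkl_enabled())
```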
Thank you for joining the Intel community.
Please allow us some time to research your question. We will get back to you as soon as we have updates.
Intel Customer Support Technician
Hi Jose, thank you for your response. Can you tell me whether anyone else has reported this issue on machines with two CPU sockets (any model), and whether you can reproduce the problem with the code I supplied?
To maximize TensorFlow performance on CPU, you can tune settings such as intra_op_parallelism_threads/inter_op_parallelism_threads, data layout, KMP_AFFINITY, KMP_BLOCKTIME, and OMP_NUM_THREADS. The recommended settings are available at the link below:
Please follow this article for OpenMP settings.
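The OpenMP variables named above are read when the MKL/OpenMP runtime starts, so they are normally set before tensorflow is imported (or in the shell that launches Python). A minimal sketch, assuming the kind of starting values Intel's performance guide describes (OMP_NUM_THREADS equal to the physical core count, a compact KMP_AFFINITY, a small KMP_BLOCKTIME). The exact values and the helper names openmp_env and apply_env are assumptions to tune, not settings verified in this thread.

```python
import os


def openmp_env(sockets: int, cores_per_socket: int, blocktime_ms: int = 1) -> dict:
    """Hypothetical starting-point OpenMP/MKL environment for TensorFlow."""
    physical_cores = sockets * cores_per_socket
    return {
        # one OpenMP thread per physical core (hyper-threads not counted)
        "OMP_NUM_THREADS": str(physical_cores),
        # bind OpenMP threads to cores instead of letting the OS migrate them
        "KMP_AFFINITY": "granularity=fine,compact,1,0",
        # milliseconds an idle OpenMP thread waits before going to sleep
        "KMP_BLOCKTIME": str(blocktime_ms),
        # print the effective OpenMP settings at startup, useful for checking
        "KMP_SETTINGS": "1",
    }


def apply_env(env: dict) -> None:
    """Set the variables for this process; must run before `import tensorflow`."""
    os.environ.update(env)


if __name__ == "__main__":
    apply_env(openmp_env(sockets=2, cores_per_socket=20))
    # import tensorflow as tf  # import only after the environment is set
```

With KMP_SETTINGS=1, the OpenMP runtime prints its effective settings when it starts, which is one way to confirm the values were picked up.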
Regarding the installation, you can use one of the installation options from the link below:
For Windows, you can use either of the commands below, or you can build TensorFlow from source:
conda install tensorflow-mkl
conda install tensorflow-mkl -c anaconda
The steps to build TensorFlow from source are available in the documentation linked above.
We are checking on your other queries internally and will get back to you soon with an update.
Regarding your second query, "Did I chose a poor example for multi CPU execution? If so, what is a better example?":
You can keep using the same sample (mnist_convnet.py); it works fine with multi-threading.
Regarding the sanity check, we are investigating on our end and will let you know as soon as we have an update.
Could you please let us know which version of TensorFlow you are using?
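Because thread-pool sizes and OpenMP variables are fixed once TensorFlow has started, comparing configurations on mnist_convnet.py is simplest with one fresh Python process per configuration. A minimal sketch under that assumption; time_in_fresh_process is a hypothetical helper, and the measured time includes interpreter start-up and the TensorFlow import, not only training.

```python
import os
import subprocess
import sys
import time


def time_in_fresh_process(script_args: list, env_overrides: dict) -> float:
    """Run `python <script_args>` in a new interpreter; return wall-clock seconds.

    Each run starts from a clean process, so settings passed through
    env_overrides are applied before TensorFlow initializes. The total
    includes interpreter start-up and import time.
    """
    env = dict(os.environ)  # inherit the current environment (needed on Windows)
    env.update(env_overrides)
    start = time.perf_counter()
    # check=True raises CalledProcessError if the script exits with an error
    subprocess.run([sys.executable, *script_args], env=env, check=True)
    return time.perf_counter() - start
```

For example, calling it with ["mnist_convnet.py"] and {"OMP_NUM_THREADS": "20"}, then again with "40", compares the two settings without either run inheriting state from the other.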
"conda list" shows:
# Name                Version  Build                  Channel
tensorflow            2.3.0    mkl_py38h37f7ee5_0
tensorflow-base       2.3.0    eigen_py38h75a453f_0
tensorflow-estimator  2.3.0    pyheb71bc4_0           anaconda
tensorflow-mkl        2.3.0    h93d2e19_0
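In the listing above, the tensorflow package carries an mkl_ build string, while tensorflow-base (which, in Anaconda's packaging, holds the compiled library) carries an eigen_ build string. That mismatch may be worth checking against the failing sanity check. A small sketch, assuming the whitespace-separated name/version/build/channel layout shown above, that reports which variant each package was built as; parse_conda_list and build_variant are hypothetical helpers.

```python
from typing import Dict, Optional


def parse_conda_list(text: str) -> Dict[str, str]:
    """Map package name -> build string from `conda list` output."""
    builds = {}
    for line in text.splitlines():
        fields = line.split()
        # skip blank lines and the '#' header/comment lines conda prints
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        name, _version, build = fields[:3]
        builds[name] = build
    return builds


def build_variant(build: str) -> Optional[str]:
    """Return 'mkl' or 'eigen' based on the build-string prefix, else None."""
    for prefix in ("mkl", "eigen"):
        if build.startswith(prefix + "_"):
            return prefix
    return None


# The four lines quoted in this thread.
LISTING = """
tensorflow            2.3.0  mkl_py38h37f7ee5_0
tensorflow-base       2.3.0  eigen_py38h75a453f_0
tensorflow-estimator  2.3.0  pyheb71bc4_0          anaconda
tensorflow-mkl        2.3.0  h93d2e19_0
"""

if __name__ == "__main__":
    builds = parse_conda_list(LISTING)
    for pkg in ("tensorflow", "tensorflow-base"):
        print(pkg, build_variant(builds[pkg]))
```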
Thanks, Athira. Good to know that it wasn't just me failing the sanity check.
Could this problem also cause config settings like the ones below to misbehave?
One of my initial questions was whether I need distributed workers to get both Xeon CPUs on a single machine to share the load, or whether the Xeon platform should do this without distributed workers. I'm hoping distributed workers aren't necessary on a single machine, since I believe that approach is designed for multiple networked machines and would be slower than two CPUs that already share memory.