I read the article. It mentioned that 2nd-generation instructions such as AVX512_VNNI are optimized for neural networks.
I ran one of the INT8 models from the IntelAI models repository:
https://github.com/IntelAI/models/tree/master/benchmarks
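For reference, I launch the benchmark roughly like this (placeholders in angle brackets; the exact flags are documented in the repository's README):

cd models/benchmarks
python launch_benchmark.py \
    --model-name <model> \
    --precision int8 \
    --mode inference \
    --framework tensorflow \
    --docker-image docker.io/intelaipg/intel-optimized-tensorflow:latest \
    --in-graph <pretrained_int8_graph.pb>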
Here is my environment:
- Docker: docker.io/intelaipg/intel-optimized-tensorflow:latest
- CPU info
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
Stepping: 7
CPU MHz: 1838.080
BogoMIPS: 5000.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-23,48-71
NUMA node1 CPU(s): 24-47,72-95
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni
I expected the neural network to run with the 2nd-generation instructions (AVX512_VNNI),
but TensorFlow reports that the following optimized instructions are used:
AVX512F, AVX2, FMA
Is the docker image the optimized version for running neural networks?
How can I find out whether AVX512_VNNI is used or not?
How can I compile the code provided by IntelAI with the 2nd-generation Intel instructions?
Which docker image should I use to run the program?
Thanks in advance
Hi,
I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations: AVX2 AVX512F FMA
The message shown above is not a concern for the Intel optimization for TensorFlow, since either MKL-DNN or MKL performs dynamic dispatch at runtime to take advantage of the latest instruction set supported by your hardware.
For MKL-DNN, it will show:
dnnl_verbose,info,DNNL v1.1.0 (commit 5be2cfea21ec6d1d29f52600553baff53e30aedb)
dnnl_verbose,info,Detected ISA is Intel AVX-512 with Intel DL Boost
For MKL, it will show:
MKL_VERBOSE Intel(R) MKL 2019.0 Update 3 Product build 20190125 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of Vector Neural Network Instructions enabled processors, Lnx 2.10GHz lp64 intel_thread
If you see the above messages, then VNNI will be used at runtime.
Use the following commands to show the verbose messages:
export MKLDNN_VERBOSE=1   (or: export DNNL_VERBOSE=1)
export MKL_VERBOSE=1
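For example, a quick way to check at runtime (the script name is a placeholder for your own workload):

MKLDNN_VERBOSE=1 python <your_script.py> 2>&1 | grep -i "detected isa"
# expected: ...Detected ISA is Intel AVX-512 with Intel DL Boost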
As for performance, have you tried environment variables like KMP_AFFINITY and/or OMP_NUM_THREADS?
These environment variables affect performance. They are set automatically in some of the docker images; in others you have to set them yourself.
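For example, a minimal sketch (thread counts depend on your machine; with 2 sockets x 24 cores you would use 48 physical cores):

export OMP_NUM_THREADS=48                                 # number of physical cores
export KMP_AFFINITY=granularity=fine,verbose,compact,1,0  # pin threads to cores
export KMP_BLOCKTIME=1                                    # commonly recommended for inference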
All docker images released under docker.io/intelaipg/intel-optimized-tensorflow should have VNNI support enabled.
Hi,
May I know which model you tested? Please also let me know your steps.
1. If you run the benchmark with the environment variable DNNL_VERBOSE set to 1, you will see messages like the following at the beginning of the verbose output. If VNNI is supported, it will be shown in these messages:
dnnl_verbose,info,DNNL v1.1.0 (commit 5be2cfea21ec6d1d29f52600553baff53e30aedb)
dnnl_verbose,info,Detected ISA is Intel AVX-512 with Intel DL Boost
2. You don't need to compile the code. DNNL dispatches code at runtime automatically.
3. Please use the docker image mentioned on the GitHub page, like https://github.com/IntelAI/models/tree/master/benchmarks/image_recognition/tensorflow/resnet50,
e.g. gcr.io/deeplearning-platform-release/tf-cpu.1-15
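A rough sketch of pulling and entering that image with verbose logging enabled (the mounted path is a placeholder):

docker pull gcr.io/deeplearning-platform-release/tf-cpu.1-15
docker run -it --rm \
    -e MKLDNN_VERBOSE=1 -e DNNL_VERBOSE=1 \
    -v /path/to/models:/workspace \
    gcr.io/deeplearning-platform-release/tf-cpu.1-15 bash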
Hi,
I run "wide & deep" model.
The Int8 Model can't run on docker "gcr.io/deeplearning-platform-release/tf-cpu.1-15"
Some error occurs.
I choose docker "docker.io/intelaipg/intel-optimized-tensorflow:latest", which I think it's the last version of optimized-tensorflow with MKL-DNN
It shows some messages:
I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations: AVX2 AVX512F FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
I think it works with the instruction optimizations, but I can't see the messages you mentioned:
dnnl_verbose,info,DNNL v1.1.0 (commit 5be2cfea21ec6d1d29f52600553baff53e30aedb)
dnnl_verbose,info,Detected ISA is Intel AVX-512 with Intel DL Boost
I also tried "export DNNL_VERBOSE=1", but it doesn't work.
Is there anything wrong?
Would you provide a docker image for us to run "wide & deep" with VNNI?
Thank you very much~
Hi,
The dataset is large, and it takes time to download.
Could you please try MKLDNN_VERBOSE=1 instead?
I'll investigate this issue once I've got the dataset downloaded.
Thank you.
Hi Lin ChiungLiang,
Two items that may help:
1) The message output by the CPU feature guard is helpful. It means that the binary was compiled with GCC flags that use AVX instructions, but, to allow the container to work on the greatest number of systems possible, it was not compiled with *static* AVX2, AVX512, or AVX512_VNNI instructions in the Eigen library, which would cause TensorFlow in that container to crash when run on older systems.
However, MKL-DNN detects CPU features at run time and adjusts accordingly. Thus, when TensorFlow loads the MKL-DNN library, AVX512_VNNI instructions will be used if they are available on that system (a quick hardware check follows at the end of this post).
2) The TensorFlow version in the docker.io/intelaipg/intel-optimized-tensorflow:latest container is TensorFlow 1.15, which uses MKL-DNN version 0.x. If you want to see the verbose output in that version, as suggested above, you need to set MKLDNN_VERBOSE=1. DNNL_VERBOSE=1 will only work once MKL-DNN 1.x has been integrated into TensorFlow.
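As a sanity check, you can also confirm that the host itself exposes the instruction set, e.g. with:

grep -m1 -o avx512_vnni /proc/cpuinfo   # prints avx512_vnni if the CPU supports it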
Hi Robison,
Thanks for your information.
1) Is there any command to check the version of MKL-DNN?
2) As I mentioned, I'd like to run the INT8 "wide & deep" model. Would you please let me know which docker image I should use?
3) I found that when I run the model, it can't fully utilize the CPUs.
I'm sure all CPUs are used, but I don't know why their utilization is still low, about 30%~40%.
Lots of thanks
Hi,
I'm still checking with the dev team about the CPU usage. Probably the workload of this task is not large enough.
Alternatively, you may wish to try environment variables like KMP_AFFINITY (https://software.intel.com/en-us/articles/maximize-tensorflow-performance-on-cpu-considerations-and-recommendations-for-inference).
1) Please check the first MKL-DNN verbose message. MKL-DNN versions newer than 0.18 print their version information as the first verbose message (see the example after this list).
2) The docker image mentioned on the GitHub page works for the INT8 version of this model. Please use that docker image.
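For example, to capture just that first verbose message (the script name is a placeholder):

MKLDNN_VERBOSE=1 python <your_script.py> 2>&1 | grep -m1 "mkldnn_verbose,info"
# e.g. mkldnn_verbose,info,Intel MKL-DNN v0.20.3 (commit N/A)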
Hi,
Any update?
I'm still waiting for your reply.
I tried the docker image you mentioned on GitHub, but it isn't the optimized one:
it only used the AVX512F optimized instructions,
and the performance of the model on that image is even worse (longer computation time) than on the image I mentioned,
docker.io/intelaipg/intel-optimized-tensorflow:latest,
which used the AVX512F, AVX2, and FMA instructions.
Please help me check which docker image is the best.
Lots of thanks
Hi,
As mentioned above, please set MKLDNN_VERBOSE=1 (or DNNL_VERBOSE=1) and MKL_VERBOSE=1, then check the verbose output for the "Detected ISA is Intel AVX-512 with Intel DL Boost" message. For performance, please also try KMP_AFFINITY and/or OMP_NUM_THREADS.
Hi,
Thanks for your response.
Finally, I saw the messages when I enabled the flags you mentioned:
mkldnn_verbose,info,Intel MKL-DNN v0.20.3 (commit N/A)
mkldnn_verbose,info,Detected ISA is Intel AVX-512 with Intel DL Boost
I have some questions:
1) If the flags are not set to 1, does the program run without VNNI, or does it just run without printing the information?
The results are interesting: when I enable the flags, the computation time increases a little.
I think the program just runs without printing when the flags are disabled, so running with printing enabled makes the results slightly worse.
2) About CPU utilization:
I set the number of intra-op threads equal to the number of physical cores and the number of inter-op threads to 2,
but the utilization rate can't reach 100%.
If you have any comment, please let me know
Lots of thanks
Hi,
1) Exactly, your understanding is correct. The environment variable only controls whether the message is printed or not.
2) It is possible for CPU usage not to reach 100%, depending on the use case. Are you satisfied with the performance?
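If you want to experiment further, the benchmark launcher also exposes the thread settings directly; a sketch (flag names as in the IntelAI benchmarks README, so please verify them there):

python launch_benchmark.py <other flags as before> \
    --num-intra-threads 48 \
    --num-inter-threads 2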
Hi,
The performance is good.
I just want to know how to get the best result.
Thanks for your help.
Hi,
For simple methods, you can refer to the article linked earlier on maximizing TensorFlow performance on CPU.
For more advanced tuning, you need to profile the execution to see which parts take the longest time, and improve them accordingly; one quick approach is sketched below.
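For instance, the MKL-DNN verbose log itself can serve as a rough profiler: each "exec" line ends with the primitive's execution time in milliseconds, so you can sort by that last field to find the slowest primitives (a sketch; the script name is a placeholder):

MKLDNN_VERBOSE=1 python <your_script.py> 2>&1 \
    | grep "mkldnn_verbose,exec" \
    | awk -F, '{print $NF, $0}' \
    | sort -g -r | head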
Hi,
Could you please confirm whether the solution provided was helpful?
Hi,
Yes, thanks for your help.
Hi,
Thanks for the confirmation. We are closing this thread. Feel free to open a new thread if you have any further queries.
