Raghavan_S_
Beginner

sse4 vs. avx2

I'm seeing near-identical speed when running CNNs on the CPU with AVX2 and with SSE4 as the extension (choosing libcpu_extension_avx2.so and libcpu_extension_sse4.so respectively).

Looking at the Intel hardware specification, AVX2 can do 32 single-precision FP operations per cycle, while SSE4.2 can only do 8. That is a huge difference. Why am I not seeing a comparable difference in actual performance?
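As a sanity check on those numbers, here is a rough per-core model. It assumes two FP execution ports (as on Coffee Lake parts like the i7-8700K), counts an FMA as two operations for AVX2, and one operation per instruction for SSE (no FMA); this is a back-of-the-envelope sketch, not a spec-exact figure:

```python
def peak_sp_flops_per_cycle(vector_bits, has_fma, ports=2):
    """Rough peak single-precision FLOPs per cycle per core."""
    lanes = vector_bits // 32             # 32-bit single-precision lanes
    ops_per_instr = 2 if has_fma else 1   # an FMA counts as multiply + add
    return lanes * ports * ops_per_instr

print(peak_sp_flops_per_cycle(128, has_fma=False))  # SSE4: 8
print(peak_sp_flops_per_cycle(256, has_fma=True))   # AVX2: 32
```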

My platform:
CPU: core i7 8700K
OS: Ubuntu 16.04 64-bit
RAM: 32 GB
OpenVINO version: 2018.2.319
CNN configuration: 4 convolution layers + 3 FC layers (8 MB of coeffs)

Python code snippet for setting the CPU mode:

from openvino.inference_engine import IEPlugin

global plugin
#...
plugin = IEPlugin(device="CPU", plugin_dirs="")
plugin.add_cpu_extension("libcpu_extension_avx2.so")  # or libcpu_extension_sse4.so
#...
net = cvsdk_det_net(model_xml, model_bin)
#...
net.infer(inputs={input_blob: im})    # time taken for infer is measured
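When timing `infer` like this, a common pitfall is including one-time initialization cost in the measurement. A minimal timing sketch with warm-up runs (the `time_infer` helper is my own illustration, not part of the OpenVINO API):

```python
import time

def time_infer(net, input_blob, im, warmup=3, runs=10):
    # Warm-up calls so one-time setup does not skew the measurement.
    for _ in range(warmup):
        net.infer(inputs={input_blob: im})
    start = time.perf_counter()
    for _ in range(runs):
        net.infer(inputs={input_blob: im})
    # Average wall-clock seconds per inference.
    return (time.perf_counter() - start) / runs
```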

 


Hi,

If your topology has only two types of layers (convolution and fully connected), then the extension library isn't used at all. You can find the list of layers included in the extension library here: https://software.intel.com/en-us/articles/OpenVINO-InferEngine#CPU%20Extensions

Most of the layers distributed as part of the CPU plugin have runtime CPU feature detection, so the AVX2 code path was executed in both cases on your platform.
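A toy illustration of this kind of runtime dispatch: detect what the running CPU supports and pick the widest implementation, regardless of which extension library was loaded. The function names are hypothetical, the flag parsing is Linux-specific, and the real plugin does this internally in C++:

```python
def cpu_flags():
    # Linux-only: parse the 'flags' line from /proc/cpuinfo.
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

def pick_kernel(flags):
    # Prefer the widest vector ISA the CPU actually supports.
    if "avx2" in flags:
        return "avx2"
    if "sse4_2" in flags:
        return "sse4.2"
    return "generic"
```

On an i7-8700K, `pick_kernel(cpu_flags())` would select the AVX2 path even when the SSE4 extension library was the one loaded, which explains the identical timings.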
