Raghavan_S_
Beginner

sse4 vs. avx2

I'm seeing near-identical speed when running CNNs on the CPU with AVX2 and with SSE4 as the extension (choosing libcpu_extension_avx2.so and libcpu_extension_sse4.so respectively).

Looking at the Intel hardware specification, AVX2 can do 32 single-precision FP operations per cycle, while SSE4.2 can only do 8. That is a huge difference. Why am I not seeing a comparable difference in actual performance?
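As a sanity check on those numbers, here is a rough per-core model. It assumes two FP execution ports (as on Coffee Lake parts like the i7-8700K), counts an FMA as two operations for AVX2, and one operation per instruction for SSE (no FMA); this is a back-of-the-envelope sketch, not a spec-exact figure:

```python
def peak_sp_flops_per_cycle(vector_bits, has_fma, ports=2):
    """Rough peak single-precision FLOPs per cycle per core."""
    lanes = vector_bits // 32             # 32-bit single-precision lanes
    ops_per_instr = 2 if has_fma else 1   # an FMA counts as multiply + add
    return lanes * ports * ops_per_instr

print(peak_sp_flops_per_cycle(128, has_fma=False))  # SSE4: 8
print(peak_sp_flops_per_cycle(256, has_fma=True))   # AVX2: 32
```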

My platform:
CPU: core i7 8700K
OS: Ubuntu 16.04 64-bit
RAM: 32 GB
OpenVINO version: 2018.2.319
CNN configuration: 4 convolution layers + 3 FC layers (8 MB of coeffs)

Python code snippet for setting the CPU mode:

from openvino.inference_engine import IEPlugin

global plugin
#...
plugin = IEPlugin(device="CPU", plugin_dirs="")
plugin.add_cpu_extension("libcpu_extension_avx2.so")  # or libcpu_extension_sse4.so
#...
net = cvsdk_det_net(model_xml, model_bin)
#...
net.infer(inputs={input_blob: im})    # time taken for infer is measured
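When timing `infer` like this, a common pitfall is including one-time initialization cost in the measurement. A minimal timing sketch with warm-up runs (the `time_infer` helper is my own illustration, not part of the OpenVINO API):

```python
import time

def time_infer(net, input_blob, im, warmup=3, runs=10):
    # Warm-up calls so one-time setup does not skew the measurement.
    for _ in range(warmup):
        net.infer(inputs={input_blob: im})
    start = time.perf_counter()
    for _ in range(runs):
        net.infer(inputs={input_blob: im})
    # Average wall-clock seconds per inference.
    return (time.perf_counter() - start) / runs
```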

 


Hi,

If your topology has only two types of layers (convolution and fully connected), then the extension library isn't used at all. You can find the list of layers included in the extension library here: https://software.intel.com/en-us/articles/OpenVINO-InferEngine#CPU%20Extensions

Most of the layers distributed as part of the CPU plugin have runtime CPU feature detection, so the AVX2 code path was executed in both cases on your platform.
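A toy illustration of this kind of runtime dispatch: detect what the running CPU supports and pick the widest implementation, regardless of which extension library was loaded. The function names are hypothetical, the flag parsing is Linux-specific, and the real plugin does this internally in C++:

```python
def cpu_flags():
    # Linux-only: parse the 'flags' line from /proc/cpuinfo.
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

def pick_kernel(flags):
    # Prefer the widest vector ISA the CPU actually supports.
    if "avx2" in flags:
        return "avx2"
    if "sse4_2" in flags:
        return "sse4.2"
    return "generic"
```

On an i7-8700K, `pick_kernel(cpu_flags())` would select the AVX2 path even when the SSE4 extension library was the one loaded, which explains the identical timings.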
