I'm getting nearly identical inference speed when running CNNs on the CPU with AVX2 and with SSE4 as the extension (choosing libcpu_extension_avx2.so and libcpu_extension_sse4.so respectively).
According to the Intel hardware specification, AVX2 can do 32 single-precision FP operations per cycle, while SSE4.2 can only do 8. This difference is huge. Why am I not seeing a similar difference in actual performance?
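For reference, the 32 versus 8 figures can be reproduced with simple arithmetic. The sketch below is only a back-of-the-envelope model: it assumes two vector execution units per core and counts a fused multiply-add as two operations, and the function name `peak_flops_per_cycle` is made up for illustration. The result is a theoretical per-core upper bound, not a prediction of measured inference speed.

```python
def peak_flops_per_cycle(register_bits, vector_units=2, fma=False, element_bits=32):
    """Theoretical peak FP operations per cycle for one core.

    Assumptions (not taken from the post): two vector execution units,
    and an FMA instruction counted as two operations (multiply + add).
    """
    lanes = register_bits // element_bits   # floats held in one vector register
    ops_per_lane = 2 if fma else 1          # FMA does a multiply and an add
    return lanes * vector_units * ops_per_lane


sse_peak = peak_flops_per_cycle(128)              # 4 lanes x 2 units
avx2_peak = peak_flops_per_cycle(256, fma=True)   # 8 lanes x 2 units x 2 ops
print(sse_peak, avx2_peak)
```

Under these assumptions the ratio is 4x, which is the gap one might expect to see only if the workload were fully compute-bound and actually ran different instruction sets in the two cases.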
My platform:
CPU: Core i7-8700K
OS: Ubuntu 16.04 64-bit
RAM: 32 GB
OpenVINO version: 2018.2.319
CNN configuration: 4 convolution layers + 3 FC layers (8 MB of coeffs)
Python code snippet for setting the CPU mode:
global plugin
# ...
plugin = IEPlugin(device="CPU", plugin_dirs="")
plugin.add_cpu_extension("libcpu_extension_avx2.so")
# ...
net = cvsdk_det_net(model_xml, model_bin)
# ...
net.infer(inputs={input_blob: im})  # Time taken for infer is measured
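The post does not show how the inference time is measured. One possible approach, sketched here under my own assumptions, is to discard a few warm-up runs and report the median of several timed runs, so that one-off initialization cost does not hide a real difference between the two extensions. The helper `time_call` and its parameters are hypothetical and not part of OpenVINO.

```python
import statistics
import time


def time_call(fn, warmup=3, runs=20):
    """Return the median wall-clock time in seconds of calling fn().

    The first `warmup` calls are executed but not timed.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Hypothetical usage with the objects from the snippet above:
# median_s = time_call(lambda: net.infer(inputs={input_blob: im}))
```

Running this once per extension library and comparing the two medians gives a steadier comparison than timing a single call.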
Hi,
If your topology has only two types of layers (convolution and FC), then the extension library isn't used at all. You can find the list of layers included in the extension library here: https://software.intel.com/en-us/articles/OpenVINO-InferEngine#CPU%20Extensions
Most of the layers that are distributed as part of the CPU plugin detect CPU features at runtime, so the AVX2 code was executed in both cases on your platform.
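To illustrate what runtime CPU feature detection means, here is a minimal sketch that reads the feature flags Linux exposes in /proc/cpuinfo and picks the widest instruction set available. It is not how the CPU plugin does it internally (that code is native), and the helper names `parse_cpu_flags`, `best_isa`, and `read_cpu_flags` are made up for this example.

```python
def parse_cpu_flags(cpuinfo_text):
    """Extract the CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Line format: "flags : fpu vme ... sse4_2 ... avx2 ..."
            return set(line.split(":", 1)[1].split())
    return set()


def best_isa(flags):
    """Pick the widest instruction set present in the flag set."""
    for name, flag in (("AVX2", "avx2"), ("SSE4.2", "sse4_2")):
        if flag in flags:
            return name
    return "baseline"


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flags of the current machine (Linux only)."""
    with open(path) as f:
        return parse_cpu_flags(f.read())


# On a Linux machine: print(best_isa(read_cpu_flags()))
```

On a CPU that reports `avx2`, a dispatcher like this picks the AVX2 path no matter which extension library was loaded, which matches the behavior described above.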