I am working on a color space converter application. I have tried AVX512 and AVX2 and got no significant performance improvement over SSE2.
I have uploaded my test tool, CPU information, compiler settings, and test results here.
Could you please share any comments?
I'd like to update the status of this issue, as we discussed by email. The cause is the memory bandwidth limit and latency that AVX512 runs into when the code runs in a single thread.
If you parallelize the example with OpenMP, it will make use of the large L2 and L3 caches on SKL, and you will see the performance improvement of AVX512 over AVX2 and SSE.
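
For anyone landing on this thread later, here is a minimal sketch of what an OpenMP version of such a conversion loop could look like. It is not the code from the uploaded test tool: the function name `rgb_to_ycbcr_planar`, the packed-RGB input layout, and the BT.601 integer coefficients are all assumptions chosen for illustration. The outer loop is split across threads by rows, and the inner loop is left to the compiler to vectorize for whatever instruction set you target.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel (not from the original test tool):
 * packed 8-bit RGB in, three separate 8-bit planes (Y, Cb, Cr) out.
 * Coefficients are the common BT.601 studio-range integer approximation.
 * The caller must pass non-overlapping buffers. */
void rgb_to_ycbcr_planar(const uint8_t *src, uint8_t *out_y, uint8_t *out_cb,
                         uint8_t *out_cr, int width, int height)
{
    /* Rows are independent, so each thread works on its own block of rows. */
    #pragma omp parallel for schedule(static)
    for (int row = 0; row < height; ++row) {
        const uint8_t *s  = src    + (size_t)row * width * 3;
        uint8_t       *y  = out_y  + (size_t)row * width;
        uint8_t       *cb = out_cb + (size_t)row * width;
        uint8_t       *cr = out_cr + (size_t)row * width;

        /* Hint that iterations are independent so the compiler may
         * vectorize with SSE2, AVX2 or AVX512 depending on the target flags. */
        #pragma omp simd
        for (int x = 0; x < width; ++x) {
            int r = s[3 * x + 0];
            int g = s[3 * x + 1];
            int b = s[3 * x + 2];

            /* The offset (16 or 128) is folded in before the shift so the
             * shifted value is never negative; +128 rounds to nearest. */
            y[x]  = (uint8_t)(( 66 * r + 129 * g +  25 * b + 128 + (16  << 8)) >> 8);
            cb[x] = (uint8_t)((-38 * r -  74 * g + 112 * b + 128 + (128 << 8)) >> 8);
            cr[x] = (uint8_t)((112 * r -  94 * g -  18 * b + 128 + (128 << 8)) >> 8);
        }
    }
}
```

Build with OpenMP enabled (`-fopenmp` for GCC/Clang, `-qopenmp` for the Intel compiler) plus the instruction-set flag you want to compare. Without the OpenMP flag the pragmas are ignored and the function runs single-threaded, which is a convenient way to reproduce the single-thread numbers with the same source.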
pbSrcY 0x7fd6cd06f040, pbOut1 0x7fd6c7182080 pbOut2 0x7fd6c9126100 pbOut3 0x7fd6cb0ca180
time cpu 965 sse2 95 avx2 89 avx512 82