A few articles about the performance gain from using AVX over other SIMD instructions have been shared on this site (for example, Wiener Filtering Using Intel Advanced Vector Extensions by Mr Kit Chung). They also compare the performance of 128-bit SSE and 256-bit AVX (I pasted the comparison below from your site). Could anyone please tell me how the performance gain can be measured on the SDE?
Intel AVX (256-bit)   Intel SSE (128-bit)   AVX vs. SSE
Wiener filter with grouped arrays
I made several calls to the SSE code and then the same number of calls to the AVX code on the SDE. The ratio shows a degradation in performance of the AVX version compared with the SSE code. Below are the results I obtained when I ran the two functions 1000 times.
intrin_wiener_rcp_sse = 0.284260 msec
intrin_wiener_rcp_avx = 15.032977 msec
Performance Improvement is 0.018909 times
How can I check the performance? Can you please help?
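For reference, the kind of measurement described above (N calls to each version, then the ratio of the two times) could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the `kernel_fn` signature, `time_kernel_ms`, `speedup` and the `scale_by_two` stand-in are hypothetical names, since the real signatures of `intrin_wiener_rcp_sse` and `intrin_wiener_rcp_avx` are not shown in this thread. Also keep in mind that, as far as I understand, the SDE emulates instructions the host CPU does not support, so times collected under it probably reflect emulation overhead more than real hardware speed.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel signature. The real intrin_wiener_rcp_sse and
   intrin_wiener_rcp_avx signatures are not shown in the thread, so this
   typedef would need to be adapted to match them. */
typedef void (*kernel_fn)(const float *in, float *out, size_t n);

/* Stand-in kernel used only to exercise the harness. */
static void scale_by_two(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = 2.0f * in[i];
}

/* Time `calls` back-to-back calls of `fn` and return the total in
   milliseconds, as reported by the standard clock() function. */
static double time_kernel_ms(kernel_fn fn, const float *in, float *out,
                             size_t n, int calls)
{
    assert(fn != NULL && calls > 0);

    fn(in, out, n);                 /* warm-up call, not timed */

    clock_t start = clock();
    for (int i = 0; i < calls; ++i)
        fn(in, out, n);
    clock_t stop = clock();

    return 1000.0 * (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Ratio baseline/candidate: above 1 the candidate is faster,
   below 1 it is slower. */
static double speedup(double baseline_ms, double candidate_ms)
{
    assert(candidate_ms > 0.0);
    return baseline_ms / candidate_ms;
}
```

With the two times reported above, `speedup(0.284260, 15.032977)` gives roughly 0.0189, which matches the "Performance Improvement" figure and means the AVX run took about 53 times longer than the SSE run in that environment.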
If you can show that your AVX code cuts the number of instructions required to execute the critical path by 50%, and does not increase the demand for data to/from cache beyond 16 bytes per clock nor depend on misaligned access, you have an excellent chance of significant speedup.
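The three conditions in that reply can be checked on paper before any timing is attempted. The helpers below are a rough sketch of that arithmetic only, not a profiler: `vector_ops`, `bytes_per_clock` and `is_aligned` are hypothetical names, the element type is assumed to be 32-bit float, and the cycle count passed to `bytes_per_clock` has to come from your own estimate or measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of vector operations needed to cover n_floats elements with
   registers of width_bits bits (128 for SSE, 256 for AVX), assuming
   32-bit floats and rounding up for a partial final vector. */
static size_t vector_ops(size_t n_floats, unsigned width_bits)
{
    size_t lanes = width_bits / 32u;
    assert(lanes > 0);
    return (n_floats + lanes - 1) / lanes;
}

/* Average cache traffic in bytes per clock, given the bytes moved and
   the number of cycles spent moving them (both supplied by the caller). */
static double bytes_per_clock(size_t bytes_moved, double cycles)
{
    assert(cycles > 0.0);
    return (double)bytes_moved / cycles;
}

/* Nonzero if pointer p is aligned to `alignment` bytes. Aligned 256-bit
   loads and stores require 32-byte alignment. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment > 0);
    return ((uintptr_t)p % alignment) == 0;
}
```

For example, 1024 floats need `vector_ops(1024, 128)` = 256 operations with SSE and `vector_ops(1024, 256)` = 128 with AVX, which is the 50% reduction mentioned above. Whether that halving carries over to the whole critical path depends on how much of the loop is actually vector arithmetic.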