AVX Performance Measure

inteleverywhere · 06-24-2010

Hi All,
Several articles on this site describe the performance gain from using AVX over other SIMD instruction sets (for example, "Wiener Filtering Using Intel Advanced Vector Extensions" by Mr Kit Chung). That article also compares 128-bit SSE with 256-bit AVX; I have pasted its figures from your site below. Could anyone please tell me how this performance gain can be measured on the SDE?

                                    Intel AVX (256-bit)   Intel SSE (128-bit)   AVX vs. SSE
Wiener filter                       45871                 66933                 1.46x
Wiener filter with grouped arrays   42464                 64473                 1.51x

I made several calls to the SSE code and then the same number of calls to the AVX code on the SDE. The ratio shows a degradation in performance for the AVX version compared with the SSE code. Below are the results I obtained when I ran the two functions 1000 times.

intrin_wiener_rcp_sse = 0.284260 msec
intrin_wiener_rcp_avx = 15.032977 msec
Performance Improvement is 0.018909 times

How can I check the performance? Can you please help?

Thanks

TimP · 06-24-2010

In the public SDE, the only measure (distantly) related to performance is the instruction mix count. As you saw, the time required to run the emulation has no relationship to expected hardware performance.
If you can show that your AVX code cuts the number of instructions required to execute the critical path by 50%, and does not increase the demand for data to/from cache beyond 16 bytes per clock nor depend on misaligned access, you have an excellent chance of significant speedup.
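
As a rough illustration of that rule of thumb, here is a minimal back-of-envelope sketch in Python. Everything in it is hypothetical: the function name, its parameters, and the sample numbers are made up for illustration, and it assumes you have read the critical-path instruction counts by hand from the SDE instruction mix output and estimated the cycles per iteration yourself. It does not predict a speedup; it only reports whether the three conditions above hold.

```python
def avx_speedup_outlook(sse_instr, avx_instr, bytes_per_iter,
                        cycles_per_iter, aligned):
    """Check the three rule-of-thumb conditions for an AVX speedup.

    sse_instr, avx_instr : critical-path instruction counts (hypothetical
                           inputs, e.g. read from an instruction mix report)
    bytes_per_iter       : bytes moved to/from cache per loop iteration
    cycles_per_iter      : estimated clocks per iteration (an assumption
                           supplied by the user, not measured by the emulator)
    aligned              : True if the code does not rely on misaligned access
    """
    instr_ratio = avx_instr / sse_instr
    bytes_per_clock = bytes_per_iter / cycles_per_iter

    halves_instructions = instr_ratio <= 0.5        # 50% fewer instructions
    within_bandwidth = bytes_per_clock <= 16.0      # at most 16 bytes per clock

    return {
        "instr_ratio": instr_ratio,
        "bytes_per_clock": bytes_per_clock,
        "halves_instructions": halves_instructions,
        "within_bandwidth": within_bandwidth,
        "aligned": aligned,
        "promising": halves_instructions and within_bandwidth and aligned,
    }


# Made-up example: the AVX loop needs half the instructions and moves
# 64 bytes in an estimated 4 clocks, using aligned loads and stores.
good = avx_speedup_outlook(1000, 500, 64, 4, True)

# Made-up counter-example: only 10% fewer instructions, 32 bytes per clock.
poor = avx_speedup_outlook(1000, 900, 64, 2, True)

print(good["promising"], poor["promising"])
```

If any one of the three flags comes out false, the halved instruction count alone is not a reason to expect a large gain on hardware.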