AVX with faster memory

magicfoot · 10-25-2011

I am currently running an algorithm using AVX on an i5-2500K and an H61 motherboard with DDR3-1033 memory. I get an 8% speedup over the equivalent algorithm using SSE.

It seems that the data flow to the i5 when using AVX is bottlenecked either by a clogged memory channel, or because the memory chips cannot supply data any faster, or both. I am going to get a DDR3-2133 memory set to speed up the supply of data to the chip, to determine whether I can get more than the 8% speedup out of AVX.
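As a rough sanity check on what a memory upgrade can buy, the theoretical peak DRAM bandwidth scales linearly with the data rate. The helper below is a hypothetical sketch, assuming DDR3's 64-bit (8-byte) bus per channel; the dual-channel figures in the note are examples, not measurements from this thread.

```c
/* Hypothetical helper: theoretical peak DRAM bandwidth in GB/s.
 * Assumes a 64-bit (8-byte) data bus per channel, as on DDR3 DIMMs.
 * mt_per_s: data rate in megatransfers/s (e.g. the "2133" in DDR3-2133).
 * channels: number of populated memory channels. */
double ddr_peak_gbs(double mt_per_s, int channels)
{
    const double bytes_per_transfer = 8.0;
    return mt_per_s * 1e6 * bytes_per_transfer * channels / 1e9;
}
```

For example, `ddr_peak_gbs(2133, 2)` gives about 34.1 GB/s and `ddr_peak_gbs(1333, 2)` about 21.3 GB/s, a ratio of roughly 1.6. For a purely bandwidth-bound kernel that ratio is roughly an upper bound on the gain; sustained bandwidth is usually well below the theoretical peak, and latency does not scale with the data rate.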

Would someone who has tested the effect of faster memory on AVX performance be kind enough to share their results?

From what I have read, I could also use a P67 motherboard instead of the H61, as some have reported better memory performance from the P67 alone (i.e. with no upgrade in memory frequency). Any comments?

Max_L · 10-25-2011

If memory bandwidth is the bottleneck of the algorithm, vectorization with SSE, let alone AVX, is not going to provide a notable speedup, because the execution units are idling most of the time anyway.
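One way to make this argument quantitative is the roofline model, a standard performance model that the post does not name: attainable throughput is the smaller of peak compute and arithmetic intensity times memory bandwidth. The sketch below is illustrative only; the function names are hypothetical, and the intensity of the example kernel ignores write-allocate traffic.

```c
/* Roofline-style bound (a standard model, not from the original post):
 * attainable GFLOP/s = min(peak compute, arithmetic intensity * bandwidth).
 * flops_per_byte: floating-point operations per byte moved to/from DRAM. */
double attainable_gflops(double peak_gflops, double flops_per_byte, double bw_gbs)
{
    double mem_bound = flops_per_byte * bw_gbs;
    return mem_bound < peak_gflops ? mem_bound : peak_gflops;
}

/* Intensity of a streaming kernel a[i] = b[i] + s * c[i] on floats:
 * 2 flops per element, 12 bytes moved (two 4-byte loads, one 4-byte store),
 * ignoring the extra read that a write-allocate cache performs for a[i]. */
double triad_intensity_float(void)
{
    return 2.0 / 12.0;
}
```

Going from 128-bit SSE to 256-bit AVX roughly doubles `peak_gflops`, but for a low-intensity kernel the `min` is decided by the bandwidth term, so `attainable_gflops(200.0, triad_intensity_float(), 21.0)` and `attainable_gflops(100.0, triad_intensity_float(), 21.0)` return the same value (the peak and bandwidth numbers here are hypothetical).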

The better an algorithm is structured to improve data locality (and hence cacheability), the greater the performance it gets, and the bigger the benefit it can realize from vectorization.

The use of 256-bit vectorization with AVX will show the greatest benefit over 128-bit vectorization with SSE in algorithms that consistently hit the L1 cache when accessing data. That is achieved by increasing locality, i.e. the amount of computation done on each piece of data fetched from memory. The simplest example is matrix multiplication on big matrices: the schoolbook algorithm vs. a memory-blocking optimization.
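A minimal sketch in plain C of the contrast described above, without intrinsics: the schoolbook triple loop streams whole rows and columns through the cache, while the blocked version works on tiles that stay cache-resident and are reused many times. The block size `MATMUL_BLOCK` is an assumption (three 32x32 double tiles are 24 KiB, intended to fit a typical 32 KiB L1 data cache) and would need tuning; `blocked_matches_naive` is a hypothetical self-check, not part of any library.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed tile size: 3 tiles * 32*32 doubles * 8 bytes = 24 KiB. */
#define MATMUL_BLOCK 32

/* Schoolbook matrix multiply: C += A * B, all n x n, row-major doubles. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = C[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];  /* B walked with stride n */
            C[i * n + j] = sum;
        }
}

/* Cache-blocked version: same result, computed tile by tile so each
 * tile of A, B and C is reused while it is still in cache. */
void matmul_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += MATMUL_BLOCK)
        for (size_t kk = 0; kk < n; kk += MATMUL_BLOCK)
            for (size_t jj = 0; jj < n; jj += MATMUL_BLOCK) {
                size_t imax = ii + MATMUL_BLOCK < n ? ii + MATMUL_BLOCK : n;
                size_t kmax = kk + MATMUL_BLOCK < n ? kk + MATMUL_BLOCK : n;
                size_t jmax = jj + MATMUL_BLOCK < n ? jj + MATMUL_BLOCK : n;
                for (size_t i = ii; i < imax; i++)
                    for (size_t k = kk; k < kmax; k++) {
                        double a = A[i * n + k];
                        /* Contiguous inner loop: a candidate for SIMD. */
                        for (size_t j = jj; j < jmax; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}

/* Hypothetical self-check: inputs are small integers, so every sum is
 * exact in double and the two results can be compared with ==. */
int blocked_matches_naive(size_t n)
{
    double *A = malloc(n * n * sizeof *A);
    double *B = malloc(n * n * sizeof *B);
    double *C1 = calloc(n * n, sizeof *C1);
    double *C2 = calloc(n * n, sizeof *C2);
    int ok = A && B && C1 && C2;
    if (ok) {
        for (size_t i = 0; i < n * n; i++) {
            A[i] = (double)(i % 7);
            B[i] = (double)(i % 5);
        }
        matmul_naive(n, A, B, C1);
        matmul_blocked(n, A, B, C2);
        for (size_t i = 0; i < n * n; i++)
            if (C1[i] != C2[i]) { ok = 0; break; }
    }
    free(A); free(B); free(C1); free(C2);
    return ok;
}
```

The blocked inner `j` loop is unit-stride, which is the kind of loop a compiler can typically auto-vectorize for SSE or AVX; blocking is what keeps that loop fed from cache rather than from DRAM. Calling `blocked_matches_naive(100)` also exercises the partial tiles at the matrix edges, since 100 is not a multiple of 32.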

-Max