Intel® ISA Extensions

No speedup AVX over SSE

nik0las
Beginner
Hi. I'm trying to speed up some serial code using SSE and AVX (computational code with SoA data structures). The SSE version gives a good speedup: up to 2x with double, and somewhat more with float. But when I use AVX in the same way, I get the same speed as with SSE. From what I found searching online, the bottleneck appears to be memory speed. Is it possible to speed up this code using AVX?

OS: Linux (Ubuntu), x86_64
CPU: i7-2670QM
Compilers: gcc and icc
Code: http://code.google.com/p/le2d/source/browse/#git%2Fsrc
Compile: cd src && make
Run: cd tests/sse2 && ./test.sh
See result: cd tests/sse2 && gnuplot -p plot1.gnu

Thanks.
TimP
Honored Contributor III
You're probably aware, as you implied you researched the subject, that speedup from SSE to AVX often depends on several factors, including 32-byte data alignment, L1 cache locality, and optimum number of operations per loop. We've seen cases where stuffing lots of operations into a loop in order to optimize SSE performance could bring SSE up to the performance of AVX.
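To make the factors above concrete, here is a minimal sketch of a 32-byte aligned SoA layout with several arithmetic operations per element. It is not taken from the le2d code: the names `soa_t`, `alloc32`, `soa_create`, `soa_kernel`, and `soa_free` are hypothetical, and the polynomial in the loop is only a stand-in for "a lot of computations per element". It assumes a C11 compiler that provides `aligned_alloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical SoA container: one separate array per field. */
typedef struct {
    size_t n;
    double *x;
    double *y;
    double *out;
} soa_t;

/* Allocate n doubles on a 32-byte boundary (the width of one AVX register).
   aligned_alloc requires the size to be a multiple of the alignment,
   so the byte count is rounded up to the next multiple of 32. */
static double *alloc32(size_t n)
{
    size_t bytes = (n * sizeof(double) + 31u) & ~(size_t)31u;
    if (bytes == 0)
        bytes = 32;
    double *p = aligned_alloc(32, bytes);
    assert(p != NULL && ((uintptr_t)p % 32u) == 0);
    return p;
}

soa_t soa_create(size_t n)
{
    soa_t s = { n, alloc32(n), alloc32(n), alloc32(n) };
    return s;
}

void soa_free(soa_t *s)
{
    free(s->x);
    free(s->y);
    free(s->out);
    s->x = s->y = s->out = NULL;
    s->n = 0;
}

/* Several multiplies and adds per pair of loads and one store, so the loop
   is less likely to be limited purely by memory traffic. The restrict
   qualifiers tell the compiler the arrays do not overlap, which makes
   auto-vectorization easier. */
void soa_kernel(soa_t *s, double a, double b)
{
    const double *restrict x = s->x;
    const double *restrict y = s->y;
    double *restrict out = s->out;
    for (size_t i = 0; i < s->n; ++i) {
        double xi = x[i], yi = y[i];
        out[i] = a * xi * xi + b * xi * yi + yi * yi;
    }
}
```

Building the same source once with something like `gcc -O3 -msse2` and once with `gcc -O3 -mavx` and timing both is one way to check whether a loop of this shape benefits from the wider registers on a given machine. If the arrays do not fit in L1, the two builds may well run at the same speed.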
nik0las
Beginner
Thanks for your comment. This program uses float and double numbers, 32-byte memory alignment, SoA data structures, and a lot of computation per element. I have also done some work to make the code more cache-friendly. Maybe it's possible to speed up the AVX version using software prefetch? As I understand it, AVX would be faster only when all the data is stored in L1 cache.
TimP
Honored Contributor III
Software prefetch may help if the data don't remain local to L1, but in that case performance of 2x SSE is unlikely. I've found it difficult to predict usefulness of software prefetch. It's possible (and may be the case in your example) sometimes for SSE code to take full advantage of L1 performance even on AVX capable CPUs.
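For readers who want to try software prefetch, the following is a rough sketch using the GCC/Clang builtin `__builtin_prefetch`. The function name `dot_prefetch` and the constants `LINE_DOUBLES` and `PREFETCH_LINES_AHEAD` are hypothetical, and the prefetch distance is a guess that would need tuning by measurement. It assumes 64-byte cache lines.

```c
#include <assert.h>
#include <stddef.h>

enum {
    LINE_DOUBLES = 8,         /* 64-byte cache line / 8-byte double (assumed) */
    PREFETCH_LINES_AHEAD = 8  /* hypothetical tuning knob, not a recommendation */
};

/* Dot product that issues one prefetch per cache line for each input array,
   a fixed number of lines ahead of the element currently being processed. */
double dot_prefetch(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    const size_t ahead = (size_t)LINE_DOUBLES * PREFETCH_LINES_AHEAD;
    double acc = 0.0;
    size_t i = 0;

    for (; i + LINE_DOUBLES <= n; i += LINE_DOUBLES) {
        /* Only prefetch addresses that lie inside the arrays. */
        if (i + ahead < n) {
            __builtin_prefetch(&x[i + ahead], 0, 3); /* 0 = read, 3 = keep in all cache levels */
            __builtin_prefetch(&y[i + ahead], 0, 3);
        }
        for (size_t k = 0; k < LINE_DOUBLES; ++k)
            acc += x[i + k] * y[i + k];
    }
    /* Remainder elements that do not fill a whole cache line. */
    for (; i < n; ++i)
        acc += x[i] * y[i];
    return acc;
}
```

For purely sequential access like this, the hardware prefetcher usually already does most of the work, so the gain may be small or zero. That fits the remark above that the usefulness of software prefetch is hard to predict. Timing the loop with and without the prefetch calls is the only reliable check.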
yuriisig
Beginner
Using AVX speeds up matrix multiplication by almost a factor of 2.
TimP
Honored Contributor III
YuriiSig wrote:

Using AVX speeds up matrix multiplication by almost a factor of 2.

That's with careful hand coding, among other things gaining maximum register and L1 cache data locality, as in the new versions of MKL.
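As a rough illustration of what "register and L1 cache data locality" can mean for matrix multiplication, here is a simplified register-tiled kernel in plain C. It is only a sketch of the general idea and not how MKL is implemented. The name `matmul_tiled` and the tile size `TILE` are hypothetical, and it assumes square row-major matrices whose size is a multiple of the tile.

```c
#include <assert.h>
#include <stddef.h>

enum { TILE = 4 }; /* hypothetical tile size; 4 doubles fill one AVX register */

/* Computes C += A * B for row-major n x n matrices, n a multiple of TILE.
   Each TILE x TILE block of C is accumulated in a small local array that the
   compiler can keep in registers, so every loaded element of A and B is
   reused TILE times before it is discarded. */
void matmul_tiled(size_t n, const double *A, const double *B, double *C)
{
    assert(n % TILE == 0);
    for (size_t i = 0; i < n; i += TILE) {
        for (size_t j = 0; j < n; j += TILE) {
            double acc[TILE][TILE] = {{0.0}};
            for (size_t k = 0; k < n; ++k) {
                for (size_t ii = 0; ii < TILE; ++ii) {
                    double a = A[(i + ii) * n + k];
                    /* Contiguous in B and acc: a candidate for one vector multiply-add. */
                    for (size_t jj = 0; jj < TILE; ++jj)
                        acc[ii][jj] += a * B[k * n + j + jj];
                }
            }
            /* Write the finished tile back to C once. */
            for (size_t ii = 0; ii < TILE; ++ii)
                for (size_t jj = 0; jj < TILE; ++jj)
                    C[(i + ii) * n + j + jj] += acc[ii][jj];
        }
    }
}
```

The point of the tile is that the arithmetic per loaded value goes up, which is what lets wider vectors pay off. A loop that loads each value once and uses it once tends to be limited by cache or memory bandwidth regardless of register width.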