Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Why is using SIMD so much slower than not using SIMD?

Raymond_S_
Beginner

Dear all:

I compared two versions of my special convolution algorithm: one using SIMD and one normal (scalar). Strangely, the SIMD version is slower.

Test result:

Using SIMD: 26ms

Not Using SIMD: 42ms.

Both algorithms are single-threaded. The source code is attached.

Help!

3 Replies
Vladimir_P_1234567890

As far as I can see "Using SIMD:" is faster, isn't it?

--Vladimir

Vladimir_P_1234567890

OK, it looks like there is a mistake in your description. The "normal" algorithm is faster because it was autovectorized, and the autovectorization was more efficient than the manual vectorization:

LOOP BEGIN at C:\temp\pyramid\pyramid_fma_intrinsic\main.cpp(29,3) inlined into C:\temp\pyramid\pyramid_fma_intrinsic\main.cpp(144,2)
C:\temp\pyramid\pyramid_fma_intrinsic\main.cpp(29,3):remark #15300: LOOP WAS VECTORIZED
LOOP END
===========================================================================

With autovectorization disabled I got:

pyramid normal elapsed 96.165665 ms   =====20000.000000
pyramid fma elapsed  43.360785 ms   =====20000.000000
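
To make the comparison concrete, here is a minimal sketch of the three variants being discussed. The attached source is not shown in this thread, so the kernel (`y[i] += a * x[i]`) and the names `axpy_scalar`, `axpy_scalar_novec` and `axpy_fma` are hypothetical stand-ins for the convolution inner loop. I am also assuming `#pragma novector` as one way to switch autovectorization off for a single loop with the Intel compiler; the post above does not say how it was disabled for the timings.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__FMA__)
#include <immintrin.h>  // AVX/FMA intrinsics, only when the build enables FMA
#endif

// "Normal" version: a plain loop that the compiler is free to autovectorize.
void axpy_scalar(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < y.size(); ++i) y[i] += a * x[i];
}

// Same loop, with autovectorization suppressed on Intel compilers only
// (assumption: other compilers simply skip the pragma via the #if guard).
void axpy_scalar_novec(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
#pragma novector
#endif
    for (std::size_t i = 0; i < y.size(); ++i) y[i] += a * x[i];
}

// Manual version: 8 floats per iteration with FMA when available,
// then a scalar loop for the remaining elements.
void axpy_fma(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    std::size_t i = 0;
#if defined(__FMA__)
    const __m256 va = _mm256_set1_ps(a);
    for (; i + 8 <= y.size(); i += 8) {
        const __m256 vx = _mm256_loadu_ps(&x[i]);
        const __m256 vy = _mm256_loadu_ps(&y[i]);
        _mm256_storeu_ps(&y[i], _mm256_fmadd_ps(va, vx, vy));  // va * vx + vy
    }
#endif
    for (; i < y.size(); ++i) y[i] += a * x[i];
}
```

Timing all three on the same data should show whether the hand-written intrinsics really beat what the compiler generates from the plain loop. Note that the `__FMA__` guard means the intrinsic path is only compiled in when the build targets FMA; otherwise `axpy_fma` falls back to the scalar loop.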

--Vladimir

SergeyKostrov
Valued Contributor II
It is not clear which CPU the tests were run on. If they were run on a CPU that supports the AVX ISA, and SSE / SSE2 / SSE4.x instructions are mixed with AVX instructions in the binary, then there are SSEx-to-AVX and AVX-to-SSEx transitions, and they hurt performance (the code gets slower!).
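
A minimal sketch of the usual guard against that transition cost, assuming the 256-bit code lives in its own function and the caller may run legacy (non-VEX) SSE code afterwards. The function `scale_avx` is a hypothetical example, not taken from the attached source. `_mm256_zeroupper()` clears the upper halves of the YMM registers before control returns to the caller.

```cpp
#include <cstddef>
#include <vector>
#if defined(__AVX__)
#include <immintrin.h>  // AVX intrinsics, only when the build enables AVX
#endif

// Hypothetical helper: multiplies every element of v by s, in place.
void scale_avx(std::vector<float>& v, float s) {
    std::size_t i = 0;
#if defined(__AVX__)
    const __m256 vs = _mm256_set1_ps(s);
    for (; i + 8 <= v.size(); i += 8) {
        _mm256_storeu_ps(&v[i], _mm256_mul_ps(vs, _mm256_loadu_ps(&v[i])));
    }
    // Zero the upper 128 bits of all YMM registers so that any legacy SSE
    // code executed after this function does not start with dirty upper halves.
    _mm256_zeroupper();
#endif
    // Scalar loop for the remaining elements (or everything, without AVX).
    for (; i < v.size(); ++i) v[i] *= s;
}
```

Compilers typically insert the equivalent `vzeroupper` themselves when the whole translation unit is built with AVX enabled, so the explicit call matters mostly when hand-written AVX code is linked against objects or libraries built for SSE only. Whether this applies to the timings in this thread depends on the CPU and compiler options, which were not given.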