Raymond_S_
Beginner
178 Views

Why is using SIMD so much slower than not using SIMD?

Dear all:

I compared two versions of my custom convolution algorithm, one using SIMD and one "normal" (no SIMD). Strangely, the SIMD version is slower.

Test result:

Using SIMD: 26ms

Not Using SIMD: 42ms.

Both versions are single-threaded. The source code is attached.

Help!

3 Replies

As far as I can see "Using SIMD:" is faster, isn't it?

--Vladimir


OK, it looks like there is a mistake in your description. The "normal" algorithm is faster because it was autovectorized, and the autovectorization was more efficient than the manual vectorization. The compiler's vectorization report confirms it:

LOOP BEGIN at C:\temp\pyramid\pyramid_fma_intrinsic\main.cpp(29,3) inlined into C:\temp\pyramid\pyramid_fma_intrinsic\main.cpp(144,2)
C:\temp\pyramid\pyramid_fma_intrinsic\main.cpp(29,3):remark #15300: LOOP WAS VECTORIZED
LOOP END
===========================================================================

With autovectorization disabled I got:

pyramid normal elapsed 96.165665 ms   =====20000.000000
pyramid fma elapsed  43.360785 ms   =====20000.000000

--Vladimir
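
The attached source is not reproduced in this thread, so the following is only a minimal sketch of the effect described above, not the poster's convolution code. It assumes a hypothetical multiply-accumulate kernel (`axpy_scalar`, `axpy_sse` are made-up names) and uses SSE intrinsics rather than the FMA intrinsics of the original so that it builds on any x86-64 compiler without extra flags. The plain loop is simple enough that an optimizing compiler can vectorize it by itself, which is why a "normal" build can match or beat a hand-written version.

```cpp
#include <cassert>
#include <cstddef>

// Use SSE intrinsics only where the target guarantees them; otherwise fall back.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_USE_SSE 1
#else
#define SKETCH_USE_SSE 0
#endif

// "Normal" version: out[i] += w * in[i].
// A loop of this shape is a typical candidate for compiler autovectorization.
void axpy_scalar(const float* in, float* out, std::size_t n, float w) {
    assert(n == 0 || (in != nullptr && out != nullptr));
    for (std::size_t i = 0; i < n; ++i)
        out[i] += w * in[i];
}

// Manually vectorized version: processes 4 floats per iteration with SSE.
void axpy_sse(const float* in, float* out, std::size_t n, float w) {
    assert(n == 0 || (in != nullptr && out != nullptr));
#if SKETCH_USE_SSE
    const __m128 vw = _mm_set1_ps(w);           // broadcast w to all 4 lanes
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vi = _mm_loadu_ps(in + i);       // unaligned load of 4 inputs
        __m128 vo = _mm_loadu_ps(out + i);      // unaligned load of 4 outputs
        _mm_storeu_ps(out + i, _mm_add_ps(vo, _mm_mul_ps(vw, vi)));
    }
    for (; i < n; ++i)                          // scalar tail for leftover elements
        out[i] += w * in[i];
#else
    axpy_scalar(in, out, n, w);                 // no SSE available on this target
#endif
}
```

To compare the two fairly, time them once with autovectorization enabled and once with it disabled (for example `-fno-tree-vectorize` with GCC, or `-no-vec` / `/Qvec-` with the Intel compiler), as was done for the numbers above.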

SergeyKostrov
Valued Contributor II

It is not clear which CPU the tests were run on. If they were run on a CPU that supports the AVX ISA, and the binary mixes SSE / SSE2 / SSE4.x instructions with AVX instructions, then SSEx-to-AVX and AVX-to-SSEx transitions occur, and they hurt performance (the code gets slower!).
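
One common way to avoid such transitions is to clear the upper halves of the YMM registers with `_mm256_zeroupper()` before control returns to code that may use legacy SSE encodings. The sketch below is an illustration under stated assumptions, not code from the attachment: `scale` and `scale_avx` are hypothetical names, the per-function `target("avx")` attribute and `__builtin_cpu_supports` are GCC/Clang features, and on other compilers or CPUs the code simply takes the scalar path. Many compilers insert the equivalent instruction automatically, so the explicit call may be redundant in practice.

```cpp
#include <cassert>
#include <cstddef>

// AVX path is only built with GCC/Clang on x86, where a single function
// can be compiled for AVX without changing flags for the whole program.
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_HAVE_AVX_PATH 1
#else
#define SKETCH_HAVE_AVX_PATH 0
#endif

#if SKETCH_HAVE_AVX_PATH
// Multiplies every element by `factor`, 8 floats at a time, using AVX.
__attribute__((target("avx")))
static void scale_avx(float* data, std::size_t n, float factor) {
    const __m256 f = _mm256_set1_ps(factor);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(data + i);
        _mm256_storeu_ps(data + i, _mm256_mul_ps(v, f));
    }
    for (; i < n; ++i)          // leftover elements
        data[i] *= factor;
    // Zero the upper 128 bits of all YMM registers so that SSE code
    // executed after this function does not pay a transition penalty.
    _mm256_zeroupper();
}
#endif

// Dispatcher: uses the AVX routine only if the running CPU supports it.
void scale(float* data, std::size_t n, float factor) {
    assert(n == 0 || data != nullptr);
#if SKETCH_HAVE_AVX_PATH
    if (__builtin_cpu_supports("avx")) {
        scale_avx(data, n, factor);
        return;
    }
#endif
    for (std::size_t i = 0; i < n; ++i)   // portable scalar fallback
        data[i] *= factor;
}
```

The alternative is to build the whole program for AVX (for example `-mavx` with GCC/Clang or `/arch:AVX` with MSVC and the Intel compiler on Windows), so that the compiler emits VEX-encoded forms throughout and no legacy SSE instructions are mixed in.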