Dear all:
I compared two versions of my convolution algorithm: one using SIMD intrinsics and a normal (scalar) one. It is very strange that the SIMD version is slower.
Test result:
Using SIMD: 26 ms
Not using SIMD: 42 ms
Both algorithms are single-threaded. The source code is attached.
Help!
3 Replies
As far as I can see, "Using SIMD" is the faster one, isn't it?
--Vladimir
OK, it looks like there is a mistake in your description. The "normal" algorithm is faster because it was autovectorized, and the autovectorization was more efficient than your manual vectorization:
```
LOOP BEGIN at C:\temp\pyramid\pyramid_fma_intrinsic\main.cpp(29,3) inlined into C:\temp\pyramid\pyramid_fma_intrinsic\main.cpp(144,2)
   C:\temp\pyramid\pyramid_fma_intrinsic\main.cpp(29,3): remark #15300: LOOP WAS VECTORIZED
LOOP END
```
With autovectorization disabled I got:

```
pyramid normal elapsed 96.165665 ms =====20000.000000
pyramid fma elapsed 43.360785 ms =====20000.000000
```
--Vladimir
It is not clear what CPU the tests were run on.
If they were run on a CPU that supports the AVX ISA, and the binary mixes SSE / SSE2 / SSE4.x instructions with AVX instructions, then SSEx-to-AVX / AVX-to-SSEx transitions occur, and that hurts performance (it gets slower!).