Hi,
I am just starting with SSE optimizations. I tried a very simple task of adding an array
of 2-dimensional vectors. I made three versions: one without
any sort of SIMD instructions (http://pastebin.com/m3e8838c2), one using
SSE2 instructions via Intel intrinsics (http://pastebin.com/m783f8e7d),
and one using SSE2 instructions through GCC vector intrinsics
(http://pastebin.com/m6f36194e). The best times were obtained without
using any SIMD instructions. I used the gcc 4.2 compiler with the -march=prescott and
-O3 flags.
When I compiled without the -O3 flag, the code with the GCC
vector intrinsics was 1.5 times faster than the one without SIMD
instructions, and the Intel intrinsics code was the slowest :-(.
Any help will be greatly appreciated.
Regards
Gautam
I doubt this was the original intended topic of this forum, but maybe it's good preparation for AVX.
Your Intel intrinsics code forces the use of more instructions than the gcc vector intrinsics version. That might be OK on the old NetBurst processors, including Prescott, if you have one of the highest clock speeds, so I'm guessing you may not have met all of those qualifications. As -march=prescott would be a reasonable choice for this code on Core Duo, for example, I can't infer what CPU you chose. Also, it's probably unrolled beyond the optimum.
There were a lot of gcc 4.2 compilers. I'm not sure whether any of them enabled auto-vectorization at -O3, as 4.3 did. If so, the plain C build would also use SIMD instructions, and it could demonstrate that a compiler can do a better job of auto-vectorization than you do when you tie it down to specific instructions.
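To make the comparison concrete, here is a minimal sketch of the three styles being discussed. It is not the pastebin code: the function names, the interleaved x,y layout, and the choice of double elements are all assumptions made for illustration. The intrinsics variant is only compiled when SSE2 is available.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Assumed layout: n_vec 2-D vectors stored as x0,y0,x1,y1,... (doubles). */

/* Plain C: the compiler is free to vectorize and unroll as it sees fit. */
void add_plain(double *dst, const double *a, const double *b, size_t n_vec)
{
    for (size_t i = 0; i < 2 * n_vec; ++i)
        dst[i] = a[i] + b[i];
}

/* GCC vector extension: one 16-byte vector holds one 2-D vector.
 * memcpy is used for loads and stores to avoid alignment assumptions. */
typedef double v2df __attribute__((vector_size(16)));

void add_gccvec(double *dst, const double *a, const double *b, size_t n_vec)
{
    for (size_t i = 0; i < n_vec; ++i) {
        v2df va, vb, vr;
        memcpy(&va, a + 2 * i, sizeof va);
        memcpy(&vb, b + 2 * i, sizeof vb);
        vr = va + vb;               /* element-wise add */
        memcpy(dst + 2 * i, &vr, sizeof vr);
    }
}

#ifdef __SSE2__
/* Intel intrinsics: each load, add, and store is spelled out by hand. */
void add_sse2(double *dst, const double *a, const double *b, size_t n_vec)
{
    for (size_t i = 0; i < n_vec; ++i) {
        __m128d va = _mm_loadu_pd(a + 2 * i);   /* unaligned load */
        __m128d vb = _mm_loadu_pd(b + 2 * i);
        _mm_storeu_pd(dst + 2 * i, _mm_add_pd(va, vb));
    }
}
#endif

/* Sanity check: every variant must produce the same sums. */
void check_variants_agree(void)
{
    const double a[6] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
    const double b[6] = {10.0, 20.0, 30.0, 40.0, 50.0, 60.0};
    double r1[6], r2[6];

    add_plain(r1, a, b, 3);
    add_gccvec(r2, a, b, 3);
    assert(memcmp(r1, r2, sizeof r1) == 0);
#ifdef __SSE2__
    double r3[6];
    add_sse2(r3, a, b, 3);
    assert(memcmp(r1, r3, sizeof r1) == 0);
#endif
}
```

Comparing the assembly from each variant (for example with gcc -S) at the same -O level is probably the quickest way to see where the extra instructions come from.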
PS - Could you suggest an appropriate forum for this type of query?
For questions about usage of gcc, you can go to www.gcc.gnu.org, find the mailing list reference, and subscribe to gcc-help@gcc.gnu.org.
I think you are saying that your plain C code, compiled with -O3, is faster than your other versions. When you dictate use of SSE intrinsics, chances are the generated code doesn't change enough between -O levels to make a difference.
Your gcc should support the flag -ftree-vectorize to perform auto-vectorization. The additional flag -ftree-vectorizer-verbose=2 will report what the vectorizer did with each loop.
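As a rough illustration of trying those flags, a loop like the one below is the kind of code the vectorizer can usually handle. The function name and the file name in the comment are hypothetical, and the restrict qualifiers are an assumption that the three arrays never overlap. As far as I know, newer gcc releases report vectorization through -fopt-info-vec rather than -ftree-vectorizer-verbose, so check the manual for your version.

```c
#include <assert.h>
#include <stddef.h>

/* Possible build line for a gcc of that era (vadd.c is a hypothetical name):
 *   gcc -std=c99 -O2 -march=prescott -ftree-vectorize \
 *       -ftree-vectorizer-verbose=2 -c vadd.c
 */

/* Adds n_vec 2-D vectors stored as interleaved x,y doubles.
 * restrict promises the compiler that dst, a, and b do not alias,
 * which removes one common obstacle to auto-vectorization. */
void add_vec2_arrays(double *restrict dst,
                     const double *restrict a,
                     const double *restrict b,
                     size_t n_vec)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < 2 * n_vec; ++i)
        dst[i] = a[i] + b[i];
}
```

If the verbose output says the loop was not vectorized, the message normally names the reason, which is a better starting point than rewriting the loop with intrinsics.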