Hello,
I am attempting to use SSE and AVX instructions to optimise my program. I have 3 versions of my code: scalar, SSE, and AVX.
After much optimization, my SSE version is pretty much 4x as fast as my scalar code. This was actually quite surprising; I did not expect to get so close to a 4x improvement.
However, my AVX version is 20% slower than my scalar code!
The program operates on SoA data, so the difference between the SSE and AVX versions is very small (just dividing the upper bound of the loop by 2, and advancing the pointers by twice as much per iteration).
If I write a simple test program that sums two arrays, I can indeed see that AVX is 2x as fast as SSE, and 8x as fast as scalar code.
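For reference, a test of that kind could look roughly like the sketch below. It is not the actual test program; the function names (`add_scalar`, `add_sse`, `add_avx`) and the `TARGET_AVX` macro are made up for illustration. The macro is only needed so that GCC/Clang accept AVX intrinsics without extra flags; with Visual Studio it expands to nothing and the file would be built with `/arch:AVX`. Unaligned loads are used so the sketch does not assume anything about array alignment.

```c
#include <stddef.h>
#include <immintrin.h>  /* SSE and AVX intrinsics */

/* Hypothetical helper macro: lets GCC/Clang compile the AVX function
   without -mavx. Expands to nothing on other compilers (e.g. MSVC). */
#if defined(__GNUC__)
#define TARGET_AVX __attribute__((target("avx")))
#else
#define TARGET_AVX
#endif

/* Scalar reference: one float per iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* SSE: four floats per iteration, scalar loop for the remainder. */
void add_sse(const float *a, const float *b, float *out, size_t n)
{
    size_t i;
    for (i = 0; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* AVX: eight floats per iteration, so half as many vector iterations
   as the SSE loop and twice the pointer step. */
TARGET_AVX void add_avx(const float *a, const float *b, float *out, size_t n)
{
    size_t i;
    for (i = 0; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    /* Clear the upper YMM halves before returning to code that may
       use legacy (non-VEX) SSE instructions. */
    _mm256_zeroupper();
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

All three functions should produce identical results for the same inputs; only the number of elements handled per iteration differs.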
My actual algorithm is pretty benign in terms of instructions; I do not use many exotic ones. It is mostly mulps, addps, and rcpps.
I'm using intrinsic functions in VS2010 SP1, and I have an i5 2500 CPU.
I am wondering if there is something subtle that I might be doing wrong?
Thanks in advance for any ideas.
1 Reply
It would be hard to recommend anything without being able to see your code.
As a suggestion for now, check whether you are hitting a so-called AVX/SSE transition penalty.
For this, you can use the simple and powerful tool described here:
http://software.intel.com/en-us/articles/intel-software-development-emulator/#TRANSITION
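To illustrate what such a transition looks like in source code, here is a minimal sketch; it is not based on the original poster's code. The kernel `kernel_avx`, the helper `sum_sse`, and the `TARGET_AVX` macro are hypothetical names. The kernel uses the instruction mix mentioned in the question (multiply, add, approximate reciprocal) with 256-bit intrinsics, and is followed by a 128-bit SSE routine. If the SSE routine is compiled to legacy (non-VEX) encodings while the upper halves of the YMM registers are still in use, each switch between the two can incur a state-transition penalty on Sandy Bridge. Calling `_mm256_zeroupper()` after the 256-bit code is one common way to avoid this; building everything with `/arch:AVX` so the 128-bit intrinsics are also VEX-encoded is another.

```c
#include <stddef.h>
#include <immintrin.h>  /* SSE and AVX intrinsics */

/* Hypothetical helper macro: lets GCC/Clang compile the AVX function
   without -mavx. Expands to nothing on other compilers (e.g. MSVC). */
#if defined(__GNUC__)
#define TARGET_AVX __attribute__((target("avx")))
#else
#define TARGET_AVX
#endif

/* Illustrative 256-bit kernel: out[i] = a[i] * (1 / b[i]) + c[i].
   _mm256_rcp_ps is the AVX form of rcpps and is only approximate. */
TARGET_AVX void kernel_avx(const float *a, const float *b, const float *c,
                           float *out, size_t n)
{
    size_t i;
    for (i = 0; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 vc = _mm256_loadu_ps(c + i);
        __m256 r  = _mm256_rcp_ps(vb);
        _mm256_storeu_ps(out + i, _mm256_add_ps(_mm256_mul_ps(va, r), vc));
    }
    /* Mark the upper 128 bits of all YMM registers as clean so that
       legacy SSE code executed afterwards does not pay the penalty. */
    _mm256_zeroupper();
    /* Remainder uses an exact scalar division. */
    for (; i < n; ++i)
        out[i] = a[i] / b[i] + c[i];
}

/* 128-bit SSE routine that might run right after the AVX kernel.
   Without an AVX compiler switch this is emitted as legacy SSE. */
float sum_sse(const float *x, size_t n)
{
    size_t i;
    float total;
    __m128 acc = _mm_setzero_ps();
    __m128 hi;
    for (i = 0; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(x + i));
    hi  = _mm_movehl_ps(acc, acc);     /* lanes 2,3 moved to lanes 0,1 */
    acc = _mm_add_ps(acc, hi);         /* lane 0 = x0+x2, lane 1 = x1+x3 */
    hi  = _mm_shuffle_ps(acc, acc, 1); /* lane 1 moved to lane 0 */
    acc = _mm_add_ss(acc, hi);         /* lane 0 = full vector sum */
    total = _mm_cvtss_f32(acc);
    for (; i < n; ++i)
        total += x[i];
    return total;
}
```

If removing the `_mm256_zeroupper()` call (or mixing in a library built without AVX support) changes the timing noticeably, that would be a hint that transitions are involved; the emulator linked above can count them directly.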