As part of the regular updates to our pure-software real-time 3D engine
http://www.inartis.com/default.aspx
we have already added a new code path for Haswell targets (using FMA and AVX2 instructions).
Thanks to the early support in the Intel compiler and the SDE, I was able to port and validate the code using FMA and the 256-bit packed integer instructions very quickly. A nice feature of the Intel C++ compiler is that legacy code using MUL + ADD intrinsics (such as _mm256_mul_ps / _mm256_add_ps) is compiled to FMA instructions wherever possible when built with the "/QxCORE-AVX2" flag. It's a great time saver, and we can keep exactly the same source code for all paths (legacy SSE & AVX and the new FMA + AVX2). Also, since we use wrapper classes around the intrinsics, the source code remains very readable. For example,
res = a*x + b*y + c;
is far more readable than if we had to introduce FMA functions such as
res = madd(a,x,madd(b,y,c));
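To make the comparison concrete, here is a minimal sketch of what such a wrapper class might look like. It is not the engine's actual code: `Vec4f`, `lane`, `with_operators`, `with_madd` and `check_paths_agree` are hypothetical names, and the sketch uses 128-bit SSE intrinsics instead of the 256-bit ones mentioned above so that it builds on any x86-64 compiler without extra flags. The `madd` helper is written as a plain multiply followed by an add; whether that pair is fused into an FMA instruction is left to the compiler and its target flags.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics, baseline on x86-64

// Hypothetical wrapper around one 4-lane float vector (illustration only).
struct Vec4f {
    __m128 v;
    explicit Vec4f(float s) : v(_mm_set1_ps(s)) {}  // broadcast a scalar to all lanes
    explicit Vec4f(__m128 m) : v(m) {}

    // Read back one lane; handy for checks, not meant for hot loops.
    float lane(int i) const {
        alignas(16) float tmp[4];
        _mm_store_ps(tmp, v);
        return tmp[i];
    }
};

// Operators map directly onto the MUL and ADD intrinsics.
inline Vec4f operator*(Vec4f a, Vec4f b) { return Vec4f(_mm_mul_ps(a.v, b.v)); }
inline Vec4f operator+(Vec4f a, Vec4f b) { return Vec4f(_mm_add_ps(a.v, b.v)); }

// Explicit multiply-add helper: computes a*b + c.
// An explicit FMA path would call an FMA intrinsic such as _mm_fmadd_ps here.
inline Vec4f madd(Vec4f a, Vec4f b, Vec4f c) { return a * b + c; }

// The readable spelling from the post.
inline Vec4f with_operators(Vec4f a, Vec4f x, Vec4f b, Vec4f y, Vec4f c) {
    return a * x + b * y + c;
}

// The same expression written with explicit madd calls.
inline Vec4f with_madd(Vec4f a, Vec4f x, Vec4f b, Vec4f y, Vec4f c) {
    return madd(a, x, madd(b, y, c));
}

// Self-check with small, exactly representable inputs,
// so both spellings must agree whether or not the operations are fused.
inline void check_paths_agree() {
    Vec4f a(2.0f), x(3.0f), b(4.0f), y(5.0f), c(1.0f);
    for (int i = 0; i < 4; ++i) {
        assert(with_operators(a, x, b, y, c).lane(i) ==
               with_madd(a, x, b, y, c).lane(i));
    }
}
```

With operator overloading, the calling code stays in the `a*x + b*y + c` form, and the choice between separate MUL/ADD and FMA can be left to the compiler instead of being spelled out at every call site.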
More optimization opportunities remain, for example using any-to-any permute and gather; I suppose I'll wait for the real chips for these.
1 Reply
Thank you for your effort and the feedback; it is noted with great pleasure!
