Dear Bryan,
Using the Intel compilers for processor-specific optimization is probably your best choice since, like you said, it allows you to keep clear-to-read C++ sources. The Intel compilers can be expected to optimize for all processors released in the future, whereas inline assembly ultimately will become obsolete. As for documentation, allow me to point to
The Software Vectorization Handbook
(see http://www.intel.com/intelpress/sum_vmmx.htm), which contains a detailed description of SSE/SSE2/SSE3, the compiler methodology used to exploit these extensions, and programming guidelines that may improve the effectiveness of automatic vectorization while keeping your sources clean and portable. You may also want to browse for other books at Intel Press (see http://www.intel.com/intelpress/), or visit
"The Intel Technical Optimization Center"
(see http://appzone.intel.com/literature/index.asp) for the latest technical information on Intel's products, including optimization manuals.
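As a rough illustration of the kind of source-level guideline meant here, consider the sketch below. It is not taken from the handbook, and the function name `saxpy` is a hypothetical choice. A loop with unit-stride accesses, a simple trip count, and no dependence between iterations is usually a good candidate for automatic vectorization, and it stays plain, portable C++:

```cpp
#include <cstddef>

// Hypothetical example: y[i] = a * x[i] + y[i] for i in [0, n).
// Each iteration touches only element i, with unit stride and no
// dependence on earlier iterations, which is the loop shape that
// vectorizing compilers generally handle well.
void saxpy(float a, const float* x, float* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Whether a given compiler actually vectorizes this loop depends on the compiler version and the switches used, so it is worth checking the compiler's vectorization diagnostics rather than assuming.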
Hope this helps.
Aart Bik
http://www.aartbik.com/
Message Edited by abik on 06-03-2005 12:12 PM
Dear Bryan,
If assembly yields much better performance in a function that is critical to your application, then, yes, of course I think that is important. That is why compilers provide the inline-assembly feature, after all. However, I always try to encourage our customers to improve performance at the source-code level first, possibly supplemented with the appropriate compiler hints and switches. All too often, source code that could have been optimized very well with only minor modifications is rewritten in non-portable and obscure assembly just because the programmer was not willing to invest a little more time in getting to know the compiler better (for reasons unknown to me, investing huge amounts of time in coding assembly never seems to be a problem). As a last resort, however, inline assembly is possibly the best way to get high performance.
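To make "minor modifications plus a compiler hint" concrete, here is a small sketch. The function names are hypothetical, and `__restrict` is a compiler extension (accepted by the Intel, GCC, Clang, and Microsoft compilers as far as I know), not standard C++. Without the hint the compiler has to allow for `out` overlapping the inputs. With it, the programmer promises that the arrays are distinct, which can make vectorization easier:

```cpp
#include <cstddef>

// Plain version: the compiler must assume `out` may alias `a` or `b`,
// so it may need runtime overlap checks or may skip vectorization.
void multiply_plain(float* out, const float* a, const float* b,
                    std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] * b[i];
    }
}

// Hinted version: __restrict (a non-standard extension) tells the
// compiler that the three arrays do not overlap. The caller is
// responsible for keeping that promise.
void multiply_hinted(float* __restrict out, const float* __restrict a,
                     const float* __restrict b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] * b[i];
    }
}
```

Both versions compute the same result when the arrays are distinct. The source stays readable C++, and the hint can be dropped again if a compiler does not support it.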
By the way, if you believe the compiler should have generated code similar to your 125ms solution, please feel free to report this to Premium Support as a performance feature request.
Aart Bik
http://www.aartbik.com/
PS. I should probably clarify that my use of "obsolete" was intended to refer to a particular instance of inline assembly (viz. a routine optimized with inline assembly for Pentium with MMX ultimately becomes obsolete), not to the concept of inline assembly itself (which, I believe, is here to stay).
Message Edited by abik on 06-03-2005 03:46 PM