I have seen the compiler produce optimized code (/O3 with Pentium 4 extensions) that uses the Intel inverse square root instruction and a multiply in place of a square root and a divide. I'm working on engineering code with numerous occurrences of division by sqrt() (x/sqrt(y)), and in not one of them does the compiler choose the inverse square root instruction instead. VTune shows me the code that the compiler generates.
Hints? Suggestions?
David
If they are the same, then I suggest you create a test case and submit it to Intel Premier Support so that we can have optimization experts examine it.
Steve,
The compile options are the same; /Op is not used. I see the same behavior even when I explicitly use /Oprec-sqrt- (or even /prec-). Something else must be preventing the optimizer from making these changes. I'll submit something to Premier Support.
David
The Penryn CPUs improve the latency of IEEE sqrt so much that there is far less reason to accept the more verbose code and corner-case problems of the inverse sqrt plus iterative improvement scheme. A single inverse sqrt and multiply, such as you describe, gives only about 12 bits of precision, so no compiler uses it without the iterative improvement step that brings it to about 23 bits.

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page