I'm compiling the following sample program with compiler versions 15.0 Update 1 and 12.1 Update 5, with this command line:
ifort /debug /O2 /arch:ia32 costest.f90
program costest
  implicit none
  real*8 a, b
  read *, a
  b = cos(a)
  print *, b
end program costest
They both call the _cos library function in libmmt.lib, where an auto-dispatcher selects different implementations depending on CPU features. The test CPU is Xeon X5675, which supports SSE2.
In 12.1, the dispatcher calls _cos.N, which uses SSE2 instructions. In 15.0, the dispatcher calls _cos.O, which uses SSE instructions, even though _cos.N is also available. The new code seems to have the CPU feature tests done in the wrong order. Similar code is in the implementation of _sin.
There is a speed difference between the two implementations (the SSE2 version seems to be ~2x faster; for _sin the difference is larger).
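For anyone who wants to reproduce the timing difference, a minimal sketch of a timing loop follows. The program name `costime`, the iteration count, and the argument step are arbitrary choices for illustration, not part of the original test. It should be built with the same command line as above. Note that with other options the compiler may vectorize the loop and call a different library routine, so it is worth checking the disassembly to confirm that `_cos` is what actually gets called.

```fortran
program costime
  implicit none
  integer, parameter :: n = 10000000   ! arbitrary iteration count
  integer(kind=8) :: t0, t1, rate
  real*8 :: a, s
  integer :: i

  ! Read the argument at run time so the calls cannot be constant-folded.
  read *, a

  s = 0.0d0
  call system_clock(t0, rate)
  do i = 1, n
    ! Vary the argument slightly so each call does real work.
    s = s + cos(a + i * 1.0d-7)
  end do
  call system_clock(t1)

  ! Print the sum so the loop is not optimized away.
  print *, 'sum     =', s
  print *, 'seconds =', real(t1 - t0, 8) / real(rate, 8)
end program costime
```

Running the same executable source through both compiler versions and comparing the reported seconds gives a rough per-call comparison; replacing `cos` with `sin` covers the second function.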
Our application is built with /arch:ia32 and we are seeing a significant performance loss due to these two functions after upgrading.
Replacing /arch:ia32 with /fp:precise yields the same result. Removing /arch:ia32 altogether makes the compiler link directly against ___libm_sse2_cos, which is approximately as fast as the _cos.N implementation (it also does not use a runtime dispatcher).
Is this change by design or a bug?
Thanks,
-Stefan
/arch:ia32 would not select SSE2; it targets the "Pentium Pro" instruction set. SSE2 is the default without /arch. Why are you building with /arch:ia32?
Mainly due to the effort in re-validating the results after changing to /arch:sse2 (thousands of regression tests). We're moving (slowly) towards that goal.
But even without /arch:ia32, using /fp:precise has the same problem: the _cos and _sin dispatchers do not select the SSE2-enabled code.
/Qfast-transcendentals is available to override the effects of /fp: on math library calls.
It would seem that the developers feel that the SSE2 version of cos and sin doesn't meet the requirements of /fp:precise. As Tim says, you can override that. As I wrote in a presentation I made at SC13: "Performance, Accuracy, Consistency: Pick Two".