Intel® Fortran Compiler

Double precision SIN and COS slower in 15.0 vs 12.1 (not using SSE2)

Stefan_P_1
Beginner
1,182 Views

I'm compiling the following sample program with compiler versions 15.0 Update 1 and 12.1 Update 5, with this command line:

ifort /debug /O2 /arch:ia32 costest.f90

    program costest
    implicit none

    real*8 a,b
    
    read *, a
    b = cos(a)
    print *, b
    
    end program costest

Both versions call the _cos library function in libmmt.lib, where an auto-dispatcher selects among different implementations depending on CPU features. The test CPU is a Xeon X5675, which supports SSE2.

In 12.1, the dispatcher calls _cos.N, which uses SSE2 instructions. In 15.0, the dispatcher calls _cos.O, which uses SSE instructions, even though _cos.N is also available. The new dispatcher appears to perform the CPU feature tests in the wrong order. The implementation of _sin contains similar code.

There is a speed difference between the two implementations (the SSE2 version seems to be ~2x faster; for _sin the difference is larger).
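For anyone who wants to reproduce the comparison, here is a minimal timing sketch. It is not the measurement used above; the program name, iteration count, and step size are arbitrary choices for illustration. It reads the start value at run time, as costest does, so that the calls cannot be evaluated at compile time. Note that in builds that allow vectorization the compiler may route these loops through a different library entry point than the scalar call, so the numbers are only indicative.

```fortran
program trigbench
    use, intrinsic :: iso_fortran_env, only: int64, real64
    implicit none

    integer, parameter :: n = 20000000        ! calls per function (arbitrary)
    real(real64), parameter :: step = 1.0e-7_real64
    real(real64) :: x, acc_cos, acc_sin
    integer(int64) :: t0, t1, t2, rate
    integer :: i

    ! Read the start value at run time so the calls are not constant-folded.
    read *, x

    call system_clock(t0, rate)

    acc_cos = 0.0_real64
    do i = 1, n
        acc_cos = acc_cos + cos(x + i*step)
    end do
    call system_clock(t1)

    acc_sin = 0.0_real64
    do i = 1, n
        acc_sin = acc_sin + sin(x + i*step)
    end do
    call system_clock(t2)

    ! Printing the sums keeps the loops from being optimized away.
    print *, 'cos: sum =', acc_cos, ' seconds =', &
             real(t1 - t0, real64) / real(rate, real64)
    print *, 'sin: sum =', acc_sin, ' seconds =', &
             real(t2 - t1, real64) / real(rate, real64)
end program trigbench
```

Building it with the same command line as costest.f90 under both compiler versions, then running each executable with the same input value, should show whether the per-call cost differs.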

Our application is built with /arch:ia32 and we are seeing a significant performance loss due to these two functions after upgrading.

Replacing /arch:ia32 with /fp:precise yields the same result. Removing /arch:ia32 altogether makes the program link directly to ___libm_sse2_cos, which is approximately as fast as the _cos.N implementation (and does not go through a runtime dispatcher).

Is this change by design or a bug?

Thanks,
-Stefan

4 Replies
Steven_L_Intel1
Employee

/arch:ia32 would not select SSE2; it targets the "Pentium Pro" instruction set. SSE2 is the default without /arch. Why are you building with /arch:ia32?

Stefan_P_1
Beginner

Mainly because of the effort involved in re-validating the results after changing to /arch:sse2 (thousands of regression tests). We're moving (slowly) towards that goal.

But even without /arch:ia32, using /fp:precise has the same problem: the _cos and _sin implementations do not select the SSE2-enabled code.
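Since the concern is re-validating results across builds, a small sketch like the following may help check whether two builds return bit-identical values from cos and sin. It is only an illustration: the program name, the number of samples, and the spacing between sample points are hypothetical choices, not taken from our test suite. It prints each input followed by the raw 64-bit pattern of both results in hexadecimal, so the output of two executables can be compared with any text diff tool.

```fortran
program trigbits
    use, intrinsic :: iso_fortran_env, only: int64, real64
    implicit none

    integer, parameter :: n = 8               ! number of sample points (arbitrary)
    real(real64) :: a0, a, c, s
    integer :: i

    ! Read the first sample point at run time so that the library routines
    ! are actually called rather than evaluated by the compiler.
    read *, a0

    do i = 0, n - 1
        a = a0 + 0.7_real64 * i               ! illustrative spacing
        c = cos(a)
        s = sin(a)
        ! transfer() reinterprets the 64 bits of each result as an integer,
        ! so the Z edit descriptor shows the exact bit pattern in hex.
        print '(ES24.16, 2(2X, Z16.16))', a, &
              transfer(c, 0_int64), transfer(s, 0_int64)
    end do
end program trigbits
```

If the hex columns match between a /arch:ia32 build and a build with different options for the same input, the two library paths agree to the last bit at those points; any difference shows up as a changed digit near the end of the pattern.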

TimP
Honored Contributor III

/Qfast-transcendentals is available to override the effects of /fp: on math library calls.

Steven_L_Intel1
Employee

It would seem that the developers feel that the SSE2 version of cos and sin doesn't meet the requirements of /fp:precise. As Tim says, you can override that. As I wrote in a presentation I made at SC13: "Performance, Accuracy, Consistency: Pick Two".
