Re: What is wrong in fsin opcode?

pluc · ‎05-15-2006

Hello,

I have a very simple question: what is wrong with the fsin opcode when we use Intel C++ Compiler ver. 9.1 for IA-32 processors?

Look at the programs Test1 and Test2.

--------------------------

Test1:

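#include <cmath>
#include <iostream>

int main()
{
    using namespace std;
    double x = 1;
    double y;
    // Each iteration needs both sin(x) and cos(x) of the same argument.
    for (int i = 1; i < 200000000; i++) {
        y = sin(x);
        y += cos(x);
        x += y;
    }
    // Print the results so the loop cannot be optimized away.
    cout << x << ' ' << y << endl;
    return 0;
}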

--------------------

Test2:

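#include <cmath>
#include <iostream>

int main()
{
    using namespace std;
    double x = 1;
    double y;
    // Each iteration needs only sin(x).
    for (int i = 1; i < 200000000; i++) {
        y = sin(x);
        x += y;
    }
    // Print the results so the loop cannot be optimized away.
    cout << x << ' ' << y << endl;
    return 0;
}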

---------------

Inside the loop of Test1, both sin(x) and cos(x) are calculated, but in Test2 only sin(x) is calculated.

We would expect Test2 to run faster than Test1. But tests on my computer (Athlon XP 1400, Windows XP) show that Test2 takes much longer than Test1 (25 sec for Test2 versus 17 sec for Test1)! I used the Maximum Speed plus High Level Optimizations option (/O3).
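For anyone who wants to repeat the measurement, the two loops can be timed inside one program. The following is only a minimal sketch, assuming a C++11 compiler with std::chrono; the names run_sin_cos, run_sin_only, seconds_for and report are hypothetical and do not come from the original programs.

```cpp
#include <chrono>
#include <cmath>
#include <iostream>

// Test1 loop: sin and cos of the same argument in each iteration.
double run_sin_cos(long iterations) {
    double x = 1.0, y = 0.0;
    for (long i = 1; i < iterations; ++i) {
        y = std::sin(x);
        y += std::cos(x);
        x += y;
    }
    return x;
}

// Test2 loop: sin only.
double run_sin_only(long iterations) {
    double x = 1.0, y = 0.0;
    for (long i = 1; i < iterations; ++i) {
        y = std::sin(x);
        x += y;
    }
    return x;
}

// Hypothetical helper: wall-clock seconds for one call of f(iterations).
// The loop result is stored in `result` so the work is not discarded.
template <typename F>
double seconds_for(F f, long iterations, double& result) {
    auto start = std::chrono::steady_clock::now();
    result = f(iterations);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: time both loops and print the timings and final x.
void report(long iterations) {
    double r1 = 0.0, r2 = 0.0;
    double t1 = seconds_for(run_sin_cos, iterations, r1);
    double t2 = seconds_for(run_sin_only, iterations, r2);
    std::cout << "Test1 (sin+cos):  " << t1 << " s, x = " << r1 << '\n';
    std::cout << "Test2 (sin only): " << t2 << " s, x = " << r2 << '\n';
}
```

Calling report(200000000) from main would use the same loop bound as Test1 and Test2. The absolute numbers depend on the compiler, the options and the math library, so only the ratio between the two lines is meaningful.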

When we disassemble the Test1 program, we can see that sin(x) and cos(x) are calculated by a single fsincos opcode (the fsincos opcode is emitted inline).

But when we disassemble the Test2 program, we can see that sin(x) is calculated by an external function _sin (there is a call _sin (40FD90h) instruction in the disassembled program). Inside the _sin function we can see many fld, fmul and faddp opcodes (which load, multiply and add floating-point numbers).

Could someone tell me why Intel C++ Compiler ver. 9.1 does not simply use the fsin opcode in the Test2 program?

The same happens with the opcodes for cosine, logarithm and many other functions.

Jerzy

Dale_S_Intel · ‎05-30-2006

Well, it appears that more modern processors (e.g. Core Solo and Duo, recent Pentium 4) do better by calling sin() than by using the fsin instruction.

I tried it on several different processors, and only on an older Pentium 4 (1.5 GHz) did I see sin() run slower than fsincos.

Does that answer your question?

Dale

Message Edited by schouten on 05-29-2006 08:18 PM

pluc · ‎05-31-2006

Schouten,

I have also tested the Test1 and Test2 programs on several platforms with different kinds of processors.

Modern processors that have the SSE2 instruction set use:

a) __libm_SSE2_sin procedure for computation of sine function,

b) __libm_SSE2_sincos for computation of sine and cosine of the same number.

These procedures run faster than the fsin and fsincos opcodes, respectively, and __libm_SSE2_sin runs faster than __libm_SSE2_sincos. But the Intel C++ Compiler can optimize not only code running on Pentium 4 and newer processors but also code that runs on the Pentium III processor.

My question was about processors that do not have the SSE2 instruction set. Specifically, why does the compiler use the _sin procedure instead of the simple and faster inline fsin opcode (as it uses the fsincos opcode to compute the sine and cosine of the same number)?
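To make the "inline fsin" alternative concrete, the instruction can be issued by hand and compared against the library call. This is a sketch under stated assumptions: it uses GCC/Clang extended asm on an x86 target, which is not the Intel compiler on Windows discussed here (its inline-assembly syntax differs), and fsin_x87 is a hypothetical name.

```cpp
#include <cmath>

// Hypothetical helper: sine computed by the x87 fsin instruction.
// Assumption: GCC or Clang on x86 / x86-64. Any other target falls back to std::sin.
inline double fsin_x87(double x) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
    double result;
    // "=t" is the top of the x87 register stack; "0" places the input there too,
    // so fsin replaces st(0) with its sine.
    __asm__("fsin" : "=t"(result) : "0"(x));
    return result;
#else
    return std::sin(x);
#endif
}
```

Replacing sin(x) with fsin_x87(x) in the Test2 loop would give a direct timing comparison between the opcode and the _sin procedure on a given processor. Note that fsin only accepts arguments with magnitude below 2^63 and leaves the operand unchanged outside that range, which a library routine has to handle separately.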

Jerzy

Pergelator · ‎06-15-2022

I heard a story at lunch yesterday that prompted me to look for old x87 bugs, which led me to this page. Just for grins, I took your sample code and compiled it on my 64-bit Linux box using the generic C compiler and, surprise, surprise, I got similar results. Without the cos() call, the program took eight times as long to run.
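When reproducing the comparison, it may be worth checking which arguments each loop actually passes to sin(). Both loops are fixed-point iterations starting from x = 1: the Test2 update x += sin(x) settles at pi, while the Test1 update x += sin(x) + cos(x) settles at 3*pi/4. The sketch below only illustrates that convergence; the function names are hypothetical, and it makes no claim about how much of the timing difference this accounts for.

```cpp
#include <cmath>
#include <cstdio>

// Apply x += step(x) a fixed number of times, starting from x = 1 as in the thread.
template <typename Step>
double iterate(Step step, int steps) {
    double x = 1.0;
    for (int i = 0; i < steps; ++i) {
        x += step(x);
    }
    return x;
}

// Per-iteration increments of the two test programs.
double test1_step(double x) { return std::sin(x) + std::cos(x); }
double test2_step(double x) { return std::sin(x); }

// Hypothetical helper: print where each loop ends up next to the expected limit.
void print_limits() {
    const double pi = std::acos(-1.0);
    std::printf("Test1: x -> %.15f (3*pi/4 = %.15f)\n", iterate(test1_step, 100), 0.75 * pi);
    std::printf("Test2: x -> %.15f (pi     = %.15f)\n", iterate(test2_step, 100), pi);
}
```

So after the first few iterations, Test2 is in effect timing sin() at a single argument next to pi, and Test1 is timing sin() and cos() near 3*pi/4. Math libraries can take different code paths for different arguments, so a timing gap between the two programs may partly reflect the argument rather than the presence of the cos() call.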