
Shuo_Y_ · 03-24-2016

Dear friends,

I use Intel Fortran on Linux, and I would like to check whether my code is compiled to use FMA (fused multiply-add) instructions.

The code I used for testing is:

program test 
implicit none
real, dimension(7) :: a, b
real::x,y,z,u
a = (/ 2.0, 3.0, 5.0, 7.0, 11.0, 13.0, 17.0 /)
b = (/ 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0 /)
print *, sum(a*b)
x=1.0
y=2.34
z=0.987

u= ( x+ (y*z)  )

print*,u

end program

To compile them, I use:

ifort -xcore-avx2 -fma a.f90 -o a.fma.x
ifort  -no-fma a.f90 -o a.nofma.x

Therefore, I expect a.fma.x to use FMA instructions and a.nofma.x not to.

However, when I examine them with objdump -d a.fma.x | grep fma, I cannot find anything.

I expected to see "vfmadd**" instructions appear.

How can I make sure my code is using FMA instructions?

Thanks!

Shuo

TimP · 03-24-2016

Your posted code may allow the compiler to evaluate the expressions at compile time, possibly eliminating the normal choice of whether to evaluate them with serial or parallel FMA.

The parentheses in your expression for u may be taken as a request to avoid fma.

You could save the assembly code with the -S option and examine it to answer such questions, or examine the compiled code with objdump, VTune, Parallel Advisor, or ....

Advisor does annotate loops that use FMA.
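
A minimal sketch of that advice, assuming the only goal is to see whether ifort emits vfmadd: move the multiply-add into a module procedure whose operands are dummy arguments, so their values are unknown when the file is compiled on its own, and write the expression without the extra parentheses used for u above. The module and function names (fma_check, madd) are hypothetical.

```fortran
! Hypothetical module: the operands arrive as dummy arguments, so when this
! file is compiled on its own the compiler cannot fold x + y*z into a constant.
module fma_check
  implicit none
contains
  ! Scalar multiply-add written without extra parentheses, so the compiler
  ! is free to contract it into a single fused multiply-add if allowed.
  function madd(x, y, z) result(u)
    real, intent(in) :: x, y, z
    real :: u
    u = x + y*z
  end function madd
end module fma_check
```

Compiling this file with ifort -xCORE-AVX2 -fma -S produces a .s file that can be searched for vfmadd; whether the instruction actually appears still depends on the compiler version and options.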

Bernard · 04-08-2016

As Tim said, an optimizing compiler will recognize data known at compile time, such as your a and b arrays, and perform some of the intermediate computation at compile time. In your example the variables x, y, z are known at compile time, so the optimizer can compute the multiply-add (MADD) at compile time and generate only the result, which is usually stored on the stack in the location reserved for local frame variables.
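
The same reasoning applies to the sum(a*b) line. As a sketch, assuming the arrays are passed in as assumed-shape dummy arguments instead of being initialized with constants in the same program unit, the compiler has to generate the multiply-accumulate at run time, and with -xCORE-AVX2 -fma it may vectorize it with vfmadd instructions. The module and function names (dot_check, dot_ab) are hypothetical.

```fortran
! Hypothetical module: a and b are dummy arguments, so their contents are
! unknown when this file is compiled and sum(a*b) cannot be precomputed.
module dot_check
  implicit none
contains
  ! Same reduction as sum(a*b) in the original test program.
  ! Assumes the caller passes arrays of equal size.
  function dot_ab(a, b) result(s)
    real, intent(in) :: a(:), b(:)
    real :: s
    s = sum(a*b)
  end function dot_ab
end module dot_check
```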

Bernard · 04-08-2016

You may look at this simple example, where the single call to the sin function is evaluated at compile time because x and y are known constants. Since the result is not used further in the program, it could normally be eliminated entirely; because d_rsin is declared volatile, the compiler must still store it. The generated code simply loads the precomputed result from the CONST segment into the xmm0 register and copies it to main's local variable storage addressed through the esp register.

Simple example in C++:

    #include <cmath>

    int main()
    {
        double x{ 0.45 };
        double y{ 0.45 };
        // x and y are compile-time constants, so sin(x + y) can be folded;
        // volatile keeps the store even though d_rsin is never read again
        volatile double d_rsin{ std::sin(x + y) };
        return 0;
    }

    /* Resulting assembly code (non-relevant machine code removed) */

    01101026 C5 FB 10 05 78 61 10 01 vmovsd      xmm0,qword ptr ds:[1106178h]
    0110102E C5 FB 11 44 24 14       vmovsd      qword ptr [esp+14h],xmm0

Vladimir_Sedach · 04-08-2016

To avoid this optimization, you could add the "volatile" attribute to x, y, z to force the compiler to generate the complete code.
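
In Fortran this is the VOLATILE attribute (Fortran 2003). A minimal sketch, assuming the original constant values are kept: every reference to a volatile variable must be re-read from memory, so the compiler cannot fold x + y*z at compile time. The module and function names (volatile_check, madd_volatile) are hypothetical, and whether the remaining arithmetic is then emitted as vfmadd still depends on the compiler options.

```fortran
! Hypothetical module: x, y and z are VOLATILE, so the compiler must
! re-read them from memory and cannot fold x + y*z into a constant.
module volatile_check
  implicit none
contains
  function madd_volatile() result(u)
    real :: u
    real, volatile :: x, y, z
    x = 1.0
    y = 2.34
    z = 0.987
    ! Same multiply-add as the original test, without the extra parentheses.
    u = x + y*z
  end function madd_volatile
end module volatile_check
```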

Check the Fused Multiply-Add (fma) in Fortran90 code