Shuo_Y_
Beginner
125 Views

Check the Fused Multiply-Add (fma) in Fortran90 code

Dear friends,

I use Intel Fortran on Linux, and I would like to check whether my code is compiled with FMA instructions.

The code I use for testing is:

program test 
implicit none
real, dimension(7) :: a, b
real::x,y,z,u
a = (/ 2.0, 3.0, 5.0, 7.0, 11.0, 13.0, 17.0 /)
b = (/ 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0 /)
print *, sum(a*b)
x=1.0
y=2.34
z=0.987

u= ( x+ (y*z)  )

print*,u

end program

 

To compile it, I use:

ifort -xcore-avx2 -fma a.f90 -o a.fma.x
ifort  -no-fma a.f90 -o a.nofma.x

 

Therefore, I expect a.fma.x to use FMA and a.nofma.x not to.

However, when I examine it with objdump -d a.fma.x | grep fma, I cannot find anything.

I expected "vfmadd**" instructions to appear.

How can I make sure my code is using FMA instructions?

Thanks!

Shuo

 

 

4 Replies
TimP
Black Belt
125 Views

Your posted code may permit the compiler to evaluate the expressions at compile time, possibly eliminating the usual choice of whether to evaluate them with serial or parallel FMA.

The parentheses in your expression for u may be taken as a request to avoid fma.

You could save the assembly code with the -S option and examine that to answer such questions, or examine the compiled code with objdump, VTune, Parallel Advisor, or ....

Advisor does annotate loops that use FMA.
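
As a rough sketch of a test case that the compiler cannot evaluate at compile time, the program below fills its arrays at run time and leaves the multiply-add without extra parentheses. The program name fma_loop, the array size n, and the use of random_number are assumptions for illustration and are not part of the original post. If it is built with FMA enabled and the -S output is inspected, vfmadd instructions may appear, although that still depends on the compiler version and options.

```fortran
program fma_loop
  implicit none
  integer, parameter :: n = 1024   ! hypothetical size, chosen for illustration
  real :: a(n), b(n), c(n)

  ! Fill the arrays at run time so the values are not known to the compiler.
  call random_number(a)
  call random_number(b)
  call random_number(c)

  ! Elementwise multiply-add with no extra parentheses:
  ! a candidate for vectorized FMA when FMA is enabled.
  c = a*b + c

  ! Use the result so the computation is not removed as dead code.
  print *, sum(c)
end program fma_loop
```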

Bernard
Black Belt
125 Views

As Tim said, an optimizing compiler will recognize compile-time data, such as your a and b arrays, and perform some of the intermediate computation at compile time. In your example the variables x, y, and z are known at compile time, so the optimizer can compute the multiply-add at compile time and generate only the result, which will usually be stored on the stack at the location reserved for local frame variables.

Bernard
Black Belt
125 Views

You may look at this simple example, where a single call to the sin function is eliminated because its argument is known at compile time. The variable d_rsin, which holds the result, is not used further in the program but is declared volatile, so the store to it has to be kept. The compiler therefore computed the call to sin at compile time and generated code that loads the result from the CONST segment into the xmm0 register and copies it to main's local variable storage, addressed through the esp register.

  Simple example in C++

 

          //........
          double x{ 0.45 };
          double y{ 0.45 };
          volatile double d_rsin{ ::sin(x + y) }; // declare d_rsin as volatile


       /* Resulting assembly code */
         
       // non-relevant machine code removed
       
       01101026 C5 FB 10 05 78 61 10 01 vmovsd      xmm0,qword ptr ds:[1106178h]  
       0110102E C5 FB 11 44 24 14    vmovsd      qword ptr [esp+14h],xmm0 
       
       // non-relevant machine code removed

 

Vladimir_Sedach
New Contributor I
125 Views

To avoid this optimization, you could add the "volatile" attribute to x, y, and z to force the compiler to generate the complete code.
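
A minimal sketch of this suggestion, based on the scalar part of the original test program, might look like the following. The program name fma_volatile is hypothetical, and the parentheses around y*z are dropped following the earlier remark that they may be taken as a request to avoid FMA. Whether a vfmadd instruction then shows up in the objdump or -S output still depends on the compiler options used, for example -xcore-avx2 -fma as in the original post.

```fortran
program fma_volatile
  implicit none
  ! volatile makes the compiler read x, y, and z from memory at run time,
  ! so it cannot replace x + y*z with a precomputed constant.
  real, volatile :: x, y, z
  real :: u

  x = 1.0
  y = 2.34
  z = 0.987

  ! Multiply-add written without extra parentheses.
  u = x + y*z

  print *, u
end program fma_volatile
```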
