Dear friends,
I use Intel Fortran on Linux, and I would like to check whether my code is compiled to use FMA (fused multiply-add) instructions.
The code I used to test is:
```fortran
program test
  implicit none
  real, dimension(7) :: a, b
  real :: x, y, z, u
  a = (/ 2.0, 3.0, 5.0, 7.0, 11.0, 13.0, 17.0 /)
  b = (/ 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0 /)
  print *, sum(a*b)
  x = 1.0
  y = 2.34
  z = 0.987
  u = ( x + (y*z) )
  print *, u
end program
```
To compile it, I use:
```
ifort -xcore-avx2 -fma a.f90 -o a.fma.x
ifort -no-fma a.f90 -o a.nofma.x
```
I therefore expect a.fma.x to use FMA instructions and a.nofma.x not to.
However, when I examine the binary with `objdump -d a.fma.x | grep fma`, I cannot find anything.
I expected to see vfmadd* instructions.
How can I make sure my code is using FMA instructions?
Thanks!
Shuo
- Tags:
- Intel® Advanced Vector Extensions (Intel® AVX)
- Intel® Streaming SIMD Extensions
- Parallel Computing
Your posted code may permit the compiler to evaluate the expressions at compile time, possibly eliminating the normal choice of whether to evaluate them with serial or parallel FMA.
The parentheses in your expression for u may be taken as a request to avoid FMA.
To answer such questions, you could save the assembly code with the -S option and examine it, or examine the compiled code with objdump, VTune, Parallel Advisor, or similar tools.
Advisor does annotate loops that use FMA.
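The point about parentheses can be illustrated outside Fortran. A fused multiply-add rounds once, while a separate multiply and add round twice, so the two can differ in the last bits; this is why a compiler may only fuse when the language rules and floating-point settings allow it. Below is a minimal C++ sketch (C++ chosen to match the other example in this thread; the function names `fused` and `unfused` are hypothetical) built on the standard `std::fma` from `<cmath>`:

```cpp
#include <cmath>

// std::fma computes a*b + c exactly and rounds only once at the end.
double fused(double a, double b, double c) {
    return std::fma(a, b, c);
}

// The product is stored in a volatile double, so it is rounded to double
// before the addition (first rounding), and the compiler cannot contract
// the multiply and add into a single FMA instruction.
double unfused(double a, double b, double c) {
    volatile double p = a * b;  // first rounding
    return p + c;               // second rounding
}
```

For instance, with a = 1 + 2^-30 and c set to the negated product a*a already rounded to double, `fused(a, a, c)` returns 2^-60 (the rounding error of the product), whereas `unfused(a, a, c)` returns 0.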
As Tim said, an optimizing compiler will recognize compile-time data such as your a and b arrays and perform some of the intermediate computation at compile time. In your example, the variables x, y and z are known at compile time, so the optimizer can evaluate the multiply-add at compile time and generate only the result, which will usually be stored on the stack at a location reserved for the local frame's variables.
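One way to keep the optimizer from folding everything into constants is to pass the inputs as arguments to a function that is compiled on its own. The sketch below is written in C++ as an assumption: it mirrors the Fortran `sum(a*b)` and `x + (y*z)` expressions but is not the original program, and `dot` and `muladd` are hypothetical names.

```cpp
#include <cstddef>

// Same work as the Fortran sum(a*b), but the data arrives through pointers,
// so the compiler cannot precompute the result when compiling this function.
float dot(const float* a, const float* b, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        s += a[i] * b[i];  // candidate for a fused multiply-add
    return s;
}

// Same shape as u = x + y*z, with run-time inputs.
float muladd(float x, float y, float z) {
    return x + y * z;      // candidate for a scalar vfmadd instruction
}
```

Compiling such a file on its own, for example with `g++ -O2 -mfma -S kernel.cpp` (the file name is hypothetical), and searching the generated `.s` file for `vfmadd` is one way to check. Whether FMA instructions actually appear still depends on the compiler and its floating-point contraction setting (`-ffp-contract` in GCC and Clang).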
You may look at this simple example, in which the single call to the sin function is eliminated: the compiler evaluates sin at compile time. The variable d_rsin that holds the result is not used further in the program, but because it is declared volatile, the store to it is kept. The generated code therefore just loads the precomputed result from the CONST segment into the xmm0 register and copies it to main's local variable storage addressed through the esp register.
A simple example in C++:
```cpp
#include <math.h>  // declares ::sin

int main()
{
    //........
    double x{ 0.45 };
    double y{ 0.45 };
    volatile double d_rsin{ ::sin(x + y) }; // declare d_rsin as volatile
}
```
Resulting assembly code:
```
// non-relevant machine code removed
01101026 C5 FB 10 05 78 61 10 01  vmovsd xmm0,qword ptr ds:[1106178h]
0110102E C5 FB 11 44 24 14        vmovsd qword ptr [esp+14h],xmm0
// non-relevant machine code removed
```
To avoid this optimization, you could give x, y and z the VOLATILE attribute (for example `real, volatile :: x, y, z`), which forces the compiler to generate the complete code.
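The same idea in C++, as a minimal sketch (the function name `volatile_muladd` is hypothetical): every read of a volatile object must actually be performed at run time, so the multiply and add can no longer be replaced by a precomputed constant.

```cpp
// Inputs are volatile, mirroring the suggestion to declare x, y, z VOLATILE.
float volatile_muladd() {
    volatile float x = 1.0f;
    volatile float y = 2.34f;
    volatile float z = 0.987f;
    // Each of these reads is a real memory load the optimizer must keep.
    float xv = x, yv = y, zv = z;
    // The arithmetic is now done at run time; depending on the compiler's
    // contraction settings it may be emitted as an FMA instruction.
    return xv + yv * zv;
}
```

The returned value is approximately 3.30958; fused and unfused evaluation may differ only in the last bit.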