Intel® Fortran Compiler

inlined subroutine still slow

Guanfeng_Z_
Beginner

Hi everyone,

We have some legacy F77 code containing many math routines, such as matrix and/or vector multiplication, vector copies, and vector and matrix initialization. All of this F77 code is hand-optimized (for example, with loop unrolling).

From the optimization report, I can see that these routines are inlined and that their loops are all VECTORIZED (estimated potential speedup: about 1.6).

However, if I replace these F77 function calls with F90 code, for example (a matrix multiplied by a vector here)

c(:) = matmul(a(:,:),b(:))

I save about 50% of the time spent on these matrix and vector operations.

Does this mean there is still overhead even though these functions are inlined?

Could anyone explain this and suggest how to optimize this code? Thank you in advance!

7 Replies
jimdempseyatthecove
Honored Contributor III

Inlining saves the function call overhead, including passing arguments on the stack and/or in registers and any saving and restoring of registers on the stack. For a function such as matrix multiply, you are comparing the implementation in your F77/F90 code against the code called by the newer compiler (principally Intel's MKL). For anything other than small matrices, MKL will likely be much faster than anything you can write.

By the way, the MKL call is not inlined.

Jim Dempsey
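
To make the comparison above concrete, here is a minimal sketch that times a hand-written matrix-vector loop against the MATMUL intrinsic. The program name, the array size n, the repetition count nrep, and the loop form are assumptions for illustration only; they are not the original poster's code, and the timings will depend heavily on the compiler, the flags, and the matrix size.

```fortran
program matvec_compare
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: n = 2000, nrep = 50   ! arbitrary sizes (assumption)
  real(dp), allocatable :: a(:,:), b(:), c_loop(:), c_mm(:)
  integer(i8) :: t0, t1, rate
  integer :: i, j, rep

  allocate(a(n,n), b(n), c_loop(n), c_mm(n))
  call random_number(a)
  call random_number(b)

  ! Hand-written loops, similar in spirit to a legacy F77 routine.
  call system_clock(t0, rate)
  do rep = 1, nrep
     b(1) = real(rep, dp)      ! change the input so no repetition is redundant
     c_loop = 0.0_dp
     do j = 1, n               ! column-major order: inner loop runs down a column
        do i = 1, n
           c_loop(i) = c_loop(i) + a(i,j) * b(j)
        end do
     end do
  end do
  call system_clock(t1)
  print '(a,f10.3,a)', 'explicit loops: ', real(t1 - t0, dp) / real(rate, dp), ' s'

  ! Same operation through the F90 intrinsic.
  call system_clock(t0)
  do rep = 1, nrep
     b(1) = real(rep, dp)
     c_mm = matmul(a, b)
  end do
  call system_clock(t1)
  print '(a,f10.3,a)', 'matmul:         ', real(t1 - t0, dp) / real(rate, dp), ' s'

  ! The two results should agree up to rounding differences.
  print '(a,es10.3)', 'max abs difference: ', maxval(abs(c_loop - c_mm))
end program matvec_compare
```

Building this with the same optimization flags as the real application and varying n should show whether the gap comes from the implementation behind MATMUL rather than from call overhead.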

Guanfeng_Z_
Beginner

Thanks for your reply, Jim.

So matmul uses MKL.

Is the following code (a vector dot product) also computed using MKL?

c(i) = sum((a(:,i) * b(:)))

 

Thanks,

GZ

jimdempseyatthecove
Honored Contributor III

Don't know. You can find out by running VTune on the Release version (with debug symbols).

You should be able to see MKL references (assuming the compute load in MKL is large enough to get sampled by VTune).

Bottom-Up should be able to show the call stack.

Jim Dempsey

TimP
Honored Contributor III

Besides what Jim said, you could use nm to see whether you have linked MKL. For most purposes, sum(a*b) should be equivalent to dot_product(a,b), but it's not obvious what the requirements for automatic MKL substitution might be. I think writing MATMUL explicitly and using the opt-matmul option of ifort (included in -O3) (gfortran has an equivalent) would be best, since you are able to change the source.
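
As a sketch of the equivalence mentioned above, the following program computes the per-column product three ways: with sum(a(:,i)*b(:)), with dot_product, and as a single MATMUL call. The program name and the sizes n and m are hypothetical. The three forms are mathematically the same but may differ in the last bits because the summation order can differ; whether any of them ends up calling a library routine is not something this code can show.

```fortran
program dot_forms
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 200, m = 100      ! arbitrary sizes (assumption)
  real(dp) :: a(n,m), b(n), c_sum(m), c_dot(m), c_mm(m)
  integer :: i

  call random_number(a)
  call random_number(b)

  do i = 1, m
     c_sum(i) = sum(a(:,i) * b(:))            ! form used in the question
     c_dot(i) = dot_product(a(:,i), b)        ! intrinsic dot product
  end do

  ! Rank-1 times rank-2: c_mm(j) = sum over i of b(i) * a(i,j),
  ! i.e. all m dot products expressed as one MATMUL call.
  c_mm = matmul(b, a)

  print '(a,es10.3)', 'max |sum - dot_product|: ', maxval(abs(c_sum - c_dot))
  print '(a,es10.3)', 'max |sum - matmul|:      ', maxval(abs(c_sum - c_mm))
end program dot_forms
```

Expressing the whole loop over i as one MATMUL gives the compiler a single recognizable operation, which is the form the replies in this thread suggest it is most likely to optimize.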

jimdempseyatthecove
Honored Contributor III

TimP,

A link dependency on MKL only indicates that MKL is linked into the application. It does not indicate whether

c(i) = sum((a(:,i) * b(:)))

calls MKL.

VTune is one way to get this information (as indicated in #4). Another way is to set a breakpoint at the statement in the debugger (which may be difficult with full optimizations) and then inspect the Disassembly window.

Jim Dempsey

Steve_Lionel
Honored Contributor III

As far as I know, MATMUL is the only thing for which the compiler calls into MKL on its own (when certain optimizations are enabled). But I'll admit that my knowledge here is a bit stale.

Guanfeng_Z_
Beginner

Thank you all for the detailed information!

GZ
