inlined subroutine still slow

Hi everyone,

We have some legacy F77 code that contains many math routines: matrix and/or vector multiplication, vector copies, and initialization of vectors and matrices. All of this F77 code is hand-optimized (with loop unrolling, for example).

From the optimization report, I can see that these functions are inlined and that the operations are all VECTORIZED (estimated potential speedup: about 1.6).

However, if I replace these F77 function calls with F90 code,

for example (a matrix multiplied by a vector here)

c(:) = matmul(a(:,:),b(:))

I save about 50% of the time spent in these matrix and vector operations.

Does this mean I still have overhead even though these functions are inlined?

Could anyone give me an explanation and some suggestions on how to optimize this code? Thank you in advance!

 

 



Inlining saves the function call overhead, including passing arguments on the stack and/or in registers and any saving/restoring of registers on the stack. For a function such as matrix multiply, you will be comparing the implementation in your F77/F90 code against the code called by the newer compiler (principally Intel's MKL). For anything other than small matrices, MKL will likely be much faster than anything you can write.

By the way, the MKL call is not inlined.

Jim Dempsey
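
The comparison described above can be tried in isolation. The following is a minimal sketch, not the original poster's code: the program name, the problem size `n`, and the single-pass timing with `system_clock` are all assumptions chosen for illustration. It computes the same matrix-vector product with a hand-written F77-style loop and with the `matmul` intrinsic, checks that the results agree, and prints rough timings. Whether `matmul` turns into a library call depends on the compiler and the options used.

```fortran
program matvec_compare
  use iso_fortran_env, only: dp => real64, int64
  implicit none
  integer, parameter :: n = 2000          ! hypothetical problem size
  real(dp), allocatable :: a(:,:), b(:), c_loop(:), c_intr(:)
  integer(int64) :: t0, t1, t2, rate
  integer :: i, j

  allocate(a(n,n), b(n), c_loop(n), c_intr(n))
  call random_number(a)
  call random_number(b)

  call system_clock(t0, rate)

  ! Hand-written F77-style product. The inner loop runs down a column,
  ! which is the contiguous direction for Fortran arrays.
  c_loop = 0.0_dp
  do j = 1, n
     do i = 1, n
        c_loop(i) = c_loop(i) + a(i,j) * b(j)
     end do
  end do
  call system_clock(t1)

  ! F90 intrinsic. Whether this becomes a library call is
  ! compiler- and option-dependent.
  c_intr = matmul(a, b)
  call system_clock(t2)

  print '(a,es10.3)', 'max abs difference: ', maxval(abs(c_loop - c_intr))
  print '(a,f10.6)',  'loop   seconds: ', real(t1 - t0, dp) / real(rate, dp)
  print '(a,f10.6)',  'matmul seconds: ', real(t2 - t1, dp) / real(rate, dp)
end program matvec_compare
```

A single pass gives only a rough number; building with the same optimization flags as the real application and repeating the measurement would make the comparison more trustworthy.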

Thanks for your reply, Jim.

So matmul uses MKL.

Is the following code (a vector multiplication) also computed using MKL?

c(i) = sum((a(:,i) * b(:)))

 

Thanks,

GZ

Accepted Solution

Don't know. You can find out by running VTune on the Release version (with debug symbols).

You should be able to see MKL references (assuming the compute load in MKL is large enough to be sampled by VTune).

The Bottom-Up view should show the call stack.

Jim Dempsey

Besides what Jim said, you could use nm to see whether you have linked MKL. For most purposes, sum(a*b) should be equivalent to dot_product(a,b), but it's not obvious what the requirements for automatic MKL substitution might be. Since you are able to change the source, I think writing MATMUL explicitly and using the opt_matmul option of ifort (included in -O3) would be best (gfortran has an equivalent).
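
The equivalence mentioned here can be checked with a small program. This is a minimal sketch under stated assumptions: the program name, the array sizes `n` and `m`, and the result array names are hypothetical. It evaluates the column-by-column expression from the question with `sum(a(:,i) * b(:))`, with `dot_product`, and as one explicit `matmul(b, a)` over all columns, then prints the largest differences. It says nothing about which form, if any, a given compiler replaces with a library call.

```fortran
program dot_forms
  use iso_fortran_env, only: dp => real64
  implicit none
  integer, parameter :: n = 4, m = 3      ! hypothetical sizes
  real(dp) :: a(n,m), b(n), c_sum(m), c_dot(m), c_mm(m)
  integer :: i

  call random_number(a)
  call random_number(b)

  do i = 1, m
     ! Form from the question: elementwise product, then reduction.
     c_sum(i) = sum(a(:,i) * b(:))
     ! Same quantity through the standard intrinsic.
     c_dot(i) = dot_product(a(:,i), b)
  end do

  ! All columns at once: a rank-1 first argument makes matmul compute
  ! c_mm(i) = sum over k of b(k) * a(k,i).
  c_mm = matmul(b, a)

  print *, 'sum vs dot_product:', maxval(abs(c_sum - c_dot))
  print *, 'sum vs matmul     :', maxval(abs(c_sum - c_mm))
end program dot_forms
```

Any differences should be at the level of floating-point rounding, since the three forms may accumulate the products in a different order.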

TimP,

A link dependency on MKL only indicates that MKL is linked into the application. It does not indicate whether

c(i) = sum((a(:,i) * b(:)))

calls MKL.

VTune is one way to get this information (as indicated in #4). Another way is to set a breakpoint at the statement in the debugger (which may be difficult with full optimizations) and then use the Disassembly window.

Jim Dempsey

As far as I know, MATMUL is the only thing for which the compiler calls into MKL on its own (when certain optimizations are enabled). But I'll admit that my knowledge here is a bit stale.

Steve (aka "Doctor Fortran") - https://stevelionel.com/drfortran

Thank you all for the detailed information!

GZ
