Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Very slow matrix multiplication!

Mahdi_S_1
Beginner
1,993 Views

I have written code that multiplies 4 vectors of length 2000 by 4 square matrices of the same size and then adds the results together. The whole procedure is then repeated 31 times (as can be seen in the code below). The problem is that these simple matrix calculations take more than 20 seconds, while I expected them to be much faster. Does anyone know why it is so slow, or how I can make it faster?

REAL*8, ALLOCATABLE, DIMENSION(:,:) :: dummy, dummy2
REAL*8, ALLOCATABLE, DIMENSION(:,:,:) :: TPLTZ
integer Nt, NN, ic

Nt = 2000
NN = 31

ALLOCATE( dummy(4,Nt) )
dummy = 0d0
ALLOCATE( dummy2(NN,Nt) )
dummy2 = 0d0
ALLOCATE( TPLTZ(Nt,Nt,4) )
TPLTZ = 0d0

do ic=1,NN
   dummy2(ic,:) = matmul(dummy(1,:),TPLTZ(:,:,1)) + matmul(dummy(2,:),TPLTZ(:,:,2)) &
                + matmul(dummy(3,:),TPLTZ(:,:,3)) + matmul(dummy(4,:),TPLTZ(:,:,4))
end do
0 Kudos
8 Replies
Andrew_Smith
Valued Contributor I
1,993 Views

You could do two things:

Reverse the index order of dummy and dummy2 so that the multiplication uses vectors of stride one.

Use the MKL library routines to do the multiplication. DGEMM, I think.
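
A minimal sketch of the first suggestion, under some assumptions: vecs, mats and results are hypothetical names standing in for dummy, TPLTZ and dummy2, with the two 2-D arrays declared with their indices swapped, and the sizes and fill values are made up so the result is easy to check. Fortran stores arrays column by column, so vecs(:,k) is contiguous in memory, whereas dummy(k,:) in the original layout picks every fourth element.

```fortran
module stride_one_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Returns y(j) = sum over k and i of x(i,k) * a(i,j,k), i.e. the sum of the
  ! row-vector-times-matrix products. x is stored as (n, nmat), so each
  ! vector x(:,k) is contiguous (stride one).
  function summed_products(x, a) result(y)
    real(dp), intent(in) :: x(:,:)      ! (n, nmat)
    real(dp), intent(in) :: a(:,:,:)    ! (n, n, nmat)
    real(dp) :: y(size(a, 2))
    integer :: k
    y = 0.0_dp
    do k = 1, size(a, 3)
      y = y + matmul(x(:,k), a(:,:,k))
    end do
  end function summed_products
end module stride_one_demo

program stride_one_example
  use stride_one_demo
  implicit none
  ! Small illustrative sizes; the thread uses n = 2000.
  integer, parameter :: n = 200, nmat = 4, nrep = 31
  real(dp), allocatable :: vecs(:,:), mats(:,:,:), results(:,:)
  integer :: ic

  allocate(vecs(n, nmat), mats(n, n, nmat), results(n, nrep))
  vecs = 1.0_dp
  mats = 2.0_dp

  do ic = 1, nrep
    ! results(:,ic) is a contiguous column, unlike dummy2(ic,:) in the original.
    results(:, ic) = summed_products(vecs, mats)
  end do

  ! With these made-up values every entry should equal 2 * n * nmat.
  print *, 'max deviation from expected:', &
           maxval(abs(results - 2.0_dp * n * nmat))
end program stride_one_example
```

How much the layout change helps by itself will depend on the compiler and the optimization options in use, so it is worth timing both versions.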

0 Kudos
TimP
Honored Contributor III
1,993 Views
Quoting Andrew Smith

You could do two things:

Reverse the index order of dummy and dummy2 so that the multiplication uses vectors of stride one.

Use the MKL library routines to do the multiplication. DGEMM, I think.

DGEMV should improve speed by avoiding some of the temporary vector allocations for each intermediate result.

Are you asking for the compiler to shortcut the meaningless operations? Perhaps it might do some of that at -O3.
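
A rough sketch of the DGEMV suggestion, assuming the program is linked against MKL or another BLAS library. The array names vecs, mats, y and y_ref are hypothetical, the sizes are reduced and the data is random. DGEMV with TRANS = 'T' computes y := alpha*A**T*x + beta*y, which matches matmul(x, A) for a vector x; calling it with beta = 1 accumulates all four products into the same y without intermediate result vectors. The vectors keep the original (4, n) layout here, and the INCX argument tells DGEMV to step through memory in strides of 4.

```fortran
program dgemv_example
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Small illustrative sizes; the thread uses n = 2000.
  integer, parameter :: n = 200, nmat = 4
  real(dp), allocatable :: vecs(:,:), mats(:,:,:), y(:), y_ref(:)
  integer :: k
  external :: dgemv   ! BLAS level-2 routine, supplied by MKL or another BLAS

  allocate(vecs(nmat, n), mats(n, n, nmat), y(n), y_ref(n))
  call random_number(vecs)
  call random_number(mats)

  ! Accumulate y = sum_k transpose(A_k) * x_k in place (alpha = 1, beta = 1).
  ! mats(1,1,k) is the first element of the k-th matrix (leading dimension n);
  ! vecs(k,1) is the first element of the k-th vector, read with stride nmat.
  y = 0.0_dp
  do k = 1, nmat
    call dgemv('T', n, n, 1.0_dp, mats(1,1,k), n, vecs(k,1), nmat, &
               1.0_dp, y, 1)
  end do

  ! Reference result using the matmul expression from the original post.
  y_ref = 0.0_dp
  do k = 1, nmat
    y_ref = y_ref + matmul(vecs(k,:), mats(:,:,k))
  end do

  print *, 'max difference between DGEMV and matmul:', maxval(abs(y - y_ref))
end program dgemv_example
```

The two results should agree to within rounding error; any speed difference is best measured on the real problem size with an optimized build.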

0 Kudos
Mahdi_S_1
Beginner
1,993 Views
Actually, I was wondering why such a simple calculation is so slow in FORTRAN. I know it is not a small problem (though not yet a really large one), so I thought that maybe I am coding it in a bad way that slows it down, i.e. that there are programming tricks that would make it faster, such as those suggested above. I just timed MATLAB doing the same calculation, and it seems that MATLAB is even faster than FORTRAN!
0 Kudos
Andrew_Smith
Valued Contributor I
1,994 Views

Are you running this code with the default project settings?

If so, you are using debugging options that slow down execution.

My computer shows a 4x speedup with your code when I switch to the Release configuration.

Build > Configuration Manager > Active solution configuration > Release.

You can also add a configuration dropdown list to the toolbar.

0 Kudos
Mahdi_S_1
Beginner
1,993 Views

You are right. I switched to the Release configuration as you said, and now it is faster by the same factor. Thanks, that was amazing. BTW, I get the warning message below right before running:

________________________________________________________________________________________

No debugging information

Debugging information for 'code_name.exe' cannot be found or does not match. Binary was not built with debug information.

Do you want to continue debugging?

________________________________________________________________________________________

I just hit Yes to get rid of it, though I am not really aware of the consequences! Could you please tell me whether it is OK to do so, or whether I should do something to make sure that I do not get wrong results due to debugging problems, etc.?

0 Kudos
Andrew_Smith
Valued Contributor I
1,993 Views

Hitting F5 means "start debugging the exe". A Release exe cannot be debugged, since it is built for speed and has no debug info.

Start your exe using Ctrl+F5 and it won't complain!

0 Kudos
IDZ_A_Intel
Employee
1,993 Views

Or you can set full optimizations in the Debug build. Debugging optimized code is more difficult, but you can still do some debugging.

Or you can leave full debugging on: in the project browse tree, select just the file with your matrix multiplication, right-click, choose Properties, set full optimizations, and build. Now only this module runs at full speed.

Jim Dempsey

0 Kudos
Mahdi_S_1
Beginner
1,993 Views
Thank you guys for the very useful information and tricks.
0 Kudos