My code multiplies 4 vectors of length 2000 by 4 square matrices of the same size, then adds the four results together. The whole procedure is repeated 31 times (see the code below). The problem is that these simple matrix calculations take more than 20 seconds, and I expected them to be much faster. Does anyone know why it is so slow, or how I can make it faster?
REAL*8, ALLOCATABLE, DIMENSION(:,:) :: dummy, dummy2
REAL*8, ALLOCATABLE, DIMENSION(:,:,:) :: TPLTZ
integer Nt, NN, ic
Nt = 2000
NN = 31
ALLOCATE( dummy(4,Nt) )
dummy = 0d0
ALLOCATE( dummy2(NN,Nt) )
dummy2 = 0d0
ALLOCATE( TPLTZ(Nt,Nt,4) )
TPLTZ = 0d0
do ic=1,NN
dummy2(ic,:) = matmul(dummy(1,:),TPLTZ(:,:,1)) + matmul(dummy(2,:),TPLTZ(:,:,2)) &
             + matmul(dummy(3,:),TPLTZ(:,:,3)) + matmul(dummy(4,:),TPLTZ(:,:,4))
end do
Are you running this code with the default project settings?
If so, you are using debugging options that slow down execution.
My computer shows a 4x speedup with your code when I switch to the Release configuration.
Build: Configuration Manager: Active solution configuration: Release.
You can also add a configuration dropdown list to the toolbar.
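To compare the Debug and Release configurations on your own machine, the loop can be timed directly. The following is only a sketch: the program name, the use of SYSTEM_CLOCK, the random input data and the checksum are assumptions added for illustration, not part of the original code. The arrays are filled with random numbers instead of zeros, and a checksum is printed so that an optimizing compiler cannot skip the work.

```fortran
program time_matmul
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: Nt = 2000, NN = 31      ! sizes from the original post
  real(dp), allocatable :: dummy(:,:), dummy2(:,:), TPLTZ(:,:,:)
  integer :: ic
  integer(i8) :: t0, t1, rate

  allocate(dummy(4,Nt), dummy2(NN,Nt), TPLTZ(Nt,Nt,4))
  call random_number(dummy)      ! assumption: random data instead of zeros
  call random_number(TPLTZ)
  dummy2 = 0.0_dp

  call system_clock(t0, rate)    ! start of the timed region
  do ic = 1, NN
    dummy2(ic,:) = matmul(dummy(1,:),TPLTZ(:,:,1)) + matmul(dummy(2,:),TPLTZ(:,:,2)) &
                 + matmul(dummy(3,:),TPLTZ(:,:,3)) + matmul(dummy(4,:),TPLTZ(:,:,4))
  end do
  call system_clock(t1)          ! end of the timed region

  print '(a,f10.3,a)', 'elapsed: ', real(t1 - t0, dp) / real(rate, dp), ' s'
  print *, 'checksum: ', sum(dummy2)   ! uses the result so the loop is not optimized away
end program time_matmul
```

Building this once in Debug and once in Release and comparing the two "elapsed" lines should show the difference described above; the actual factor depends on the machine and compiler version.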
You could do two things:
Reverse the index order of dummy and dummy2 so that the multiplication uses vectors of stride one.
Use the MKL library routines to do the multiplication. DGEMM, I think.
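The first suggestion could look roughly like the sketch below. Fortran stores arrays column by column, so dummy(1,:) in a (4,Nt) array steps through memory with stride 4, while a column of an (Nt,4) array is contiguous. The names dummy_t and dummy2_t, the reduced size Nt = 200 and the random test data are assumptions made only for this illustration; the program checks that both layouts give the same numbers up to rounding.

```fortran
program stride_one_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: Nt = 200, NN = 31    ! Nt reduced from 2000 to keep the demo quick
  real(dp), allocatable :: dummy(:,:), dummy2(:,:)       ! original layout: (4,Nt), (NN,Nt)
  real(dp), allocatable :: dummy_t(:,:), dummy2_t(:,:)   ! reversed layout: (Nt,4), (Nt,NN)
  real(dp), allocatable :: TPLTZ(:,:,:)
  real(dp) :: maxdiff
  integer :: ic, k

  allocate(dummy(4,Nt), dummy2(NN,Nt), dummy_t(Nt,4), dummy2_t(Nt,NN), TPLTZ(Nt,Nt,4))
  call random_number(dummy)
  call random_number(TPLTZ)
  dummy_t = transpose(dummy)     ! same data, index order reversed

  ! Original layout: every vector section has a stride larger than one
  do ic = 1, NN
    dummy2(ic,:) = matmul(dummy(1,:),TPLTZ(:,:,1)) + matmul(dummy(2,:),TPLTZ(:,:,2)) &
                 + matmul(dummy(3,:),TPLTZ(:,:,3)) + matmul(dummy(4,:),TPLTZ(:,:,4))
  end do

  ! Reversed layout: every vector section is a contiguous column
  do ic = 1, NN
    dummy2_t(:,ic) = 0.0_dp
    do k = 1, 4
      dummy2_t(:,ic) = dummy2_t(:,ic) + matmul(dummy_t(:,k), TPLTZ(:,:,k))
    end do
  end do

  maxdiff = maxval(abs(transpose(dummy2) - dummy2_t))
  if (maxdiff > 1.0e-9_dp) error stop 'layouts disagree'
  print *, 'layouts agree'
end program stride_one_sketch
```

How much time the reversed layout saves is not shown here and would have to be measured on the real problem size.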
DGEMV should improve speed by avoiding some of the temporary vector allocations for each intermediate result.
Are you asking for the compiler to shortcut the meaningless operations? Perhaps it might do some of that at -O3.
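The DGEMV suggestion might be sketched as follows. DGEMV computes y := alpha*op(A)*x + beta*y, and matmul(v, A) for a vector v equals transpose(A) times v, so the call uses 'T' and accumulates the four products in y with beta = 1, without a temporary vector per product. This assumes the program is linked against a BLAS implementation such as MKL, that the vectors are stored as columns (reversed index order), and it shows a single pass of the loop; the names v, y and y_ref, the size Nt = 200 and the random data are made up for the example.

```fortran
program dgemv_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: Nt = 200             ! reduced from 2000 for a quick check
  real(dp), allocatable :: v(:,:), TPLTZ(:,:,:), y(:), y_ref(:)
  integer :: k
  external :: dgemv                          ! BLAS routine; needs MKL or another BLAS at link time

  allocate(v(Nt,4), TPLTZ(Nt,Nt,4), y(Nt), y_ref(Nt))
  call random_number(v)
  call random_number(TPLTZ)

  ! Accumulate the four vector-matrix products directly into y
  y = 0.0_dp
  do k = 1, 4
    call dgemv('T', Nt, Nt, 1.0_dp, TPLTZ(:,:,k), Nt, v(:,k), 1, 1.0_dp, y, 1)
  end do

  ! Reference result with the intrinsic, for comparison only
  y_ref = 0.0_dp
  do k = 1, 4
    y_ref = y_ref + matmul(v(:,k), TPLTZ(:,:,k))
  end do

  if (maxval(abs(y - y_ref)) > 1.0e-9_dp) error stop 'dgemv and matmul disagree'
  print *, 'dgemv and matmul agree'
end program dgemv_sketch
```

Whether this beats the optimized matmul for this problem is something to measure rather than assume.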
You are right. I switched to the Release configuration as you said, and it is now faster by the same factor. Thanks, it was amazing. BTW, I get the warning message below right before the program runs:
________________________________________________________________________________________
No debugging information
Debugging information for 'code_name.exe' cannot be found or does not match. Binary was not built with debug information.
Do you want to continue debugging?
________________________________________________________________________________________
I just hit Yes to get rid of it, though I am not really aware of the consequences. Could you please tell me whether it is OK to do so, or whether I should do something to make sure I do not get wrong results because of debugging problems?
Hitting F5 means "start debugging the exe". A Release exe cannot be debugged, since it is built for speed and has no debug info.
Start your exe using Ctrl+F5 and it won't complain!
Or you can set full optimizations in the Debug build. Debugging optimized code is difficult, but you can still do some debugging.
Or you can leave full debugging on: in the project browse tree, select just the file with your matrix multiplication, right-click, choose Properties, set full optimizations, and build. Now only this module runs at full speed.
Jim Dempsey
