On my Intel Mac, using vdmul does not make the multiplication of two matrices any faster than the usual A*B multiplication.
Is this normal? Or does it mean that maybe I did not install MKL properly?
Thank you very much in advance for your help.
The performance depends on the size of the vectors you are computing with, on how your program is parallelized, and on your system environment (CPU, OS, compiler, ...). Could you please provide more information and a test sample showing how you implemented this and what you compared it against, so that I can help you check? Thanks.
Dear Fiona, thank you for your answer.
The code I'm implementing is the following:
    program MAIN
      implicit none
      integer, parameter :: Na = 250, Nx = 100, NS = Na*Nx
      integer :: rA, cA, rB, cB, icA, icB, ic
      real(8) :: t_in, t_out
      real(8) :: A(NS,Na), B(NS,Nx), KK(NS,NS)

      A = 1.0D0
      B = 2.0D0
      rA = size(A,1)
      cA = size(A,2)
      rB = size(B,1)
      cB = size(B,2)
      KK = 0.0D0

      ! Element-wise product of every column pair, plain Fortran
      call cpu_time(t_in)
      do icA = 1, cA
        do icB = 1, cB
          ic = (icA - 1)*cB + icB
          KK(:,ic) = A(:,icA)*B(:,icB)
        end do
      end do
      call cpu_time(t_out)
      print *, '("Computing KK without vdmul takes = ",f2.6," seconds.")', t_out - t_in

      ! Same computation using MKL's vdmul
      call cpu_time(t_in)
      do icA = 1, cA
        do icB = 1, cB
          ic = (icA - 1)*cB + icB
          call vdmul( rA, A(:,icA), B(:,icB), KK(:,ic) )
        end do
      end do
      call cpu_time(t_out)
      print *, '("Computing KK with vdmul takes = ",f2.6," seconds.")', t_out - t_in
    end program MAIN
The output I'm getting is as follows:
("Computing KK without vdmul takes = ",f2.6," seconds.") 1.02616200000000
("Computing KK with vdmul takes = ",f2.6," seconds.") 1.16865000000000
I have tried various sizes for Na and Nx: usually there is no significant difference when using vdmul; if anything, the vdmul version is slightly slower. The only thing that makes a significant difference to the speed of the code is whether the following line is commented out or not:
KK = 0.0D0
I'm using OS X 10.11.6. Thank you for your help, and best regards,
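A note on the measurement itself, offered as a rough sketch rather than a diagnosis. With Na = 250 and Nx = 100, KK holds 25000 x 25000 doubles, which is about 5 GB, and each element needs only one multiplication, so the loop is probably limited by memory traffic rather than arithmetic. In addition, cpu_time reports processor time, which may count the time of all threads together if the library runs threaded, and the first write to KK may include the cost of the operating system mapping its pages (a plausible reason why the KK = 0.0D0 line changes the timing). The program below assumes much smaller, hypothetical sizes (Na = 50, Nx = 40) and a made-up program name; it runs the same loop twice and prints both wall-clock time from system_clock and CPU time from cpu_time, so the first-touch pass can be compared with the warmed-up pass. MKL is not linked here; the vdmul call from the post is shown only as a comment marking where it would go.

```fortran
program time_elementwise
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none

  ! Hypothetical sizes, far smaller than in the post, so KK is about 32 MB.
  integer, parameter :: Na = 50, Nx = 40, NS = Na*Nx
  real(real64), allocatable :: A(:,:), B(:,:), KK(:,:)
  integer(int64) :: c0, c1, rate
  real(real64) :: cpu0, cpu1
  integer :: icA, icB, ic, pass

  allocate(A(NS,Na), B(NS,Nx), KK(NS,NS))
  A = 1.0_real64
  B = 2.0_real64
  ! KK is deliberately not initialized: pass 1 writes it for the first time,
  ! pass 2 repeats the identical work on memory that has already been touched.

  do pass = 1, 2
     call system_clock(c0, rate)   ! wall-clock ticks and ticks per second
     call cpu_time(cpu0)           ! processor time used by the program
     do icA = 1, Na
        do icB = 1, Nx
           ic = (icA - 1)*Nx + icB
           KK(:,ic) = A(:,icA)*B(:,icB)
           ! With MKL linked, the line above could be replaced by:
           ! call vdmul( NS, A(:,icA), B(:,icB), KK(:,ic) )
        end do
     end do
     call cpu_time(cpu1)
     call system_clock(c1)
     print '(a,i0,a,f10.6,a,f10.6,a)', 'pass ', pass, ': wall = ', &
          real(c1 - c0, real64)/real(rate, real64), ' s, cpu = ', &
          cpu1 - cpu0, ' s'
  end do

  ! Use the result so the compiler cannot discard the loops entirely.
  print '(a,f6.2)', 'check KK(1,1) = ', KK(1,1)
end program time_elementwise
```

If pass 1 turns out clearly slower than pass 2, the difference is more likely due to first-touch memory cost than to the multiplication routine, and comparing the plain loop and vdmul on the second pass only would be the fairer test.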