Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

## Question using vdmul

Beginner

On my Intel Mac, using vdmul does not make the multiplication of two matrices any faster than the usual A*B multiplication.

Is this normal? Or does it mean that maybe I did not install MKL properly?

4 Replies
Employee

Dear customer,

The performance depends on the size of the vectors you are computing with, the parallelization of your program, and your system environment (CPU, OS, compiler, ...). Could you please provide more information and a test sample showing how you implemented this and what you compared it against, so that I can take a look? Thanks.

Best regards,
Fiona

Beginner

The code I'm implementing is the following:

```
program MAIN

  implicit none

  integer, parameter :: Na = 250, Nx = 100, NS = Na*Nx

  integer            :: rA, cA, rB, cB, icA, icB, ic
  real(8)            :: t_in, t_out

  real(8)            :: A(NS,Na), B(NS,Nx), KK(NS,NS)

  A = 1.0D0
  B = 2.0D0

  rA = size(A,1)
  cA = size(A,2)
  rB = size(B,1)
  cB = size(B,2)

  KK = 0.0D0

  ! Version 1: elementwise product of each column pair using array syntax
  call cpu_time(t_in)
  do icA = 1, cA
    do icB = 1, cB
      ic = (icA - 1)*cB + icB
      KK(:,ic) = A(:,icA)*B(:,icB)
    end do
  end do
  call cpu_time(t_out)
  print *, '("Computing KK without vdmul takes = ",f2.6," seconds.")', t_out - t_in

  ! Version 2: the same elementwise product computed with MKL's vdmul
  call cpu_time(t_in)
  do icA = 1, cA
    do icB = 1, cB
      ic = (icA - 1)*cB + icB
      call vdmul( rA, A(:,icA), B(:,icB), KK(:,ic) )
    end do
  end do
  call cpu_time(t_out)
  print *, '("Computing KK with vdmul takes = ",f2.6," seconds.")', t_out - t_in

end program MAIN
```

The output I'm getting is as follows:

("Computing KK without vdmul takes = ",f2.6," seconds.")   1.02616200000000

("Computing KK with vdmul takes = ",f2.6," seconds.")      1.16865000000000

I have tried various sizes for Na and Nx: usually there is no significant difference when using vdmul; if anything, the vdmul version is slower. The only thing that makes a significant difference to the speed of the code is whether the following line is commented out or not:

```    KK = 0.0D0
```

I'm using OS X 10.11.6. Thank you for your help, and best regards,

Axelle

Black Belt
Auto-vectorization of your source code ought to achieve full performance. If there is further performance to be gained by threading, it may be done better outside the inner loop. In any case, threading would at best reduce elapsed time, not CPU time, so if vdmul is threaded you may want to time with system_clock instead of cpu_time.
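
A minimal sketch of the timing suggestion above, assuming a compiler that supports the Fortran 2008 `iso_fortran_env` kinds. The program name, the array size `n`, and all variable names are illustrative and do not come from the original post. It times one elementwise product with both `system_clock` (elapsed time) and `cpu_time` (processor time), so the two figures can be compared.

```fortran
program time_elapsed
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none

  integer, parameter :: n = 2000            ! illustrative size, not from the thread
  integer(int64)     :: count_start, count_end, count_rate
  real(real64)       :: cpu_start, cpu_end
  real(real64), allocatable :: a(:,:), b(:,:), c(:,:)

  allocate(a(n,n), b(n,n), c(n,n))
  a = 1.0_real64
  b = 2.0_real64
  c = 0.0_real64                            ! write to c once before timing starts

  ! system_clock returns ticks; count_rate is the number of ticks per second
  call system_clock(count_start, count_rate)
  call cpu_time(cpu_start)

  c = a*b                                   ! elementwise product, as in the posted loops

  call cpu_time(cpu_end)
  call system_clock(count_end)

  print '("Elapsed time = ",f10.6," seconds.")', &
        real(count_end - count_start, real64) / real(count_rate, real64)
  print '("CPU time     = ",f10.6," seconds.")', cpu_end - cpu_start
  print '("Check: c(1,1) = ",f4.1)', c(1,1)
end program time_elapsed
```

In a single-threaded run the two times should be close. If the timed region is executed by several threads, `cpu_time` typically reports the processor time summed over threads, so only the `system_clock` figure would show any reduction in elapsed time.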
Beginner

Dear Tim,