Solved: sum(a,a) vs. dot_product(a,a)

dnoack · ‎03-02-2010

Hello,

IMHO the intrinsic functions sum(a,a) and dot_product(a,a) do the same thing, so I assumed they would also use the same code. But I have found that sum(a,a) runs about 4% faster than dot_product(a,a). Can anyone please explain what differs between these two functions, so that they run at different speeds?

For testing I use the code below on a Xeon workstation with Linux and ifort version 9.1.

Thank you

Dieter

      program dotSpeed
      implicit none
      integer, parameter :: N = 1000000
      double precision, dimension(N) :: a
      double precision :: dot
      integer :: i

      do i = 1, N
         a = dfloat(i)
      enddo

      do i = 1, N
         dot = sum(a*a)
c        dot = dot_product(a,a)
      enddo

      end

Martyn_C_Intel · ‎03-03-2010

I think your example may have some typos.

Your initialization contains an array assignment, so it is effectively a double loop; each element gets initialized a million times, which I don't think is what you intended, and every element of a ends up set to N. Likewise for the loop over dot products: a dot product of one million terms gets computed one million times.
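A minimal sketch of the difference, assuming a small array size of 5 chosen only for illustration (the program name init_demo and the arrays a and b are hypothetical, not part of the original test case):

```fortran
program init_demo
  implicit none
  integer, parameter :: n = 5          ! small size, for illustration only
  double precision :: a(n), b(n)
  integer :: i

  ! Whole-array assignment inside the loop: every pass overwrites all
  ! n elements, so after the loop each element holds the last value, n.
  do i = 1, n
    a = dble(i)
  end do

  ! Element-wise assignment: each pass sets only element i.
  do i = 1, n
    b(i) = dble(i)
  end do

  print *, 'a =', a
  print *, 'b =', b
end program init_demo
```

Here every element of a ends up equal to 5, while b holds the values 1 through 5, which is presumably what the original initialization intended.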

I adapted your test case slightly:

program dotSpeed
  implicit none
  integer, parameter :: N = 1000000
  double precision, dimension(N) :: a
  double precision :: dot, dot2, t1, t2
  integer :: i

  a = (/(i, i=1,N)/)
  dot2 = 0._8

  call cpu_time(t1)
  do i = 1, 1000
    dot = sum(a*a)
!   dot = dot_product(a,a)
    dot2 = dot2 + dot
  enddo
  call cpu_time(t2)

  print *, 'time', t2-t1
  print *, "dot2=", dot2
end

Printing a result at the end ensures that all the preceding code is not optimized away.

Accumulating the dot products into dot2 ensures that the compiler does not decide that all iterations of the loop are equivalent, and that it might only need to execute one iteration.

(Your original test case would take at least 15 minutes if the code were not optimized away.)

I found this test case executed in about 1 second on an Intel Core i5 system, and there was no measurable difference between SUM and DOT_PRODUCT, either for the 9.1 compiler that you used, or for the 11.1 compiler (the latest version). Note, however, that with the 9.1 compiler, you can get a modest additional speedup by enabling vectorization with a switch such as -xW. The 11.1 compiler will vectorize either form of the dot product at default optimization.
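To compare the two forms in a single run rather than by commenting lines in and out, something along these lines could be used. This is only a sketch: the program name dot_compare, the accumulators acc1 and acc2, and the repeat count nrep are assumptions made for illustration, and the timings it prints will depend on the compiler, its options, and the hardware.

```fortran
program dot_compare
  implicit none
  integer, parameter :: n = 1000000     ! array length, as in the test case
  integer, parameter :: nrep = 1000     ! repeat count (assumed, for illustration)
  double precision, allocatable :: a(:)
  double precision :: acc1, acc2, t0, t1, t2
  integer :: i, k

  ! Element-wise initialization: a(i) = i
  allocate(a(n))
  do i = 1, n
    a(i) = dble(i)
  end do

  ! Time sum(a*a), accumulating so the work is not obviously redundant
  acc1 = 0.0d0
  call cpu_time(t0)
  do k = 1, nrep
    acc1 = acc1 + sum(a*a)
  end do
  call cpu_time(t1)

  ! Time dot_product(a,a) in the same way
  acc2 = 0.0d0
  do k = 1, nrep
    acc2 = acc2 + dot_product(a, a)
  end do
  call cpu_time(t2)

  ! Printing both accumulators keeps the loops from being removed as dead code
  print *, 'sum(a*a)          time:', t1 - t0, ' result:', acc1
  print *, 'dot_product(a,a)  time:', t2 - t1, ' result:', acc2
end program dot_compare
```

Both accumulated results should agree to within floating-point rounding, since the two forms may add the terms in a different order.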


dnoack · ‎03-02-2010

Sorry for my misprint.

Instead of sum(a,a) it should be sum(a*a), as can be seen in the example.

TimP · ‎03-02-2010

You can't draw useful conclusions from a case where the compiler can kill as much dead code as it chooses. In a useful example, I wouldn't expect sum(a*a) to have an advantage over dot_product(a,a), although I don't want to be guessing your intended context.

dnoack · ‎03-03-2010

Hello tim18,

Of course this is only a poor example. But this holds for both cases, for sum(a*a) as well as for dot_product(a,a). Why do you think that the fully intrinsic function dot_product(a,a) has an advantage over the combination of the intrinsic function sum() with the basic operation a*a?

Martyn_C_Intel · ‎03-03-2010