Hello,
IMHO the intrinsic functions sum(a,a) and dot_product(a,a) do the same thing, so I assumed they would also use the same code. But as I have found out, sum(a,a) runs about 4% faster than dot_product(a,a). Can anyone please explain what the difference between these two functions is that makes them run at different speeds?
For testing I use the code below on a Xeon workstation with Linux and ifort version 9.1.
Thank you
Dieter
program dotSpeed
implicit none
integer, parameter :: N = 1000000
double precision, dimension(N) :: a
double precision :: dot
integer :: i
do i = 1, N
a = dfloat(i)
enddo
do i = 1, N
dot = sum(a*a)
c dot = dot_product(a,a)
enddo
end
I think your example may have some typos.
Your initialization contains an array assignment, so it is effectively a double loop: each element gets initialized a million times, which I don't think is what you intended, and every element of a ends up set to N. Likewise for the loop over dot products: a dot product of one million terms gets computed one million times.
I adapted your test case slightly:
program dotSpeed
  implicit none
  integer, parameter :: N = 1000000
  double precision, dimension(N) :: a
  double precision :: dot, dot2, t1, t2
  integer :: i
  a = (/(i, i=1,N)/)
  dot2 = 0._8
  call cpu_time(t1)
  do i = 1, 1000
    dot = sum(a*a)
    ! dot = dot_product(a,a)
    dot2 = dot2 + dot
  enddo
  call cpu_time(t2)
  print *, 'time', t2-t1
  print *, "dot2=", dot2
end
Printing a result at the end ensures that all the preceding code is not optimized away.
Accumulating the dot products into dot2 ensures that the compiler does not decide that all iterations of the loop are equivalent, and that it might only need to execute one iteration.
(Your original test case would take at least 15 minutes if the code were not optimized away.)
I found that this test case executed in about 1 second on an Intel Core i5 system, and there was no measurable difference between SUM and DOT_PRODUCT, either with the 9.1 compiler that you used or with the 11.1 compiler (the latest version). Note, however, that with the 9.1 compiler you can get a modest additional speedup by enabling vectorization with a switch such as -xW. The 11.1 compiler will vectorize either form of the dot product at default optimization.
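For anyone who wants to compare the two forms in a single run rather than by commenting one line in and out, here is a minimal sketch along the same lines. The program name, the names dp, nrep, acc_sum and acc_dot, and the trick of changing a(1) on every pass (so that no two iterations are identical) are my own assumptions for illustration and are not part of the test above. The two accumulated results may differ in the last digits if the compiler orders the additions differently.

```fortran
program dot_compare
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000000, nrep = 1000
  real(dp), allocatable :: a(:)
  real(dp) :: acc_sum, acc_dot, t0, t1, t2
  integer :: i, k

  ! Element-wise initialization: a(i) = i, done exactly once.
  allocate(a(n))
  do i = 1, n
    a(i) = real(i, dp)
  end do

  ! Time sum(a*a). Changing a(1) each pass makes every iteration distinct.
  acc_sum = 0.0_dp
  call cpu_time(t0)
  do k = 1, nrep
    a(1) = real(k, dp)
    acc_sum = acc_sum + sum(a*a)
  end do
  call cpu_time(t1)

  ! Time dot_product(a,a) on the same data and the same number of passes.
  acc_dot = 0.0_dp
  do k = 1, nrep
    a(1) = real(k, dp)
    acc_dot = acc_dot + dot_product(a, a)
  end do
  call cpu_time(t2)

  ! Printing both accumulators keeps the timed loops from being discarded.
  print *, 'sum(a*a)         time:', t1 - t0, ' result:', acc_sum
  print *, 'dot_product(a,a) time:', t2 - t1, ' result:', acc_dot
end program dot_compare
```

Timings from a single run like this are noisy, so a difference of a few percent is only meaningful if it is reproducible over several runs and with the order of the two loops swapped.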
Sorry for my misprint.
Instead of sum(a,a) it should be sum(a*a), as can be seen in the example.
Hello tim18,
Of course this is only a poor example, but that holds for both cases, for sum(a*a) as well as for dot_product(a,a). Why do you think that the fully intrinsic function dot_product(a,a) has an advantage over the mix of the intrinsic function sum() with the basic operation a*a?