Questions about parallelization.

velvia · ‎03-28-2011

Hi,

I've written this code in fortran to compute the sum of 1/(k*(k+1)) for k from 1 to 10**9 with a double precision accuracy.

===========================

program main

implicit none

integer, parameter :: ip = 8

integer, parameter :: rp = kind(1.0d0)

integer, parameter :: size = 10**9

integer(kind=ip) :: i

real(kind=rp) :: sum

sum =0.0_rp

do i = size,1,-1

sum = sum + 1.0_rp/real(i*(i+1),rp)

end do

write (*,*) "The result is : ",sum

end program main

===========================

I've realised that with the option -parallel, the program is 2 times faster than without it. I am quite amazed that the compiler can parallelize this program.

Of course, one core can compute the sum for i from 1 to 5*(10**8), another one can compute the sum for i from 5*(10**8)+1 to 10**9 and the 2 sums can be added. But for that, the compiler has to know that :

((((a_1+a_2)+a_3)+a_4)+...)+a_(10**9)

is the same as

((((a_1+a_2)+a_3)+a_4)+...)+a_(5*(10**8)) + ((((a_(5*(10**8)+1)+a_(5*(10**8)+2))+...)+a_(10**9))

which is not "obvious". Is it what the compiler does ?

Thanks for your answers,

Francois

fah10 · ‎03-29-2011

In fact, the compiler only sees that the terms do not depend on each other. Therefore, the loop could be arbitrary reordered in exact arithmetic. However, for floating point arithmetic, the compiler just assumes that reordering does not produce significant round-off errors. So you can never be sure in which order the terms are actually added up. This may depend on the system, number of threads etc.

If you want to be sure, in which order terms are added up, you need to parallelize the program manually and split up the loop _by hand_ (using OpenMP reductions on loops are again not deterministic)

mecej4 · ‎03-29-2011

I doubt that any compiler is "smart" enough to see that the sum from k=1:n is exactly 1 - 1 / ( n + 1 ).

velvia · ‎03-29-2011

Quoting mecej4

I doubt that any compiler is "smart" enough to see that the sum from k=1:n is exactly 1 - 1 / ( n + 1 ).

It has been done on purpose, in order to check the result ;-)
It is interesting that, in this example, you get a wrong result when you sum from 1 to 10**9.

TimP · ‎03-29-2011

In case you wished to sum to 10**10, you would need to take care to promote such expressions to 64-bit integers.

velvia · ‎03-29-2011

Quoting TimP (Intel)

In case you wished to sum to 10**10, you would need to take care to promote such expressions to 64-bit integers.

As a matter of fact, I already have to do that as I compute i*(i+1) as an integer expression for i = 10**9.

TimP · ‎03-29-2011

in the code you posted, you didn't use 64-bit integers in your parameter calculation, and you have an undefined variable in your write. So, it's hard to guess what you actually tested.

velvia · ‎03-29-2011

Quoting TimP (Intel)

in the code you posted, you didn't use 64-bit integers in your parameter calculation, and you have an undefined variable in your write. So, it's hard to guess what you actually tested.

Sorry. It was my mistake for the variable undefined (I've now corrected it in my first post).

I am not using a 64 bit integer for size but I am using it for i.