Execution time differs between two similar statements

Chaoran_Chen · ‎06-16-2013

Here is a small problem: I call this subroutine repeatedly. If I change the last line "interp_value(i)=t" to "interp_value(i)=1.d0" (with the other lines unchanged, including the line "t=(f_x...etc"), the program becomes much faster. The version with "interp_value(i)=1.d0" takes only about one fourth of the time of the version with "interp_value(i)=t".

Since both lines simply assign a real number to interp_value(i), why does the speed differ so much?

Thank you very much!

subroutine myinterp1(x,f_x,xp,N,interp_value)

  implicit none
  integer:: N,i,x_index
  real(kind=8):: x(N), f_x(N), xp(N),interp_value(N),t
  do i=1,N
    x_index = minloc(abs(x-xp(i)),1)
    t = (f_x(x_index+1)-f_x(x_index))/(x(x_index+1)-x(x_index))
    interp_value(i)=t
  end do
end subroutine myinterp1

Chaoran_Chen · ‎06-16-2013

Sorry, I forgot to mention: the elapsed time I recorded covers only this subroutine, not the rest of the program.

Casey · ‎06-16-2013

When you change from assigning t to 1.d0, a few things happen.

1) You are assigning a constant value, so the compiler can hardcode this store.

2) t is never used, so the compiler can optimize its computation away completely and not evaluate anything in that line. This may also save you the overhead of cache misses from referencing the arrays used in calculating t (not a given, but a potential source of performance loss).

3) The compiler can optimize out the do loop and just store 1.d0 to every array element, which can be very efficient.

Essentially, by replacing t with 1.d0 you reduce the entire do loop to just "interp_value(1:N) = 1.d0", and it should be clear why that is faster.
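As a rough illustration of that point (a sketch, not actual compiler output), the constant-store variant behaves like the subroutine below once the unused minloc search and division are discarded. The name myinterp1_const and the small driver program are hypothetical, added only so the example compiles on its own. Whether a given compiler performs exactly this reduction depends on the optimization level.

```fortran
! Sketch: what the loop effectively reduces to when interp_value(i) = 1.d0.
! myinterp1_const is a hypothetical name used only for illustration.
subroutine myinterp1_const(x, f_x, xp, N, interp_value)
  implicit none
  integer :: N
  real(kind=8) :: x(N), f_x(N), xp(N), interp_value(N)
  ! x, f_x and xp are no longer read: the minloc search and the
  ! division only fed t, and t is not stored anywhere.
  interp_value(1:N) = 1.d0
end subroutine myinterp1_const

! Hypothetical driver, present only to make the sketch runnable.
program demo_const
  implicit none
  integer, parameter :: N = 5
  real(kind=8) :: x(N), f_x(N), xp(N), v(N)
  x = 0.d0; f_x = 0.d0; xp = 0.d0
  call myinterp1_const(x, f_x, xp, N, v)
  print *, v
end program demo_const
```

Note that the original loop calls minloc over all N elements on each of its N iterations, so it does on the order of N*N work, while the reduced form is a single pass of N stores.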

jimdempseyatthecove · ‎06-17-2013

Casey is right. In a Release build, the compiler's optimization capability is quite remarkable. In your code it will:

a) recognize x_index is a local variable
b) recognize t is a local variable
c) recognize x_index is used only in the statement that generates t
d) (when you replace =t with =1.d0) recognize t is not used
e) due to d) (t not used), eliminate the statement generating t
f) due to e) (x_index not used), eliminate the statement generating x_index
g) due to f), observe that the loop just stores a constant into the array

In a Debug build, the compiler will generate the code you ask for even though the results are not needed.
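One way to see this effect for yourself is a small timing harness that actually uses the results, so an optimizing compiler has a reason to keep the computation. The following is a minimal sketch under stated assumptions: the problem size, the test data (an increasing grid with f(x) = x**2), and the program name time_interp are all made up for illustration, and myinterp1 is copied from the original post.

```fortran
! Timing sketch: keep the result "live" so the work is not discarded.
program time_interp
  implicit none
  integer, parameter :: N = 2000          ! assumed problem size
  real(kind=8) :: x(N), f_x(N), xp(N), v(N)
  integer(kind=8) :: c0, c1, rate
  integer :: i

  ! Assumed test data: an increasing grid with f(x) = x**2.
  do i = 1, N
     x(i) = real(i, kind=8)
     f_x(i) = x(i)**2
  end do
  ! Each query point sits just below a grid point other than the last,
  ! so minloc never returns N and x_index+1 stays in bounds.
  xp(1:N-1) = x(1:N-1) - 0.25d0
  xp(N) = xp(N-1)

  call system_clock(c0, rate)
  call myinterp1(x, f_x, xp, N, v)
  call system_clock(c1)

  print *, 'elapsed (s):', real(c1 - c0, kind=8) / real(rate, kind=8)
  ! Printing a value derived from v means the stored results are needed.
  print *, 'checksum:', sum(v)
end program time_interp

! Subroutine from the original post, unchanged in behavior.
subroutine myinterp1(x, f_x, xp, N, interp_value)
  implicit none
  integer :: N, i, x_index
  real(kind=8) :: x(N), f_x(N), xp(N), interp_value(N), t
  do i = 1, N
     x_index = minloc(abs(x - xp(i)), 1)
     t = (f_x(x_index+1) - f_x(x_index)) / (x(x_index+1) - x(x_index))
     interp_value(i) = t
  end do
end subroutine myinterp1
```

Building this once with optimization disabled and once with it enabled, then swapping "interp_value(i) = t" for "interp_value(i) = 1.d0", should make the difference described above visible. Exact timings will vary by compiler and machine.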

Jim Dempsey

Chaoran_Chen · ‎06-17-2013

Thanks for the helpful comments.