
Computation time

ekeom
Dear All,

I wrote two nearly identical Fortran routines, test_1 and test_2. Only line 29 is different:

test_1 line 29 is : a(i1) = sum(v(1:n)*b((j-1)*n +1:j*n))

test_2 line 29 is : c = sum(v(1:n)*b((j-1)*n +1:j*n))

But the computation times are very different. Here are the CPU times in seconds for different values of n.


n       test_1 time (s)   test_2 time (s)
500     0.905             0.016
1000    7.207             0.016
1500    24.523            0.047
2000    58.641            0.109

I need to keep the array a. How can I change test_1 to make it as fast as test_2?

Best regards,

Didace

P.S.: see the source code below.

---------------------------------------------------------------------

subroutine test_1 (a,b,n)

use const_m

implicit none

integer, intent(in) :: n

integer :: i, j, p

complex*16 :: c
complex*16, dimension(n), intent(inout) :: a
complex*16, dimension(n), intent(in ) :: b

complex*16, dimension(:), allocatable :: v

allocate(v(n))

do i=1,n

v(:) = a(i:(n-i1)*n +i:n)

p = i -n

do j=1,n

p = p +n

a(i1) = sum(v(1:n)*b((j-1)*n +1:j*n))

enddo

enddo

deallocate(v)

end subroutine test_1

---------------------------------------------------------------------

subroutine test_2 (a,b,n)

use const_m

implicit none

integer, intent(in) :: n

integer :: i, j, p

complex*16 :: c
complex*16, dimension(n), intent(inout) :: a
complex*16, dimension(n), intent(in ) :: b

complex*16, dimension(:), allocatable :: v

allocate(v(n))

do i=1,n

v(:) = a(i:(n-i1)*n +i:n)

p = i -n

do j=1,n

p = p +n

c = sum(v(1:n)*b((j-1)*n +1:j*n))

enddo

enddo

deallocate(v)

end subroutine test_2

---------------------------------------------------------------------

jimdempseyatthecove

The "problem" you are seeing might not be a problem at all.

When you perform "c = sum..." in the loop, the compiler's optimizer will note that c is not referenced anywhere in the loop, so all iterations of the loop except the last one may be eliminated.

Try inserting the following line after "c = sum...":

if (c .eq. b(1)) write(*,*) 'eq' ! use for timing test only

where you expect c never to equal b(1) (b is an array, so the scalar c has to be compared with a single element). That is, insert an if statement that uses c but never succeeds. This prevents the optimizer from eliminating iterations of your loop.

Then run the timing test and compare the results.
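The same precaution can be sketched as a small standalone timing program. This is only an illustration under stated assumptions: the program name, the problem size n = 200, the fill values, and the accumulator acc are hypothetical and not taken from the routines above. Instead of a never-true if statement, it swaps in a related technique: every c is added to a checksum that is printed afterwards, so the optimizer has to keep the work.

```fortran
program timing_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 200            ! hypothetical problem size
  complex(dp), allocatable :: v(:), b(:)
  complex(dp) :: c, acc
  real(dp) :: t0, t1
  integer :: j

  allocate(v(n), b(n*n))
  v = (1.0_dp, 0.0_dp)                     ! arbitrary test data
  b = (0.5_dp, 0.25_dp)

  acc = (0.0_dp, 0.0_dp)
  call cpu_time(t0)
  do j = 1, n
     c = sum(v(1:n) * b((j-1)*n+1 : j*n))
     acc = acc + c                         ! every c feeds the final result
  end do
  call cpu_time(t1)

  ! Printing acc makes the loop's work observable, so it cannot be discarded.
  write(*,*) 'checksum     =', acc
  write(*,*) 'cpu time (s) =', t1 - t0

  deallocate(v, b)
end program timing_sketch
```

Timing a loop whose result is actually consumed gives a fairer baseline for comparing against a routine that stores into an array.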

Jim Dempsey

ekeom

Thanks, Jim Dempsey.

I have run the test. I got the same results as previously.

Didace


jimdempseyatthecove

Then try using the temporary:

c = sum(v(1:n)*b((j-1)*n +1:j*n))
a(i1) = c

Jim Dempsey

ekeom
Sorry, it's me again. I tried a new test and you are right: with the if(c.eq.b)... check in place, the timing of test_2 is equivalent to that of test_1.

Best regards,

Didace


jimdempseyatthecove

So the "problem" was that, because of optimization, the two routines did not provide a comparable example.

For a speed-up, try replacing the "= sum(..." with an equivalent loop.

The purpose is to see whether the compiler generates better vectorized code.
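A minimal sketch of what such a replacement could look like, assuming the same indexing as in the routines above. The program name, the small size n = 4, the test data, and the names c_sum and c_loop are hypothetical; the program only checks that the explicit loop reproduces the sum(...) expression for one value of j.

```fortran
program sum_vs_loop
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4              ! hypothetical, kept small for checking
  complex(dp) :: v(n), b(n*n), c_sum, c_loop
  integer :: j, k

  ! Small integer-valued test data so both forms give exactly the same result.
  do k = 1, n
     v(k) = cmplx(k, -k, kind=dp)
  end do
  do k = 1, n*n
     b(k) = cmplx(k, 1, kind=dp)
  end do

  j = 2

  ! Array-expression form, as in the original routines.
  c_sum = sum(v(1:n) * b((j-1)*n+1 : j*n))

  ! Equivalent explicit loop: no array temporary, unit-stride access to v and b.
  c_loop = (0.0_dp, 0.0_dp)
  do k = 1, n
     c_loop = c_loop + v(k) * b((j-1)*n + k)
  end do

  if (abs(c_sum - c_loop) > 1.0e-12_dp) error stop 'loop and sum disagree'
  write(*,*) 'c =', c_loop
end program sum_vs_loop
```

Whether the loop form is actually faster depends on the compiler and its options; the compiler's vectorization report is the place to check.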

The next improvement (when n is large) would be to use OpenMP on the loop.

That is:

first, ensure vectorization is used where possible;
then use parallelization where appropriate.
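The two steps can be sketched together as follows. This is an illustration under assumptions, not the routine from the thread: the program name, n = 500, the fill values, and the separate result array r are hypothetical. The outer j loop is distributed across threads with OpenMP, and the inner k loop is left as a simple unit-stride loop that the compiler may vectorize.

```fortran
program omp_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 500            ! hypothetical problem size
  complex(dp), allocatable :: v(:), b(:), r(:)
  complex(dp) :: c
  integer :: j, k

  allocate(v(n), b(n*n), r(n))
  v = (1.0_dp, 0.0_dp)                     ! arbitrary test data
  b = (0.5_dp, 0.25_dp)

  ! Each j iteration is independent: it reads v and b and writes only r(j).
  ! k and c must be private so that threads do not overwrite each other.
  !$omp parallel do private(k, c) shared(v, b, r)
  do j = 1, n
     c = (0.0_dp, 0.0_dp)
     do k = 1, n                           ! candidate for vectorization
        c = c + v(k) * b((j-1)*n + k)
     end do
     r(j) = c
  end do
  !$omp end parallel do

  write(*,*) 'r(1) =', r(1), '  r(n) =', r(n)

  deallocate(v, b, r)
end program omp_sketch
```

The directives only take effect when OpenMP is enabled at compile time (for example -qopenmp with the Intel compiler or -fopenmp with gfortran); otherwise they are treated as comments and the program runs serially with the same result.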

Jim Dempsey