Intel® Fortran Compiler

Computation time

ekeom
Novice
481 Views
Dear All,

I wrote two "identical" Fortran routines, test_1 and test_2. Only line 29 is different.

test_1 line 29 is: a(i1) = sum(v(1:n)*b((j-1)*n +1:j*n))

test_2 line 29 is: c = sum(v(1:n)*b((j-1)*n +1:j*n))

But the computation times are very different. Here are the CPU times in seconds for different values of n.

            test_1 time   test_2 time

n = 500         0.905         0.016
n = 1000        7.207         0.016
n = 1500       24.523         0.047
n = 2000       58.641         0.109

I need to keep the array a. How can I change test_1 to make it as fast as test_2?

Best regards,

Didace

PS: see the source code below.

----------------------------------------------------------

subroutine test_1 (a,b,n)

use const_m

implicit none

integer, intent(in) :: n

integer :: i, j, p

complex*16 :: c
complex*16, dimension(n), intent(inout) :: a
complex*16, dimension(n), intent(in ) :: b

complex*16, dimension(:), allocatable :: v

allocate(v(n))

do i=1,n

v(:) = a(i:(n-i1)*n +i:n)

p = i -n

do j=1,n

p = p +n

a(i1) = sum(v(1:n)*b((j-1)*n +1:j*n))

enddo

enddo

deallocate(v)

end subroutine test_1

----------------------------------------------------------

subroutine test_2 (a,b,n)

use const_m

implicit none

integer, intent(in) :: n

integer :: i, j, p

complex*16 :: c
complex*16, dimension(n), intent(inout) :: a
complex*16, dimension(n), intent(in ) :: b

complex*16, dimension(:), allocatable :: v

allocate(v(n))

do i=1,n

v(:) = a(i:(n-i1)*n +i:n)

p = i -n

do j=1,n

p = p +n

c = sum(v(1:n)*b((j-1)*n +1:j*n))

enddo

enddo

deallocate(v)

end subroutine test_2

-----------------------------------------------------------
4 Replies
eos_pengwern
Beginner

For the test, did you compile in debug or release mode? In release mode the compiler will vectorise the loop and optimise the memory-handling for the array a(i), but in debug mode this doesn't happen.

Stephen.
ekeom
Novice
Thank you, Stephen, for your answer. I used release mode.

Best regards,

Didace
IanH
Honored Contributor II
In release mode the compiler's optimiser should work out that test_2 effectively does nothing - the value assigned to 'c' isn't used, so it won't bother with that calculation. I suspect it would eliminate most of the code associated with that routine.
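
One way to check this is to make the result observable, so the optimiser has no licence to drop the calculation. The following is only a sketch, not the original test driver: the program name, the size n, the fill values and the checksum variable total are all assumptions made for illustration.

```fortran
program keep_result_live
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 200          ! hypothetical problem size
  complex(dp), allocatable :: v(:), b(:)
  complex(dp) :: c, total
  real(dp) :: t0, t1
  integer :: j

  allocate(v(n), b(n*n))
  v = (1.0_dp, 0.0_dp)                   ! arbitrary fill values
  b = (0.5_dp, 0.25_dp)

  total = (0.0_dp, 0.0_dp)
  call cpu_time(t0)
  do j = 1, n
     c = sum(v(1:n) * b((j-1)*n+1 : j*n))
     total = total + c                   ! c now feeds a value that is printed
  end do
  call cpu_time(t1)

  print *, 'checksum     =', total       ! observable use of the result
  print *, 'cpu time (s) =', t1 - t0
end program keep_result_live
```

If the timing of a loop like this moves much closer to that of test_1 once the result is printed, that would support the idea that test_2 was fast only because its work was discarded.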

By the way, your subscript triplets ("i:(n-i1)*n +i:n" etc) look problematic. When you run your tests under debug mode with "Check array and string bounds" on (/check:bounds on the command line), what happens?
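
For reference, a subscript triplet has the form lower:upper:stride. The sketch below assumes that the intent was to pick row i of an n-by-n matrix stored column by column in a one-dimensional array of n*n elements, and that "i1" was meant to be 1; neither assumption is confirmed by the posted code, which declares a with only n elements.

```fortran
program triplet_demo
  implicit none
  integer, parameter :: n = 3
  integer :: a(n*n), row(n)   ! note: n*n elements, unlike dimension(n) in the post
  integer :: i, k

  a = [(k, k = 1, n*n)]       ! a = 1, 2, ..., 9
  i = 2
  ! lower : upper : stride selects elements i, i+n, ..., i+(n-1)*n
  row = a(i : (n-1)*n + i : n)
  print *, row                ! elements 2, 5 and 8 of a
end program triplet_demo
```

With an array declared as dimension(n), the same section would reach well past the last element, which is the kind of error that bounds checking reports.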
ArturGuzik
Valued Contributor I

Ian is right.

(Note: I assume i1 does not change.)

What you tested is a case of loop-invariant code motion: the store instructions are removed in test_2. Below is a copy of an example with an explanation of why test_1 is slow.

A pointer variable is used inside the loop. The target value changes but the value of the pointer itself does not change inside the loop. Using a loop invariant pointer results in the execution of redundant memory load and store operations.
Note: In Fortran, dummy arguments are typically passed by reference, so they are accessed through pointers.



      subroutine xmpl03(a,n,b)
      integer n
      integer a(n),b
      integer lim

      lim = n

      do 10 i=1,lim

      a(1)=a(1)+b

! An array variable a(1) whose index does not change is used for
! computation inside the loop. Both a and b are dummy arguments which
! are referenced indirectly using pointers. Redundant stores are
! executed for the loop-invariant array variable.

 10   continue
      end
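
A common remedy for this pattern is to accumulate into a local scalar and write the array element once after the loop. The sketch below is an illustration only; the names xmpl03_local and tmp and the small driver program are hypothetical, and the thread does not establish whether a given compiler already performs this transformation by itself.

```fortran
subroutine xmpl03_local(a, n, b)
  implicit none
  integer, intent(in)    :: n
  integer, intent(inout) :: a(n)
  integer, intent(in)    :: b
  integer :: i, tmp

  tmp = a(1)            ! one load before the loop
  do i = 1, n
     tmp = tmp + b      ! the running value can stay in a register
  end do
  a(1) = tmp            ! one store after the loop
end subroutine xmpl03_local

program demo
  implicit none
  integer :: a(4)
  a = [10, 0, 0, 0]
  call xmpl03_local(a, 4, 3)
  print *, a(1)         ! 10 + 4*3 = 22
end program demo
```

In test_1 the analogous change would be to compute into the local variable c and copy it to the array element after the inner loop, assuming, as noted above, that i1 does not change inside it.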


A.