Dear All,
I wrote two "identical" Fortran routines, test_1 and test_2. Only line 29 is different:
test_1 line 29 is: a(i1) = sum(v(1:n)*b((j-1)*n +1:j*n))
test_2 line 29 is: c = sum(v(1:n)*b((j-1)*n +1:j*n))
But the computation times are very different. Here are the CPU times in seconds for different values of n:
n       test_1 time (s)   test_2 time (s)
500     0.905             0.016
1000    7.207             0.016
1500    24.523            0.047
2000    58.641            0.109
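A rough check on these numbers, assuming each call of test_1 performs n² dot products of length n (about n³ complex multiply-adds in total). Writing t₁(n) for the test_1 time, the test_1 times grow almost exactly as n³, which suggests test_1 really does that work, while the test_2 times do not follow this pattern:

$$
\frac{t_1(1000)}{t_1(500)} = \frac{7.207}{0.905} \approx 7.96 \approx 2^3, \qquad
\frac{t_1(1500)}{t_1(500)} = \frac{24.523}{0.905} \approx 27.1 \approx 3^3, \qquad
\frac{t_1(2000)}{t_1(500)} = \frac{58.641}{0.905} \approx 64.8 \approx 4^3
$$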
I need to keep the array a. How can I change test_1 to make it as fast as test_2?
Best regards,
Didace
PS: see the source code below.
----------------------------------------------------------
subroutine test_1 (a,b,n)
  use const_m
  implicit none
  integer, intent(in) :: n
  integer :: i, j, p
  complex*16 :: c
  complex*16, dimension(n), intent(inout) :: a
  complex*16, dimension(n), intent(in ) :: b
  complex*16, dimension(:), allocatable :: v
  allocate(v(n))
  do i=1,n
    v(:) = a(i:(n-i1)*n +i:n)
    p = i -n
    do j=1,n
      p = p +n
      a(i1) = sum(v(1:n)*b((j-1)*n +1:j*n))
    enddo
  enddo
  deallocate(v)
end subroutine test_1
----------------------------------------------------------
subroutine test_2 (a,b,n)
  use const_m
  implicit none
  integer, intent(in) :: n
  integer :: i, j, p
  complex*16 :: c
  complex*16, dimension(n), intent(inout) :: a
  complex*16, dimension(n), intent(in ) :: b
  complex*16, dimension(:), allocatable :: v
  allocate(v(n))
  do i=1,n
    v(:) = a(i:(n-i1)*n +i:n)
    p = i -n
    do j=1,n
      p = p +n
      c = sum(v(1:n)*b((j-1)*n +1:j*n))
    enddo
  enddo
  deallocate(v)
end subroutine test_2
-----------------------------------------------------------
For the test, did you compile in debug or release mode? In release mode the compiler will vectorise the loop and optimise the memory-handling for the array a(i), but in debug mode this doesn't happen.
Stephen.
Quoting - eos pengwern
For the test, did you compile in debug or release mode? In release mode the compiler will vectorise the loop and optimise the memory-handling for the array a(i), but in debug mode this doesn't happen.
Stephen.
Thank you for your answer. I used release mode.
Best regards,
Didace
In release mode the compiler's optimiser should work out that test_2 effectively does nothing - the value assigned to 'c' isn't used, so it won't bother with that calculation. I suspect it would eliminate most of the code associated with that routine.
By the way, your subscript triplets ("i:(n-i1)*n +i:n" etc) look problematic. When you run your tests under debug mode with "Check array and string bounds" on (/check:bounds on the command line), what happens?
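Ian's point can be checked by making the inner sums observable. The sketch below is a hypothetical variant of test_2 (the name `test_2_used` and the extra output argument `total` are not from the thread) that adds every `c` into `total`, so an optimiser can no longer discard the computation. It assumes `a` and `b` each hold an n×n matrix stored column by column, which is why they are declared `n*n` long here, and it reads the posted slice `a(i:(n-i1)*n +i:n)` as `a(i:(n-1)*n+i:n)`.

```fortran
module bench_m
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
contains
  ! Hypothetical variant of test_2: every inner sum c is added to the
  ! output argument total, so the optimiser cannot drop it as dead code.
  ! Assumes a and b each hold an n x n matrix in column-major order.
  subroutine test_2_used(a, b, n, total)
    integer, intent(in) :: n
    complex(dp), intent(in)  :: a(n*n)
    complex(dp), intent(in)  :: b(n*n)
    complex(dp), intent(out) :: total
    complex(dp) :: v(n), c
    integer :: i, j
    total = (0.0_dp, 0.0_dp)
    do i = 1, n
      v = a(i:(n-1)*n + i:n)              ! row i: stride n through a
      do j = 1, n
        c = sum(v * b((j-1)*n + 1:j*n))   ! row i of a times column j of b
        total = total + c                 ! c is now used, so it must be computed
      end do
    end do
  end subroutine test_2_used
end module bench_m
```

Timing this variant against test_1 compares like with like, since both then perform roughly n³ complex multiply-adds.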
Quoting - IanH
In release mode the compiler's optimiser should work out that test_2 effectively does nothing - the value assigned to 'c' isn't used, so it won't bother with that calculation. I suspect it would eliminate most of the code associated with that routine.
By the way, your subscript triplets ("i:(n-i1)*n +i:n" etc) look problematic. When you run your tests under debug mode with "Check array and string bounds" on (/check:bounds on the command line), what happens?
Ian is right.
(Note: I assume i1 is not changing)
What you tested and showed is called loop-invariant code motion: the store instructions are removed in test_2. I attach (a copy of) an example with an explanation of why it is slow.
A pointer variable is used inside the loop. The target value changes, but the value of the pointer itself does not change inside the loop. Using a loop-invariant pointer results in redundant memory load and store operations being executed.
Note: In Fortran, pointers are used to reference dummy arguments.
subroutine xmpl03(a,n,b)
  integer n
  integer a(n),b
  integer lim
  lim = n
  do 10 i=1,lim
    a(1)=a(1)+b
    ! An array variable a(1) whose index does not change is used for
    ! computation inside the loop. Both a and b are dummy arguments,
    ! which are referenced indirectly through pointers. Redundant stores
    ! are executed for the loop-invariant array variable.
10 continue
end
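The remedy usually applied to the pattern in xmpl03, often called scalar replacement, is to keep the loop-invariant element in a local variable, which the compiler can hold in a register, and write it back once after the loop. A minimal sketch; the name `xmpl03_scalar` is hypothetical, and like the original it assumes n >= 1:

```fortran
! Hypothetical rewrite of xmpl03 (assumes n >= 1): a(1) is loaded once
! into a local scalar, updated in the loop, and stored back once.
subroutine xmpl03_scalar(a, n, b)
  implicit none
  integer, intent(in)    :: n, b
  integer, intent(inout) :: a(n)
  integer :: i, acc
  acc = a(1)          ! single load before the loop
  do i = 1, n
    acc = acc + b     ! no load or store of a(1) inside the loop
  end do
  a(1) = acc          ! single store after the loop
end subroutine xmpl03_scalar
```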
A.
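If, as seems likely but is not confirmed in the thread, `a(i1)` in test_1 was meant to be `a(p)` (p is computed in the inner loop but otherwise never used), then test_1 overwrites `a` with the matrix product A·B, where `a` and `b` hold n×n matrices in column-major order. Under that assumption, here is a sketch of the same operation using the `matmul` intrinsic. The routine names are hypothetical: `test_1_loop` reproduces the assumed intent, and `test_1_matmul` is the replacement.

```fortran
module matprod_m
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
contains
  ! Assumed intent of test_1: a := a * b, reading a(i1) as a(p).
  subroutine test_1_loop(a, b, n)
    integer, intent(in) :: n
    complex(dp), intent(inout) :: a(n*n)   ! n x n matrix, column-major
    complex(dp), intent(in)    :: b(n*n)   ! n x n matrix, column-major
    complex(dp) :: v(n)
    integer :: i, j, p
    do i = 1, n
      v = a(i:(n-1)*n + i:n)      ! copy row i before it is overwritten
      p = i - n
      do j = 1, n
        p = p + n                 ! p = (j-1)*n + i, i.e. element (i,j)
        a(p) = sum(v * b((j-1)*n + 1:j*n))
      end do
    end do
  end subroutine test_1_loop

  ! Same result via the matmul intrinsic; reshape keeps the 1-D interface.
  subroutine test_1_matmul(a, b, n)
    integer, intent(in) :: n
    complex(dp), intent(inout) :: a(n*n)
    complex(dp), intent(in)    :: b(n*n)
    a = reshape(matmul(reshape(a, [n, n]), reshape(b, [n, n])), [n*n])
  end subroutine test_1_matmul
end module matprod_m
```

Neither version can approach the test_2 timings, because test_2 does not perform the work at all. The `matmul` call only lets the compiler or its runtime library choose an optimised kernel for the same n³ operations, which may or may not be faster than the hand-written dot products, depending on the compiler and options.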