Intel® Fortran Compiler

Computation time

ekeom
Beginner
792 Views
Dear All,

I wrote two "identical" Fortran routines: test_1 and test_2. Only line 29 is different.

test_1 line 29 is : a(i1) = sum(v(1:n)*b((j-1)*n +1:j*n))

test_2 line 29 is : c = sum(v(1:n)*b((j-1)*n +1:j*n))

But the computation times are very different. Here are the CPU times in seconds for different values of n.


n       test_1 time (s)   test_2 time (s)
500     0.905             0.016
1000    7.207             0.016
1500    24.523            0.047
2000    58.641            0.109

I need to keep the array a. How can I change test_1 to make it as fast as test_2?

Best regards,

Didace

PS: see the source code below

---------------------------------------------------------------------

subroutine test_1 (a,b,n)

use const_m

implicit none

integer, intent(in) :: n

integer :: i, j, p

complex*16 :: c
complex*16, dimension(n), intent(inout) :: a
complex*16, dimension(n), intent(in ) :: b

complex*16, dimension(:), allocatable :: v

allocate(v(n))

do i=1,n

v(:) = a(i:(n-i1)*n +i:n)

p = i -n

do j=1,n

p = p +n

a(i1) = sum(v(1:n)*b((j-1)*n +1:j*n))

enddo

enddo

deallocate(v)

end subroutine test_1

---------------------------------------------------------------------

subroutine test_2 (a,b,n)

use const_m

implicit none

integer, intent(in) :: n

integer :: i, j, p

complex*16 :: c
complex*16, dimension(n), intent(inout) :: a
complex*16, dimension(n), intent(in ) :: b

complex*16, dimension(:), allocatable :: v

allocate(v(n))

do i=1,n

v(:) = a(i:(n-i1)*n +i:n)

p = i -n

do j=1,n

p = p +n

c = sum(v(1:n)*b((j-1)*n +1:j*n))

enddo

enddo

deallocate(v)

end subroutine test_2

---------------------------------------------------------------------
5 Replies
jimdempseyatthecove
Honored Contributor III

The "problem" you are seeing might not be a problem at all.

When you perform the "c = sum..." in the loop, the compiler's optimizer will note that c is not referenced anywhere else in the loop, so all iterations of the loop except the last may be eliminated.

Try inserting the following after "c = sum...":

if(c .eq. b) write(*,*) 'eq' ! use for timing test only

where you expect c never to equal b. That is, insert an if statement that uses c but never succeeds. This forces the optimizer to keep every iteration of your loop.

Then run the timing test and compare the results.
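A minimal sketch of the same idea, written as a standalone program so it can be timed on its own. It is a variation on the never-true if test, not the routine from the question: each c is folded into a checksum that is printed afterwards, so the optimizer cannot treat the sum as unused. The program name, the sizes, the fill values, and the names checksum, t0 and t1 are all hypothetical. (If the if test is used instead, note that the condition of an if statement must be scalar, so something like c .eq. b(1) would be needed when b is an array.)

```fortran
program keep_result_live
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 500            ! hypothetical problem size
  complex(dp), allocatable :: v(:), b(:)
  complex(dp) :: c, checksum
  real(dp) :: t0, t1
  integer :: i, j

  allocate(v(n), b(n*n))
  v = (1.0_dp, 0.0_dp)                     ! arbitrary fill values
  b = (0.5_dp, 0.25_dp)
  checksum = (0.0_dp, 0.0_dp)

  call cpu_time(t0)
  do i = 1, n
    do j = 1, n
      c = sum(v(1:n) * b((j-1)*n+1 : j*n))
      checksum = checksum + c              ! c now feeds a value that is printed below
    end do
  end do
  call cpu_time(t1)

  print *, 'CPU time (s):', t1 - t0
  print *, 'checksum    :', checksum       ! printing keeps the whole loop observable
end program keep_result_live
```

Because the checksum depends on every iteration, the timing should reflect the full n*n sums rather than only the last one.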

Jim Dempsey
ekeom
Beginner

Thanks, Jim Dempsey,

I have run the test. I got the same results as previously.

Didace

jimdempseyatthecove
Honored Contributor III

Then try using the temporary:

[cpp]c = sum(v(1:n)*b((j-1)*n +1:j*n))
a(i1) = c
[/cpp]

Jim Dempsey
ekeom
Beginner
Sorry, it's me again. I tried a new test, and you are right: the test_2 timing is equivalent to test_1 with if(c.eq.b)...

Best regards,

Didace

jimdempseyatthecove
Honored Contributor III

So the "problem" was that optimization kept the two examples from being comparable.

For a speed-up, try replacing the = sum(... with an equivalent loop.

The purpose is to see whether the compiler generates better vectorized code for the explicit loop.
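As a rough illustration of that replacement, the sketch below computes one of the sums both ways and checks that they agree. It is a standalone toy, not the routine from the question: the program name, the small size n = 4, the fill values, and the names c_sum, c_loop and off are all hypothetical.

```fortran
program sum_as_loop
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4              ! small hypothetical size
  complex(dp) :: v(n), b(n*n), c_sum, c_loop
  integer :: j, k, off

  ! Arbitrary integer-valued test data so both forms give exactly the same result
  do k = 1, n
    v(k) = cmplx(k, -k, kind=dp)
  end do
  do k = 1, n*n
    b(k) = cmplx(k, 1, kind=dp)
  end do

  j = 2

  ! Form used in the question: array expression reduced by the sum intrinsic
  c_sum = sum(v(1:n) * b((j-1)*n+1 : j*n))

  ! Equivalent explicit loop: one accumulator, unit-stride access to v and b
  off = (j-1)*n
  c_loop = (0.0_dp, 0.0_dp)
  do k = 1, n
    c_loop = c_loop + v(k) * b(off + k)
  end do

  if (abs(c_sum - c_loop) > 1.0e-12_dp * abs(c_sum)) error stop 'mismatch'
  print *, 'sum intrinsic :', c_sum
  print *, 'explicit loop :', c_loop
end program sum_as_loop
```

Whether the explicit loop is actually faster depends on the compiler and options; the compiler's optimization or vectorization report is the place to check what was generated for each form.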

The next improvement (when n is large) would be to use OpenMP on the loop.

That is:

ensure vectorization is used when possible,
then use parallelization when appropriate.
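A minimal sketch of that ordering, assuming each result can be stored in its own element of a separate output array (here a hypothetical r), so that the iterations are independent. The inner k loop is left as a plain unit-stride loop for the compiler to vectorize, and the outer j loop is shared among threads with an OpenMP directive. The program name, the size, the fill values, and the names r, acc and off are illustrative and not taken from the original routines.

```fortran
program omp_over_j
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 300            ! hypothetical problem size
  complex(dp), allocatable :: v(:), b(:), r(:)
  complex(dp) :: acc
  integer :: j, k, off

  allocate(v(n), b(n*n), r(n))
  ! Arbitrary integer-valued test data
  do k = 1, n
    v(k) = cmplx(1, k, kind=dp)
  end do
  do k = 1, n*n
    b(k) = cmplx(mod(k, 7), mod(k, 5), kind=dp)
  end do

  ! Outer loop: iterations are independent, so they can run on different threads.
  ! Each thread needs its own k, off and acc.
  !$omp parallel do private(k, off, acc)
  do j = 1, n
    off = (j-1)*n
    acc = (0.0_dp, 0.0_dp)
    ! Inner loop: simple unit-stride reduction, a candidate for vectorization
    do k = 1, n
      acc = acc + v(k) * b(off + k)
    end do
    r(j) = acc
  end do
  !$omp end parallel do

  ! Serial check against the sum intrinsic
  do j = 1, n
    if (abs(r(j) - sum(v * b((j-1)*n+1 : j*n))) > &
        1.0e-9_dp * max(1.0_dp, abs(r(j)))) error stop 'mismatch'
  end do
  print *, 'r(1) =', r(1)
end program omp_over_j
```

Without an OpenMP option the directives are treated as comments and the program runs serially; OpenMP is enabled with -qopenmp for the Intel compiler or -fopenmp for gfortran. In the original test_1 the array a is both read and written inside the loops, so the dependencies would have to be checked before applying the same directive there.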

Jim Dempsey