Hi everyone,
I am working on optimizing sparse algorithms with the Intel Fortran compiler. Having applied the other optimization features, I now want to make good use of data prefetching and the cache. To that end I tested several plausible combinations of prefetch directives and intrinsic functions on both an Intel Core i7 and an AMD APU processor, but I do not get the results I expect. In one specific case, however, I think I do get real prefetching, which gives me a 3-4 times speedup.
The following is the faster code:
[fortran]
DOUBLE PRECISION, DIMENSION(:), ALLOCATABLE :: A2D, X, TEMP
DOUBLE PRECISION :: SUM
INTEGER :: SIZE, I, J, COUNT, BLS, I0
SIZE = 1000000
BLS = 21 * 25
ALLOCATE(A2D(0:BLS * SIZE - 1))
ALLOCATE(X(0:SIZE - 1))
ALLOCATE(TEMP(0:BLS - 1))
DO COUNT = 0, 50
!$OMP PARALLEL SHARED(A2D, X, SIZE, BLS)
!$OMP DO SCHEDULE(STATIC) PRIVATE(J, I, SUM, TEMP, I0)
!DEC$ SIMD
  DO J = 0, SIZE - 1
    I0 = BLS * J
    DO I = 0, BLS - 1
      TEMP(I) = A2D(I0 + I)
    END DO
    SUM = 0.D0
    DO I = 0, BLS - 1
      SUM = SUM + TEMP(I) * 2.D0
    END DO
    X(J) = SUM
  END DO
!$OMP END DO
!$OMP END PARALLEL
END DO
[/fortran]
And the following is the code I would expect to be the right approach, but it is around 4 times slower (I think because the prefetch directive has no effect):
[fortran]
DOUBLE PRECISION, DIMENSION(:), ALLOCATABLE :: A2D, X
DOUBLE PRECISION :: SUM
INTEGER :: SIZE, I, J, COUNT, BLS, I0
SIZE = 1000000
BLS = 21 * 25
ALLOCATE(A2D(0:BLS * SIZE - 1))
ALLOCATE(X(0:SIZE - 1))
DO COUNT = 0, 50
!$OMP PARALLEL SHARED(A2D, X, SIZE, BLS)
!$OMP DO SCHEDULE(STATIC) PRIVATE(J, I, SUM, TEMP, I0)
!DEC$ PREFETCH A2D
  DO J = 0, SIZE - 1
    I0 = BLS * J
    SUM = 0.D0
!DEC$ SIMD
    DO I = 0, BLS - 1
      SUM = SUM + A2D(I0 + I) * 2.D0
    END DO
    X(J) = SUM
  END DO
!$OMP END DO
!$OMP END PARALLEL
END DO
[/fortran]
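For scale, here is a rough estimate of the data volume both versions move. It assumes DOUBLE PRECISION occupies 8 bytes and a cache line is 64 bytes; neither is stated above, but both are typical for these processors.

```latex
\begin{align*}
\text{BLS} &= 21 \times 25 = 525 \\
\text{bytes per block (one } J \text{ iteration)} &= 525 \times 8\ \text{B} = 4200\ \text{B} \approx 65.6 \text{ cache lines} \\
\text{size of A2D} &= 525 \times 10^{6} \times 8\ \text{B} = 4.2 \times 10^{9}\ \text{B} \approx 4.2\ \text{GB} \\
\text{data read over the 51 passes of COUNT} &= 51 \times 4.2\ \text{GB} \approx 214\ \text{GB}
\end{align*}
```

Under these assumptions each pass streams far more data than any cache level can hold, and every element of A2D is read exactly once per pass.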
I am really confused and need your help.