Software Tuning, Performance Optimization & Platform Monitoring

Data Prefetching using Fortran Directives

Sina_M_

Hi everyone,

I am working on optimizing sparse algorithms with Intel's Fortran compiler. After applying various other optimizations, I now want to make better use of data prefetching and the cache. To do that, I tested several plausible combinations of prefetching directives and intrinsic functions on both Intel Core i7 and AMD APU processors, but I did not get the results I expected. In one specific case, however, I think I am getting real prefetching, which gives me a 3-4x speedup.
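For reference: as far as I can tell from Intel's compiler documentation, the `PREFETCH` directive takes the form `!DIR$ PREFETCH var[:hint[:distance]]` and applies to the DO loop that immediately follows it. The sketch below is a hypothetical module (`prefetch_sketch` and `row_sums` are illustrative names) that attaches the directive to the inner loop, i.e. the loop that actually reads `a2d`. The hint `1` and distance `64` are arbitrary illustrative values, not tuned ones, and whether this placement helps on a given processor is an assumption to verify by measurement. Compilers other than Intel's treat the `!DIR$` line as an ordinary comment.

```fortran
! Hypothetical module: a blocked row-sum kernel with the PREFETCH directive
! placed directly before the inner loop that reads a2d.
MODULE prefetch_sketch
  IMPLICIT NONE
CONTAINS
  SUBROUTINE row_sums(a2d, x, nrows, bls)
    INTEGER, INTENT(IN) :: nrows, bls
    DOUBLE PRECISION, INTENT(IN) :: a2d(0:bls*nrows - 1)
    DOUBLE PRECISION, INTENT(OUT) :: x(0:nrows - 1)
    DOUBLE PRECISION :: s
    INTEGER :: i, j, i0

    !$OMP PARALLEL DO SCHEDULE(STATIC) PRIVATE(i, i0, s)
    DO j = 0, nrows - 1
      i0 = bls * j
      s = 0.D0
      ! Intel-specific hint: prefetch a2d with hint 1, 64 iterations ahead
      ! (illustrative values). Other compilers ignore this line.
      !DIR$ PREFETCH a2d:1:64
      DO i = 0, bls - 1
        s = s + a2d(i0 + i) * 2.D0
      END DO
      x(j) = s
    END DO
    !$OMP END PARALLEL DO
  END SUBROUTINE row_sums
END MODULE prefetch_sketch
```

To use it, allocate `a2d` with `bls * nrows` elements, give it values, and call `row_sums(a2d, x, nrows, bls)`; the directives only take effect when compiling with the Intel compiler and OpenMP enabled.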

Here is the faster code:

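[fortran]
PROGRAM COPY_THEN_SUM
    IMPLICIT NONE
    DOUBLE PRECISION, DIMENSION(:), ALLOCATABLE :: A2D, X, TEMP
    DOUBLE PRECISION :: SUM
    INTEGER :: SIZE, I, J, COUNT, BLS, I0

    SIZE = 1000000
    BLS = 21 * 25                      ! 525 elements per block
    ALLOCATE(A2D(0:BLS * SIZE - 1))    ! 525,000,000 doubles, about 4.2 GB
    ALLOCATE(X(0:SIZE - 1))
    ALLOCATE(TEMP(0:BLS - 1))
    A2D = 1.D0                         ! give the input defined values before it is read

    DO COUNT = 0, 50
        !$OMP PARALLEL SHARED(A2D, X, SIZE, BLS)
        !$OMP DO SCHEDULE(STATIC) PRIVATE(J, I, SUM, TEMP, I0)
        !DEC$ SIMD
        DO J = 0, SIZE - 1             ! the SIMD directive above precedes this outer J loop
            I0 = BLS * J
            ! Copy one block into the thread-private buffer...
            DO I = 0, BLS - 1
                TEMP(I) = A2D(I0 + I)
            END DO
            ! ...then sum the buffered copy
            SUM = 0.D0
            DO I = 0, BLS - 1
                SUM = SUM + TEMP(I) * 2.D0
            END DO
            X(J) = SUM
        END DO
        !$OMP END DO
        !$OMP END PARALLEL
    END DO
END PROGRAM COPY_THEN_SUM
[/fortran]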

 

The following version is the one I expected to be correct, but it runs about 4 times slower (I suspect because the prefetch directive has no effect):

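[fortran]
PROGRAM DIRECT_SUM
    IMPLICIT NONE
    DOUBLE PRECISION, DIMENSION(:), ALLOCATABLE :: A2D, X
    DOUBLE PRECISION :: SUM
    INTEGER :: SIZE, I, J, COUNT, BLS, I0

    SIZE = 1000000
    BLS = 21 * 25                      ! 525 elements per block
    ALLOCATE(A2D(0:BLS * SIZE - 1))
    ALLOCATE(X(0:SIZE - 1))
    A2D = 1.D0                         ! give the input defined values before it is read

    DO COUNT = 0, 50
        !$OMP PARALLEL SHARED(A2D, X, SIZE, BLS)
        ! TEMP is not declared in this version, so it is not listed as PRIVATE
        !$OMP DO SCHEDULE(STATIC) PRIVATE(J, I, SUM, I0)
        ! The PREFETCH directive sits directly before the outer J loop
        !DEC$ PREFETCH A2D
        DO J = 0, SIZE - 1
            I0 = BLS * J
            SUM = 0.D0
            ! SUM accumulates across I, so forced SIMD needs a REDUCTION clause
            !DEC$ SIMD REDUCTION(+:SUM)
            DO I = 0, BLS - 1
                SUM = SUM + A2D(I0 + I) * 2.D0
            END DO
            X(J) = SUM
        END DO
        !$OMP END DO
        !$OMP END PARALLEL
    END DO
END PROGRAM DIRECT_SUM
[/fortran]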
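To compare the two versions on equal terms, it may help to time each kernel separately on the same initialized input. The following is a minimal sketch under that assumption: `kernel_compare`, `copy_then_sum`, `direct_sum` and `time_variant` are hypothetical names, timing uses the standard `SYSTEM_CLOCK` intrinsic, and the `!DIR$` lines are Intel-specific hints that other compilers ignore. The outer-loop `!DEC$ SIMD` from the first version is deliberately left out of `copy_then_sum` here.

```fortran
! Hypothetical module: both variants from this post as separate subroutines,
! plus a wall-clock timer, so they can be compared on identical input.
MODULE kernel_compare
  USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY: INT64
  IMPLICIT NONE
CONTAINS
  ! Variant 1: copy each block into a private buffer, then sum the buffer.
  SUBROUTINE copy_then_sum(a2d, x, nrows, bls)
    INTEGER, INTENT(IN) :: nrows, bls
    DOUBLE PRECISION, INTENT(IN) :: a2d(0:bls*nrows - 1)
    DOUBLE PRECISION, INTENT(OUT) :: x(0:nrows - 1)
    DOUBLE PRECISION :: temp(0:bls - 1), s
    INTEGER :: i, j, i0
    !$OMP PARALLEL DO SCHEDULE(STATIC) PRIVATE(i, i0, s, temp)
    DO j = 0, nrows - 1
      i0 = bls * j
      temp = a2d(i0:i0 + bls - 1)
      s = 0.D0
      DO i = 0, bls - 1
        s = s + temp(i) * 2.D0
      END DO
      x(j) = s
    END DO
    !$OMP END PARALLEL DO
  END SUBROUTINE copy_then_sum

  ! Variant 2: sum each block directly from a2d, with the prefetch
  ! directive before the outer loop as in the post.
  SUBROUTINE direct_sum(a2d, x, nrows, bls)
    INTEGER, INTENT(IN) :: nrows, bls
    DOUBLE PRECISION, INTENT(IN) :: a2d(0:bls*nrows - 1)
    DOUBLE PRECISION, INTENT(OUT) :: x(0:nrows - 1)
    DOUBLE PRECISION :: s
    INTEGER :: i, j, i0
    !$OMP PARALLEL DO SCHEDULE(STATIC) PRIVATE(i, i0, s)
    !DIR$ PREFETCH a2d
    DO j = 0, nrows - 1
      i0 = bls * j
      s = 0.D0
      !DIR$ SIMD REDUCTION(+:s)
      DO i = 0, bls - 1
        s = s + a2d(i0 + i) * 2.D0
      END DO
      x(j) = s
    END DO
    !$OMP END PARALLEL DO
  END SUBROUTINE direct_sum

  ! Runs one variant (1 = copy_then_sum, anything else = direct_sum) and
  ! returns the elapsed wall-clock time in seconds.
  SUBROUTINE time_variant(variant, a2d, x, nrows, bls, seconds)
    INTEGER, INTENT(IN) :: variant, nrows, bls
    DOUBLE PRECISION, INTENT(IN) :: a2d(0:bls*nrows - 1)
    DOUBLE PRECISION, INTENT(OUT) :: x(0:nrows - 1)
    DOUBLE PRECISION, INTENT(OUT) :: seconds
    INTEGER(INT64) :: t0, t1, rate
    CALL SYSTEM_CLOCK(t0, rate)
    IF (variant == 1) THEN
      CALL copy_then_sum(a2d, x, nrows, bls)
    ELSE
      CALL direct_sum(a2d, x, nrows, bls)
    END IF
    CALL SYSTEM_CLOCK(t1)
    seconds = DBLE(t1 - t0) / DBLE(rate)
  END SUBROUTINE time_variant
END MODULE kernel_compare
```

A driver would allocate and initialize `A2D` once, call each variant once as a warm-up, and then compare the `seconds` values over several repetitions. Checking that both variants produce the same `X` also confirms that the SIMD reduction is handled correctly.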

I am really confused and need your help.

 
