Data Prefetching using Fortran Directives

Sina_M_

Hi everyone,

I am optimizing sparse algorithms with Intel's Fortran compiler. Having applied the other optimization features, I now want to make good use of data prefetching and the cache. To that end I tested several configurations of prefetch directives and intrinsic functions on both an Intel Core i7 and an AMD APU, but I don't get the results I expected. In one specific case, however, I think I do get real prefetching, which gives me a 3-4x speedup.

Following is the faster code:

[fortran]

    DOUBLE PRECISION, DIMENSION(:), ALLOCATABLE :: A2D, X, TEMP
    DOUBLE PRECISION :: SUM
    INTEGER :: SIZE, I, J, COUNT, BLS, I0

    SIZE = 1000000
    BLS = 21 * 25
    ALLOCATE(A2D(0:BLS * SIZE - 1))
    ALLOCATE(X(0:SIZE - 1))
    ALLOCATE(TEMP(0:BLS - 1))

    DO COUNT = 0, 50
        !$OMP PARALLEL SHARED(A2D, X, SIZE, BLS)
        !$OMP DO SCHEDULE(STATIC) PRIVATE(J, I, SUM, TEMP, I0)
        !DEC$ SIMD
        DO J = 0, SIZE - 1
            I0 = BLS * J
            DO I = 0, BLS - 1
                TEMP(I) = A2D(I0 + I)
            END DO
            SUM = 0.D0
            DO I = 0, BLS - 1
                SUM = SUM + TEMP(I) * 2.D0
            END DO
            X(J) = SUM
        END DO
        !$OMP END DO
        !$OMP END PARALLEL
    END DO

[/fortran]

 

And the following is the code I expected to be the right way to do it, but it is about 4 times slower (I think because the PREFETCH directive has no effect):

[fortran]

    DOUBLE PRECISION, DIMENSION(:), ALLOCATABLE :: A2D, X
    DOUBLE PRECISION :: SUM
    INTEGER :: SIZE, I, J, COUNT, BLS, I0

    SIZE = 1000000
    BLS = 21 * 25
    ALLOCATE(A2D(0:BLS * SIZE - 1))
    ALLOCATE(X(0:SIZE - 1))

    DO COUNT = 0, 50
        !$OMP PARALLEL SHARED(A2D, X, SIZE, BLS)
        !$OMP DO SCHEDULE(STATIC) PRIVATE(J, I, SUM, I0)
        !DEC$ PREFETCH A2D
        DO J = 0, SIZE - 1
            I0 = BLS * J
            SUM = 0.D0
            !DEC$ SIMD
            DO I = 0, BLS - 1
                SUM = SUM + A2D(I0 + I) * 2.D0
            END DO
            X(J) = SUM
        END DO
        !$OMP END DO
        !$OMP END PARALLEL
    END DO

[/fortran]
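
To make the comparison easier to reproduce, here is a minimal, self-contained sketch that times both loop shapes on the same initialized data. It is only an illustration under stated assumptions: the reduced sizes (NROWS, NREP), the fill pattern for A2D, and the names X1, X2 and S are hypothetical and not part of the original code, and the !DEC$ SIMD and !DEC$ PREFETCH directives are left out so that it builds with any Fortran compiler (they can be added back to see their effect).

[fortran]
PROGRAM PREFETCH_COMPARE
    IMPLICIT NONE
    ! Assumed, reduced problem sizes (hypothetical, chosen to fit in ~84 MB)
    INTEGER, PARAMETER :: NROWS = 20000     ! stands in for SIZE
    INTEGER, PARAMETER :: BLS = 21 * 25
    INTEGER, PARAMETER :: NREP = 20         ! stands in for the COUNT loop
    DOUBLE PRECISION, DIMENSION(:), ALLOCATABLE :: A2D, X1, X2, TEMP
    DOUBLE PRECISION :: S
    INTEGER :: I, J, K, I0
    INTEGER(KIND=8) :: T0, T1, T2, RATE

    ALLOCATE(A2D(0:BLS * NROWS - 1), X1(0:NROWS - 1), X2(0:NROWS - 1))
    ALLOCATE(TEMP(0:BLS - 1))

    ! Touch every element before timing so that neither variant
    ! pays for first-touch page faults on A2D
    DO I = 0, BLS * NROWS - 1
        A2D(I) = DBLE(MOD(I, 7))
    END DO
    X1 = 0.D0
    X2 = 0.D0

    CALL SYSTEM_CLOCK(T0, RATE)

    ! Variant 1: copy each block into TEMP, then reduce TEMP
    DO K = 1, NREP
        !$OMP PARALLEL DO SCHEDULE(STATIC) PRIVATE(J, I, S, TEMP, I0)
        DO J = 0, NROWS - 1
            I0 = BLS * J
            DO I = 0, BLS - 1
                TEMP(I) = A2D(I0 + I)
            END DO
            S = 0.D0
            DO I = 0, BLS - 1
                S = S + TEMP(I) * 2.D0
            END DO
            X1(J) = S
        END DO
        !$OMP END PARALLEL DO
    END DO

    CALL SYSTEM_CLOCK(T1)

    ! Variant 2: reduce directly from A2D
    DO K = 1, NREP
        !$OMP PARALLEL DO SCHEDULE(STATIC) PRIVATE(J, I, S, I0)
        DO J = 0, NROWS - 1
            I0 = BLS * J
            S = 0.D0
            DO I = 0, BLS - 1
                S = S + A2D(I0 + I) * 2.D0
            END DO
            X2(J) = S
        END DO
        !$OMP END PARALLEL DO
    END DO

    CALL SYSTEM_CLOCK(T2)

    PRINT *, 'copy-to-TEMP variant (s): ', DBLE(T1 - T0) / DBLE(RATE)
    PRINT *, 'direct variant (s):       ', DBLE(T2 - T1) / DBLE(RATE)
    ! Both variants should produce the same X
    PRINT *, 'max abs difference:       ', MAXVAL(ABS(X1 - X2))
END PROGRAM PREFETCH_COMPARE
[/fortran]

Initializing A2D before the timed region and checking X1 against X2 helps separate a real difference in memory access behavior from timing artifacts such as untouched pages or a loop the compiler has simplified away. Without the OpenMP compiler option the !$OMP lines are treated as comments and the sketch runs serially.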

I am really confused and need your help.

 
