
Fortran Parallel Programming

houidef_a_
Beginner

It is evident that using parallelism should reduce a program's run time, but that is not what happened for me: when I parallelized a code, the machine time increased (PARALLEL TIMES >>> NO PARALLEL TIMES)??? Like this:

    PROGRAM MAIN
    use omp_lib
    implicit none
    REAL*8 Times1, Times2
    INTEGER I, J
    real, allocatable, dimension(:) :: a
    allocate(a(1000))
    DO J = 1, 1000
       a(J) = J
    ENDDO
!    *************** NO PARALLEL CODE ************************************
    call CPU_TIME(Times1)
    write(*,*) 'CPU NO PARALLEL STARTED:', Times1
    DO I = 1, 1000
       DO J = 1, 500000
          a(I) = a(I) + 0.0001
       END DO
       a(I) = a(I) + a(I) + a(I)
    ENDDO
    call CPU_TIME(Times2)
    write(*,*) 'CPU CPU NO PARALLEL finished:', Times2
    write(*,*) 'NO PARALLEL TIMES:', Times2 - Times1
    write(*,*) '---------------------------------------------------'
!    *************** PARALLEL CODE ************************************
    call CPU_TIME(Times1)
    write(*,*) 'CPU PARALLEL STARTED:', Times1
!$OMP PARALLEL DEFAULT(shared)
!$OMP DO
    DO I = 1, 1000
       DO J = 1, 500000
          a(I) = a(I) + 0.0001
       END DO
       a(I) = a(I) + a(I) + a(I)
    ENDDO
!$OMP END DO
!$OMP END PARALLEL
    call CPU_TIME(Times2)
    write(*,*) 'CPU PARALLEL finished:', Times2
    write(*,*) 'PARALLEL TIMES:', Times2 - Times1
    deallocate(a)
    STOP
    END

And the result:

 CPU NO PARALLEL STARTED:  1.560010000000000E-002
 CPU CPU NO PARALLEL finished:   4.86723120000000
 NO PARALLEL TIMES:   4.85163110000000
 ---------------------------------------------------
 CPU PARALLEL STARTED:   4.86723120000000
 CPU PARALLEL finished:   9.89046340000000
 PARALLEL TIMES:   5.02323220000000

PARALLEL TIMES >>> NO PARALLEL TIMES ??? Why is the parallel version slower?

TimP
Honored Contributor III

 

CPU time (the total over all threads) normally increases with parallelism even when the parallelism is implemented effectively, i.e. when the elapsed time measured by omp_get_wtime() goes down. Also, when you invite the compiler to skip operations, it may skip more of them in the non-parallel version.
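
For reference, a minimal sketch of that distinction, assuming an OpenMP-enabled build (e.g. ifort -qopenmp); the program name and loop size are illustrative only. cpu_time() reports process CPU time, which is typically the total across all threads, while omp_get_wtime() reports elapsed wall-clock time, which is what parallelism should reduce:

    PROGRAM TIMING_SKETCH
    use omp_lib
    implicit none
    REAL*8 c1, c2, w1, w2, x
    INTEGER I
    x = 0.0d0
    call CPU_TIME(c1)        ! process CPU time (typically summed over all threads)
    w1 = omp_get_wtime()     ! wall-clock time: what the user actually waits for
!$OMP PARALLEL DO REDUCTION(+:x)
    DO I = 1, 100000000
       x = x + 1.0d-8
    ENDDO
!$OMP END PARALLEL DO
    call CPU_TIME(c2)
    w2 = omp_get_wtime()
    write(*,*) 'CPU time (all threads):', c2 - c1
    write(*,*) 'Wall time (elapsed)   :', w2 - w1, '  sum =', x
    END

With more threads, the wall-clock difference shrinks while the CPU-time difference stays roughly constant or grows slightly, which matches the behaviour reported above.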

 

mecej4
Honored Contributor III

Here is a slightly modified version of the program above. The modifications are (i) to prevent the optimizer from doing away with so much of the calculation as to make the "benchmark" trivial and meaningless, as Tim Prince remarked (a statement was added after each timed loop nest to compute and print the sum of the elements of array A), and (ii) to use the OpenMP timing routine omp_get_wtime() instead of cpu_time(). The results on my system with a quad-core CPU are quite reasonable with 1, 2, 4 and 8 threads:

Threads    Serial time (s)    Parallel time (s)
1          0.160              0.159
2          0.160              0.082
4          0.160              0.042
8          0.160              0.042

    PROGRAM MAIN
    use omp_lib
    implicit none
    REAL*8 Times1, Times2
    INTEGER I, J
    real, allocatable, dimension(:) :: a
    real s1, s2
!
    allocate(a(1000))
    DO J = 1, 1000
       a(J) = J
    ENDDO
!    *************** SERIAL CODE ************************************
    Times1 = omp_get_wtime()
    write(*,'(A18,2x,F16.3)') 'SERIAL started:', Times1
    DO I = 1, 1000
       DO J = 1, 500000
          a(I) = a(I) + 0.0001
       END DO
       a(I) = a(I) + a(I) + a(I)
    ENDDO
    Times2 = omp_get_wtime()
    s1 = sum(a)
    write(*,'(A18,2x,F16.3)') 'SERIAL finished:', Times2
    write(*,'(A18,2x,F6.3,2x,ES12.3E1)') 'SERIAL TIME:', Times2-Times1, s1
    write(*,*) '---------------------------------------------------'
!    *************** PARALLEL CODE ************************************
    DO J = 1, 1000
       a(J) = J
    ENDDO
    Times1 = omp_get_wtime()
    write(*,'(A18,2x,F16.3)') 'PARALLEL started:', Times1
!$OMP PARALLEL DEFAULT(shared)
!$OMP DO
    DO I = 1, 1000
       DO J = 1, 500000
          a(I) = a(I) + 0.0001
       END DO
       a(I) = a(I) + a(I) + a(I)
    ENDDO
!$OMP END DO
!$OMP END PARALLEL
    s2 = sum(a)
    Times2 = omp_get_wtime()
    write(*,'(A18,2x,F16.3)') 'PARALLEL finished:', Times2
    write(*,'(A18,2x,F6.3,2x,ES12.3E1)') 'PARALLEL TIME:', Times2-Times1, s2
    deallocate(a)
    STOP
    END
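
For completeness, the 1-, 2-, 4- and 8-thread runs above can be obtained without recompiling by setting the environment variable OMP_NUM_THREADS before each run. A minimal sketch of the alternative, requesting the team size from inside the program (the value 4 here is only an example, not necessarily what was used above):

    PROGRAM THREAD_COUNT_SKETCH
    use omp_lib
    implicit none
    ! Sketch only: request a team of 4 threads before entering the parallel region.
    call omp_set_num_threads(4)
!$OMP PARALLEL
!$OMP MASTER
    write(*,*) 'threads in team:', omp_get_num_threads()
!$OMP END MASTER
!$OMP END PARALLEL
    END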

 

 

houidef_a_
Beginner

But the problem is that omp_get_wtime() measures the system (wall) clock, not CPU time!

mecej4
Honored Contributor III

I used a single-user system with no heavy background tasks in reporting the timings in #3.

If you are concerned about how accurate the different timing routines provided by IFort are, you may consider using the thread-profiling features of Intel VTune, if you have that product installed. After removing all timing calls, on Windows 8.1 x64 using the system mentioned in #3, I obtained the following results for the "OpenMP region", i.e., the portion of the program inside the !$OMP PARALLEL block. These results differ little from the times reported by omp_get_wtime(). Note that those timing calls were all located outside the parallel region. (A profiler-free cross-check of the wall-clock timers is sketched after the table.)

n_threads    Time in OpenMP region (s)
1            0.160
2            0.083
4            0.046
8            0.041
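
A profiler-free cross-check of the wall-clock timers, for anyone without VTune: time the same region with both omp_get_wtime() and the standard intrinsic system_clock(); the two readings should agree closely. This is only a sketch; the program name and loop size are illustrative.

    PROGRAM TIMER_CROSSCHECK_SKETCH
    use omp_lib
    implicit none
    INTEGER*8 c1, c2, rate
    REAL*8 w1, w2, x
    INTEGER I
    x = 0.0d0
    call system_clock(c1, rate)   ! standard Fortran wall-clock timer
    w1 = omp_get_wtime()          ! OpenMP wall-clock timer
!$OMP PARALLEL DO REDUCTION(+:x)
    DO I = 1, 100000000
       x = x + 1.0d-8
    ENDDO
!$OMP END PARALLEL DO
    w2 = omp_get_wtime()
    call system_clock(c2)
    write(*,*) 'system_clock :', real(c2 - c1, 8) / real(rate, 8)
    write(*,*) 'omp_get_wtime:', w2 - w1, '  sum =', x
    END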
