
Fortran Parallel Programming

houidef_a_
Beginner

It is evident that using parallelism should reduce a program's run time, but that is not what happened for me: when I parallelized a code, the machine time increased (PARALLEL TIMES >>> NO PARALLEL TIMES)??? Like this:

    PROGRAM MAIN
    use omp_lib
    implicit none
    REAL*8 Times1, Times2
    INTEGER I, J
    real, allocatable, dimension(:) :: a
    allocate(a(1000))
    DO J = 1, 1000
       a(J) = J
    ENDDO
!    *************** NO PARALLEL CODE ************************************
    call CPU_TIME(Times1)
    write(*,*) 'CPU NO PARALLEL STARTED:', Times1
    DO I = 1, 1000
       DO J = 1, 500000
          a(I) = a(I) + 0.0001
       END DO
       a(I) = a(I) + a(I) + a(I)
    ENDDO
    call CPU_TIME(Times2)
    write(*,*) 'CPU CPU NO PARALLEL finished:', Times2
    write(*,*) 'NO PARALLEL TIMES:', Times2 - Times1
    write(*,*) '---------------------------------------------------'
!    *************** PARALLEL CODE ************************************
    call CPU_TIME(Times1)
    write(*,*) 'CPU PARALLEL STARTED:', Times1
!$OMP PARALLEL DEFAULT(shared)
!$OMP DO
    DO I = 1, 1000
       DO J = 1, 500000
          a(I) = a(I) + 0.0001
       END DO
       a(I) = a(I) + a(I) + a(I)
    ENDDO
!$OMP END DO
!$OMP END PARALLEL
    call CPU_TIME(Times2)
    write(*,*) 'CPU PARALLEL finished:', Times2
    write(*,*) 'PARALLEL TIMES:', Times2 - Times1
    deallocate(a)
    STOP
    END

And the result:

 CPU NO PARALLEL STARTED:  1.560010000000000E-002
 CPU CPU NO PARALLEL finished:   4.86723120000000
 NO PARALLEL TIMES:   4.85163110000000
 ---------------------------------------------------
 CPU PARALLEL STARTED:   4.86723120000000
 CPU PARALLEL finished:   9.89046340000000
 PARALLEL TIMES:   5.02323220000000

PARALLEL TIMES >>> NO PARALLEL TIMES ??? Why is the parallel version slower?

TimP
Honored Contributor III

 

CPU time (the total over all threads) normally increases with parallelism even when the parallelism is implemented effectively, i.e. when the elapsed time measured by omp_get_wtime() goes down. Also, when you invite the compiler to skip operations, it may skip more of them in the non-parallel version.
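
For reference, a minimal sketch of that distinction, assuming an OpenMP-enabled build (e.g. ifort -qopenmp); the program name and loop size are illustrative only. cpu_time() reports process CPU time, which is typically the total across all threads, while omp_get_wtime() reports elapsed wall-clock time, which is what parallelism should reduce:

    PROGRAM TIMING_SKETCH
    use omp_lib
    implicit none
    REAL*8 c1, c2, w1, w2, x
    INTEGER I
    x = 0.0d0
    call CPU_TIME(c1)        ! process CPU time (typically summed over all threads)
    w1 = omp_get_wtime()     ! wall-clock time: what the user actually waits for
!$OMP PARALLEL DO REDUCTION(+:x)
    DO I = 1, 100000000
       x = x + 1.0d-8
    ENDDO
!$OMP END PARALLEL DO
    call CPU_TIME(c2)
    w2 = omp_get_wtime()
    write(*,*) 'CPU time (all threads):', c2 - c1
    write(*,*) 'Wall time (elapsed)   :', w2 - w1, '  sum =', x
    END

With more threads, the wall-clock difference shrinks while the CPU-time difference stays roughly constant or grows slightly, which matches the behaviour reported above.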

 

mecej4
Honored Contributor III

Here is a slightly modified version of the program above. The modifications are (i) to prevent the optimizer from doing away with so much of the calculation as to make the "benchmark" trivial and meaningless, as Tim Prince remarked (a statement was added after each timed loop nest to compute and print the sum of the elements of array A), and (ii) to use the OpenMP timing routine omp_get_wtime() instead of cpu_time(). The results on my system with a quad-core CPU are quite reasonable with 1, 2, 4 and 8 threads:

Threads    Serial time (s)    Parallel time (s)
1          0.160              0.159
2          0.160              0.082
4          0.160              0.042
8          0.160              0.042

    PROGRAM MAIN
    use omp_lib
    implicit none
    REAL*8 Times1, Times2
    INTEGER I, J
    real, allocatable, dimension(:) :: a
    real s1, s2
!
    allocate(a(1000))
    DO J = 1, 1000
       a(J) = J
    ENDDO
!    *************** SERIAL CODE ************************************
    Times1 = omp_get_wtime()
    write(*,'(A18,2x,F16.3)') 'SERIAL started:', Times1
    DO I = 1, 1000
       DO J = 1, 500000
          a(I) = a(I) + 0.0001
       END DO
       a(I) = a(I) + a(I) + a(I)
    ENDDO
    Times2 = omp_get_wtime()
    s1 = sum(a)
    write(*,'(A18,2x,F16.3)') 'SERIAL finished:', Times2
    write(*,'(A18,2x,F6.3,2x,ES12.3E1)') 'SERIAL TIME:', Times2-Times1, s1
    write(*,*) '---------------------------------------------------'
!    *************** PARALLEL CODE ************************************
    DO J = 1, 1000
       a(J) = J
    ENDDO
    Times1 = omp_get_wtime()
    write(*,'(A18,2x,F16.3)') 'PARALLEL started:', Times1
!$OMP PARALLEL DEFAULT(shared)
!$OMP DO
    DO I = 1, 1000
       DO J = 1, 500000
          a(I) = a(I) + 0.0001
       END DO
       a(I) = a(I) + a(I) + a(I)
    ENDDO
!$OMP END DO
!$OMP END PARALLEL
    s2 = sum(a)
    Times2 = omp_get_wtime()
    write(*,'(A18,2x,F16.3)') 'PARALLEL finished:', Times2
    write(*,'(A18,2x,F6.3,2x,ES12.3E1)') 'PARALLEL TIME:', Times2-Times1, s2
    deallocate(a)
    STOP
    END
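
For completeness, the 1-, 2-, 4- and 8-thread runs above can be obtained without recompiling by setting the environment variable OMP_NUM_THREADS before each run. A minimal sketch of the alternative, requesting the team size from inside the program (the value 4 here is only an example, not necessarily what was used above):

    PROGRAM THREAD_COUNT_SKETCH
    use omp_lib
    implicit none
    ! Sketch only: request a team of 4 threads before entering the parallel region.
    call omp_set_num_threads(4)
!$OMP PARALLEL
!$OMP MASTER
    write(*,*) 'threads in team:', omp_get_num_threads()
!$OMP END MASTER
!$OMP END PARALLEL
    END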

 

 

houidef_a_
Beginner

But the problem is that omp_get_wtime() measures the system (wall) clock, not CPU time!

mecej4
Honored Contributor III

I used a single-user system with no heavy background tasks in reporting the timings in #3.

If you are concerned about how accurate the different timing routines provided by IFort are, you may consider using the thread-profiling features of Intel VTune, if you have that product installed. After removing all timing calls, on Windows 8.1 x64 using the system mentioned in #3, I obtained the following results for the "OpenMP region", i.e., the portion of the program inside the !$OMP PARALLEL block. These results differ little from the times reported by omp_get_wtime(). Note that those timing calls were all located outside the parallel region. (A profiler-free cross-check of the wall-clock timers is sketched after the table.)

n_threads    Time in OpenMP region (s)
1            0.160
2            0.083
4            0.046
8            0.041
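
A profiler-free cross-check of the wall-clock timers, for anyone without VTune: time the same region with both omp_get_wtime() and the standard intrinsic system_clock(); the two readings should agree closely. This is only a sketch; the program name and loop size are illustrative.

    PROGRAM TIMER_CROSSCHECK_SKETCH
    use omp_lib
    implicit none
    INTEGER*8 c1, c2, rate
    REAL*8 w1, w2, x
    INTEGER I
    x = 0.0d0
    call system_clock(c1, rate)   ! standard Fortran wall-clock timer
    w1 = omp_get_wtime()          ! OpenMP wall-clock timer
!$OMP PARALLEL DO REDUCTION(+:x)
    DO I = 1, 100000000
       x = x + 1.0d-8
    ENDDO
!$OMP END PARALLEL DO
    w2 = omp_get_wtime()
    call system_clock(c2)
    write(*,*) 'system_clock :', real(c2 - c1, 8) / real(rate, 8)
    write(*,*) 'omp_get_wtime:', w2 - w1, '  sum =', x
    END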
