It is evident that parallelism should reduce a program's run time, but this did not happen for me: when I parallelized my code, the machine time increased (PARALLEL TIMES > NO PARALLEL TIMES). For example:
    PROGRAM MAIN
      use omp_lib
      implicit none
      REAL*8 Times1,Times2
      INTEGER I,J
      real, allocatable, dimension(:) :: a

      allocate(a(1000))
      DO J = 1, 1000
        a(j)=j
      ENDDO
    ! ***************NO PARALLEL CODE ************************************
      call CPU_TIME(Times1)
      write(*,*) 'CPU NO PARALLEL STARTED:',Times1
      DO I = 1, 1000
        DO J = 1, 500000
          a(I)=a(I)+0.0001
        end do
        a(I)=a(I)+a(I)+a(I)
      ENDDO
      call CPU_TIME(Times2)
      write(*,*) 'CPU CPU NO PARALLEL finished:',Times2
      write(*,*) 'NO PARALLEL TIMES:',Times2-Times1
      write(*,*) '---------------------------------------------------'
    ! ***************PARALLEL CODE ************************************
      call CPU_TIME(Times1)
      write(*,*) 'CPU PARALLEL STARTED:',Times1
    !$OMP PARALLEL DEFAULT(shared)
    !$OMP DO
      DO I = 1, 1000
        DO J = 1, 500000
          a(I)=a(I)+0.0001
        end do
        a(I)=a(I)+a(I)+a(I)
      ENDDO
    !$OMP END DO
    !$OMP END PARALLEL
      call CPU_TIME(Times2)
      write(*,*) 'CPU PARALLEL finished:',Times2
      write(*,*) 'PARALLEL TIMES:',Times2-Times1
      deallocate(a)
      STOP
    END
And the result:
CPU NO PARALLEL STARTED: 1.560010000000000E-002
CPU CPU NO PARALLEL finished: 4.86723120000000
NO PARALLEL TIMES: 4.85163110000000
---------------------------------------------------
CPU PARALLEL STARTED: 4.86723120000000
CPU PARALLEL finished: 9.89046340000000
PARALLEL TIMES: 5.02323220000000
Why is PARALLEL TIMES greater than NO PARALLEL TIMES?
CPU time (the total over all threads) normally increases with parallelism, even when the parallelization is effective as measured by elapsed time, e.g. with omp_get_wtime(). Also, when you invite the compiler to skip operations, it may skip more of them in the non-parallel version.
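To make the difference between the two clocks concrete, here is a minimal sketch that times one parallel loop with both cpu_time() and omp_get_wtime(). The program name, the variable names and the decision to print a checksum are my own choices for illustration, not taken from the posts. What cpu_time() reports for a multi-threaded program is processor dependent; with the compilers I know of, it is typically the CPU time summed over all threads, which is the assumption here.

```fortran
program timing_demo
   use omp_lib
   implicit none
   integer, parameter :: n = 1000, m = 500000
   real, allocatable :: a(:)
   real :: cpu0, cpu1
   double precision :: wall0, wall1
   integer :: i, j

   allocate(a(n))
   a = [(real(i), i = 1, n)]

   call cpu_time(cpu0)          ! processor time (assumed: summed over threads)
   wall0 = omp_get_wtime()      ! elapsed wall-clock time in seconds

   !$omp parallel do private(j)
   do i = 1, n
      do j = 1, m
         a(i) = a(i) + 0.0001
      end do
   end do
   !$omp end parallel do

   wall1 = omp_get_wtime()
   call cpu_time(cpu1)

   ! Using the result keeps the optimizer from discarding the loop.
   print '(a,es12.4)', 'checksum      : ', sum(a)
   print '(a,f8.3)',   'CPU time  (s) : ', cpu1 - cpu0
   print '(a,f8.3)',   'wall time (s) : ', wall1 - wall0
   print '(a,i0)',     'max threads   : ', omp_get_max_threads()
end program timing_demo
```

Built with OpenMP enabled (for example /Qopenmp with ifort on Windows, or -fopenmp with gfortran), the wall time should fall as threads are added, while the CPU time should stay roughly constant or grow slightly.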
Here is a slightly modified version of the program above. The modifications are (i) to calculate and print the sum of the elements of array A after the outer loop, which prevents the optimizer from doing away with so much of the calculation as to make the "benchmark" trivial and meaningless, as Tim Prince remarked, and (ii) to use the OpenMP timing routine omp_get_wtime() instead of cpu_time(). The results are quite reasonable on my system with a quad-core CPU, run with 1, 2, 4 and 8 threads.
| Threads | Serial time (s) | Parallel time (s) |
|---------|-----------------|-------------------|
| 1       | 0.160           | 0.159             |
| 2       | 0.160           | 0.082             |
| 4       | 0.160           | 0.042             |
| 8       | 0.160           | 0.042             |
    PROGRAM MAIN
      use omp_lib
      implicit none
      REAL*8 Times1,Times2
      INTEGER I,J
      real, allocatable, dimension(:) :: a
      real s1,s2
    !
      allocate(a(1000))
      DO J = 1, 1000
        a(j)=j
      ENDDO
    ! ***************SERIAL CODE ************************************
      Times1=omp_get_wtime()
      write(*,'(A18,2x,F16.3)') 'SERIAL started:',Times1
      DO I = 1, 1000
        DO J = 1, 500000
          a(I)=a(I)+0.0001
        end do
        a(I)=a(I)+a(I)+a(I)
      ENDDO
      Times2=omp_get_wtime()
      s1=sum(a)
      write(*,'(A18,2x,F16.3)') 'SERIAL finished:',Times2
      write(*,'(A18,2x,F6.3,2x,ES12.3E1)') 'SERIAL TIME:',Times2-Times1,s1
      write(*,*) '---------------------------------------------------'
    ! ***************PARALLEL CODE ************************************
      DO J = 1, 1000
        a(j)=j
      ENDDO
      Times1=omp_get_wtime()
      write(*,'(A18,2x,F16.3)') 'PARALLEL started:',Times1
    !$OMP PARALLEL DEFAULT(shared)
    !$OMP DO
      DO I = 1, 1000
        DO J = 1, 500000
          a(I)=a(I)+0.0001
        end do
        a(I)=a(I)+a(I)+a(I)
      ENDDO
    !$OMP END DO
    !$OMP END PARALLEL
      s2=sum(a)
      Times2=omp_get_wtime()
      write(*,'(A18,2x,F16.3)') 'PARALLEL finished:',Times2
      write(*,'(A18,2x,F6.3,2x,ES12.3E1)') 'PARALLEL TIME:',Times2-Times1,s2
      deallocate(a)
      STOP
    END
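The post does not say how the thread count was varied between runs. One common option is the OMP_NUM_THREADS environment variable; another, sketched below, is to call omp_set_num_threads() in a loop so that a single run produces the whole table. The program name, the counts array and the output layout are hypothetical, chosen only for illustration, and the timings will differ from machine to machine.

```fortran
program thread_sweep
   use omp_lib
   implicit none
   integer, parameter :: n = 1000, m = 500000
   integer, parameter :: counts(4) = [1, 2, 4, 8]   ! thread counts to try
   real, allocatable :: a(:)
   double precision :: t0, t1
   integer :: i, j, k

   allocate(a(n))
   print '(a)', 'threads  time (s)      checksum'
   do k = 1, size(counts)
      call omp_set_num_threads(counts(k))
      a = [(real(i), i = 1, n)]      ! reset so every run does the same work
      t0 = omp_get_wtime()
      !$omp parallel do private(j)
      do i = 1, n
         do j = 1, m
            a(i) = a(i) + 0.0001
         end do
         a(i) = a(i) + a(i) + a(i)
      end do
      !$omp end parallel do
      t1 = omp_get_wtime()
      ! Printing sum(a) uses the result, so the loop cannot be optimized away.
      print '(i7,2x,f8.3,2x,es12.4)', counts(k), t1 - t0, sum(a)
   end do
   deallocate(a)
end program thread_sweep
```

Each element a(i) is updated by exactly one thread, so the checksum should be the same for every thread count; a differing checksum would point to a data race rather than a timing issue.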
But the problem is that omp_get_wtime() measures system (wall-clock) time, not CPU time!
I used a single-user system with no heavy background tasks in reporting the timings in #3.
If you are concerned about how accurate the different timing routines provided by IFort are, you may consider using the thread profiler features of Intel VTune, if you have that product installed. After removing all timing calls, on Windows 8.1 X64 using the system mentioned in #3, I obtained the following results for the "OpenMP region", i.e., the portion of the program in the !$OMP PARALLEL block. These results differ little from the times reported by omp_get_wtime(). Note that those timing calls were all located outside the parallel region.
| n_threads | Time in OpenMP region (sec) |
|-----------|-----------------------------|
| 1         | 0.160                       |
| 2         | 0.083                       |
| 4         | 0.046                       |
| 8         | 0.041                       |