Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

/Qopenmp option with cpu_time routine for time calculation

Fortran10
Novice

 

program time_check
  use omp_lib
  implicit none

  real (8),dimension(:,:,:),  allocatable :: A, C
  integer(4) :: n = 300, P=300, i
  real (8) :: t1, t2, t3, t4, dc_time, omp_time


  allocate ( A(n,n,P), C(n,n,P) )
  call random_number ( a )


  call cpu_time ( t1 )
  do concurrent ( i = 1:P ) 
     c(:,:,i) = MATMUL ( A(:,:,i), A(:,:,i) )
  end do
  call cpu_time ( t2 )


  call cpu_time ( t3 )
  !$OMP PARALLEL DO SHARED ( A,C,P )
  do i = 1,P
     c(:,:,i) = MATMUL ( A(:,:,i), A(:,:,i) )
  end do
  !$OMP END PARALLEL DO
  call cpu_time ( t4 )

  dc_time = t2 - t1
  omp_time= t4 - t3

  print*, "Do concurrent time = ", dc_time
  print*, "OpenMP time        = ", omp_time

End program time_check

 

I am using the cpu_time routine to time the do concurrent and OpenMP constructs. When I compile with /Qopenmp, the reported times do not look correct. Without /Qopenmp, the timing looks correct.

[Attached screenshots: Qopenmp.PNG, noopenmp.PNG]

I am familiar with the OpenMP wall time routine i.e. 

omp_get_wtime()

Also MPI has its own function, i.e.

MPI_Wtime()

So if

call cpu_time()

is for serial timing, is there a routine or function specifically for timing do concurrent as well?

1 Solution
Barbara_P_Intel
Employee

When timing an application that has parallelization, use wall clock time. Why not use omp_get_wtime() since you're using OpenMP directives?

cpu_time() sums the time for all threads. Here's the reference in the Developer Guide and Reference (DGR).
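
As a minimal sketch of this advice (not the original poster's program; the program name, array sizes, and variable names are illustrative assumptions), the following times one OpenMP loop with both cpu_time and omp_get_wtime. It assumes the code is compiled with OpenMP enabled, for example with /Qopenmp as in the question. When several threads are active, the CPU-time figure may come out larger than the wall-clock figure.

```fortran
program wall_vs_cpu
  use omp_lib
  implicit none

  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 200, p = 64      ! illustrative sizes, not from the post
  real(dp), allocatable :: a(:,:,:), c(:,:,:)
  real(dp) :: cpu0, cpu1, wall0, wall1
  integer :: i

  allocate (a(n,n,p), c(n,n,p))
  call random_number(a)

  call cpu_time(cpu0)          ! processor time; may be summed over all threads
  wall0 = omp_get_wtime()      ! elapsed wall-clock time in seconds

  !$omp parallel do shared(a, c)
  do i = 1, p
     c(:,:,i) = matmul(a(:,:,i), a(:,:,i))
  end do
  !$omp end parallel do

  wall1 = omp_get_wtime()
  call cpu_time(cpu1)

  print *, "CPU time  (s) = ", cpu1 - cpu0
  print *, "Wall time (s) = ", wall1 - wall0
  ! Printing a value derived from c keeps the timed work observable.
  print *, "Checksum      = ", sum(c)
end program wall_vs_cpu
```

Comparing the two printed times on a multi-core machine is a quick way to see whether cpu_time is accumulating time across threads on a given compiler.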

 

8 Replies
Steve_Lionel
Honored Contributor III

CPU_TIME may add up all the time in threads. I'll also comment that the test program runs the risk of not measuring what you want because the compiler could throw away much of the work since it is not used. I'll also note that for MATMUL, at least, the compiler can call an MKL optimized version that does its own multithreading. That may be an issue here.

JohnNichols
Valued Contributor III

You are measuring time for a process that has millions of steps. 

One would fairly assume that your answers are the same, i.e. you have two values from a random Gaussian process that probably has a wide error.

You are hunting a furphy.

JohnNichols
Valued Contributor III

[Attached screenshot: Screenshot 2023-10-17 174647.png]

Post was cut off

Write a Monte Carlo shell, run it a million times, and then re-pose the problem.
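
A scaled-down sketch of that idea, assuming only 20 repetitions rather than a million: the loop below times the same do concurrent block repeatedly and reports the mean and sample standard deviation, which gives a rough sense of the run-to-run spread. The program name, sizes, repetition count, and the choice of the standard system_clock intrinsic as the timer are all illustrative assumptions, not taken from the thread's attachments.

```fortran
program repeat_timing
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none

  integer, parameter :: n = 100, p = 32, nrep = 20   ! illustrative values
  real(real64), allocatable :: a(:,:,:), c(:,:,:)
  real(real64) :: t(nrep), mean, sdev, checksum
  integer(int64) :: c0, c1, rate
  integer :: i, k

  allocate (a(n,n,p), c(n,n,p))
  call random_number(a)
  checksum = 0.0_real64

  do k = 1, nrep
     call system_clock(count=c0, count_rate=rate)
     do concurrent (i = 1:p)
        c(:,:,i) = matmul(a(:,:,i), a(:,:,i))
     end do
     call system_clock(count=c1)
     t(k) = real(c1 - c0, real64) / real(rate, real64)   ! seconds for this repetition
     checksum = checksum + sum(c)                        ! keep the result in use
  end do

  mean = sum(t) / nrep
  sdev = sqrt(sum((t - mean)**2) / (nrep - 1))           ! sample standard deviation

  print *, "Mean time (s)      = ", mean
  print *, "Sample std dev (s) = ", sdev
  print *, "Checksum           = ", checksum
end program repeat_timing
```

If the difference between two timed variants is small compared with the reported standard deviation, a single pair of measurements says little about which one is faster.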

 

JohnNichols
Valued Contributor III

[Attached screenshots: Screenshot 2023-10-17 175311.png, Screenshot 2023-10-17 175357.png, Screenshot 2023-10-17 175425.png]

QED. Same computer, nothing changed, and nothing running that was not running 32 hours ago.

JohnNichols
Valued Contributor III

[Attached image: JohnNichols_0-1697584593609.png]

Skewed Gaussian after 68 with a little log and maybe some logistic

A full MC run with this code and 1000000 repetitions would require 7 weeks.

If you can prove there is a statistical difference then your math is better than mine. 

The XLSX file includes up to about 200 

Fortran10
Novice

I had no idea that cpu_time() sums the time for all threads. The Intel page, CPU_TIME (intel.com), makes this clear; sadly, the GNU page, CPU_TIME (The GNU Fortran Compiler), does not!

 

 

TobiasK
Moderator

@Fortran10 just another note about timers: some time ago I switched to always using system_clock, as I don't see the benefit of calling either the OpenMP runtime or the MPI runtime. The only drawback of system_clock is that it returns ticks, and its precision depends on the kind of the integer you pass to it. So if you need high precision, make sure you use INT64 for both the count and the count_rate arguments.
https://www.intel.com/content/www/us/en/docs/fortran-compiler/developer-guide-reference/2023-2/system-clock.html
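
A minimal sketch of this approach, applied to the do concurrent loop from the question: both the count and the count_rate arguments are INT64, and the tick difference is converted to seconds. The program name, array sizes, and variable names are illustrative assumptions.

```fortran
program system_clock_timing
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none

  integer, parameter :: n = 200, p = 64      ! illustrative sizes, not from the post
  real(real64), allocatable :: a(:,:,:), c(:,:,:)
  integer(int64) :: tick_start, tick_end, rate
  real(real64) :: elapsed
  integer :: i

  allocate (a(n,n,p), c(n,n,p))
  call random_number(a)

  ! INT64 arguments for both count and count_rate, as suggested above.
  call system_clock(count=tick_start, count_rate=rate)
  do concurrent (i = 1:p)
     c(:,:,i) = matmul(a(:,:,i), a(:,:,i))
  end do
  call system_clock(count=tick_end)

  ! Convert ticks to seconds.
  elapsed = real(tick_end - tick_start, real64) / real(rate, real64)

  print *, "Do concurrent wall time (s) = ", elapsed
  print *, "Checksum                    = ", sum(c)   ! keep the result in use
end program system_clock_timing
```

Because system_clock is a standard intrinsic, the same pattern works unchanged around an OpenMP loop or a serial loop, with or without /Qopenmp.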

 
