Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
29258 Discussions

Need timer compatible with omp_get_wtime

Tue_B_
Novice
2,400 Views

Yesterday I was playing around with the affinity of our program and finally figured out that the reason why our affinity never got set in the first place, was that we called omp_get_wtime before, which apparently disables any attempt to set the affinity afterwards.

With this knowledge I set out to find a new timer to replace the omp_get_wtime with, and ended up using system_clock for the outer timer

My problem is that sometimes the subsection timers, which are calculated by omp_get_wtime, end up adding up to more than the total runtime!

And this isn't a question of miliseconds or anything like that, as can be seen from the following logfile:

   Total runtime               :  20447.11 s                    (timed with system_clock)
   Forward and derivative calcs:     91.26 %           (timed with omp_get_wtime)
   Linear solves               :      2.03 %                     (timed with omp_get_wtime)
   Analysis                    :      0.07 %                        (timed with omp_get_wtime)
   Input/Output                :     10.84 %                    (timed with omp_get_wtime)
   Others                      :     -4.21 %                        (total runtime - runtime of all the other subsections)

So does anyone have any idea what could go wrong here? or just what timers I should try instead.

Thanks in advance

Tue

 

 

0 Kudos
8 Replies
jimdempseyatthecove
Honored Contributor III
2,400 Views

I haven't seen this symptom (omp_get_wtime affecteing affinity).

Please present a simplified exemplar that demonstrates the problem. Include compiler switches and environment variable settings.

RE: disparity in runtime percents

Check to see if you are sampling the omp_get_wtime within a parallel region. There is nothing implicitly wrong with this provided you take into consideration of concurrency of threads. This may produce overlapped or gapped timing intervals.

Assuming your (timed with system_clock) time is obtained within your application using an API (as opposed to from a script/batch), then you can also insert calls to omp_get_wtime immediately after the sampling of the system_clock. Then compare the two times. If the two times are relatively the same, then this confirms an oversight of your tallies of times.

Jim Dempsey

0 Kudos
Tue_B_
Novice
2,400 Views

The following is run on Intel(R) Visual Fortran Compiler XE 15.0.1.148 [Intel(R) 64]

I'm not exactly sure how to include compiler switches or environment variable settings, I tried including the export of my visual studio environment variables, but I wasn't able to upload .vssettings files here. But if you need that let me know and I will figure out a way to get it on the forum tomorrow.

PROGRAM tmp
   
    use IFPORT
    use omp_lib
    implicit none
  
    integer    :: i,N,m,j
    logical(4) :: Success  
    real*8                       :: RuntimeBegin,RunTimeEnd
    real*8, allocatable :: x(:),y(:)
!    success = SETENVQQ("KMP_AFFINITY=verbose,granularity=fine,scatter") 
    RuntimeBegin=omp_get_wtime()
    success = SETENVQQ("KMP_AFFINITY=verbose,granularity=fine,scatter") 
    m=1000000
    allocate(x(m))
    allocate(y(m))
    x=0
    y=1
    N=4
    call omp_set_num_threads(N)     
!$OMP PARALLEL PRIVATE(i) DEFAULT(SHARED)
!$OMP DO SCHEDULE(STATIC)
    do i=1,1000
        do j=1,1000000
            y(i)=y(i)+(x(j)*j)**0.99
        end do
    end do    
!$OMP END DO
!$OMP END PARALLEL    
    END PROGRAM tmp

 

As you can see I can call SETENVQQ both before and after i call omp_get_wtime, if I call it before, then I get all the feedback from verbose back when I run it and the affinity get set just as it should. If I run it after, I get no feedback when I try to set it, and I can see that it fails to spread my parallel regions across the nodes.

In any case thanks for the reply!

 

0 Kudos
pbkenned1
Employee
2,400 Views

It's true that calling omp_get_wtime() before trying to set KMP_AFFINITY with SETENVQQ() seems to have no effect on the desired affinity setting.  Like Jim I've not seen this before; I'll make inquiries. 

Calling DCLOCK() seems to work.  Doesn't matter if you call it before or after the call to SETENVQQ("KMP_AFFINITY=verbose..."

#ifdef BEFOREOMP
    success = SETENVQQ("KMP_AFFINITY=verbose,granularity=fine,scatter")
#endif
#ifdef OMPWTIME
    RuntimeBegin=omp_get_wtime()
#else
    RuntimeBegin= DCLOCK()
#endif
#ifndef BEFOREOMP
    success = SETENVQQ("KMP_AFFINITY=verbose,granularity=fine,scatter")
#endif

...

!$OMP END DO
#ifdef OMPWTIME
    RunTimeEnd=omp_get_wtime()
#else
    RuntimeEnd= DCLOCK()
#endif

!$OMP END PARALLEL

#ifdef OMPWTIME
    print *,' Used omp_get_wtime()'
#else
    print *,' Used DCLOCK()'
#endif
    print *,' Total parallel region runime = ',RunTimeEnd-RuntimeBegin
    END PROGRAM tmp

***CALL DCLOCK() BEFORE SETTING AFFINITY***

C:\ISN_Forums\U539945>ifort -Qopenmp U539945.f90 -fpp -DBEFOREOMP
Intel(R) Visual Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.1.148 Build 20141023
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

C:\ISN_Forums\U539945>\bin\timethis.exe   U539945.exe

TimeThis :  Command Line :  U539945.exe
TimeThis :    Start Time :  Thu Jan 29 11:31:30 2015

OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3}
OMP: Info #156: KMP_AFFINITY: 4 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 2 cores/pkg x 2 threads/core (2 total
 cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1
OMP: Info #242: KMP_AFFINITY: pid 6836 thread 0 bound to OS proc set {0}
OMP: Info #242: KMP_AFFINITY: pid 6836 thread 1 bound to OS proc set {2}
OMP: Info #242: KMP_AFFINITY: pid 6836 thread 2 bound to OS proc set {1}
OMP: Info #242: KMP_AFFINITY: pid 6836 thread 3 bound to OS proc set {3}
  Used DCLOCK()
  Total parallel region runime =    2.22008895874023

TimeThis :  Command Line :  U539945.exe
TimeThis :    Start Time :  Thu Jan 29 11:31:30 2015
TimeThis :      End Time :  Thu Jan 29 11:31:33 2015
TimeThis :  Elapsed Time :  00:00:02.295

 

***CALL DCLOCK() AFTER SETTING AFFINITY***

C:\ISN_Forums\U539945>ifort -Qopenmp U539945.f90 -fpp
Intel(R) Visual Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.1.148 Build 20141023
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

C:\ISN_Forums\U539945>\bin\timethis.exe   U539945.exe

TimeThis :  Command Line :  U539945.exe
TimeThis :    Start Time :  Thu Jan 29 11:34:48 2015

OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3}
OMP: Info #156: KMP_AFFINITY: 4 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 2 cores/pkg x 2 threads/core (2 total
 cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1
OMP: Info #242: KMP_AFFINITY: pid 5596 thread 0 bound to OS proc set {0}
OMP: Info #242: KMP_AFFINITY: pid 5596 thread 1 bound to OS proc set {2}
OMP: Info #242: KMP_AFFINITY: pid 5596 thread 2 bound to OS proc set {1}
OMP: Info #242: KMP_AFFINITY: pid 5596 thread 3 bound to OS proc set {3}
  Used DCLOCK()
  Total parallel region runime =    2.22508811950684

TimeThis :  Command Line :  U539945.exe
TimeThis :    Start Time :  Thu Jan 29 11:34:48 2015
TimeThis :      End Time :  Thu Jan 29 11:34:51 2015
TimeThis :  Elapsed Time :  00:00:02.300

C:\ISN_Forums\U539945>

Patrick

0 Kudos
pbkenned1
Employee
2,400 Views

According to the OpenMP runtime developers, calling omp_get_wtime() does some environmental initialization of the runtime library. Environment initialization is supposed to be done only once, so setting KMP_AFFINITY after that has no effect.  IOW, this is the expected behavior.
 

Incidently, DLOCK() will accumulate wallclock on a per-thread basis in parallel regions.  I find that useful to checking thread imbalance, among other things.

Here's an example of a classic prime number counting program using DCLOCK() showing thread imbalance.  It hasn't been tuned to improve the balance by tweaking loop scheduling, or with other mechanisms.

C:\OpenMP>\bin\TIMETHIS.EXE ompPrime1_dclock.exe

TimeThis :  Command Line :  ompPrime1_dclock.exe
TimeThis :    Start Time :  Thu Jan 29 12:19:01 2015

  Range to check for Primes:           1   100000000
  We are using            4 thread(s)
 processor time for thread           0  =   18.3074989318848
 processor time for thread           1  =   27.9699993133545
 processor time for thread           2  =   33.5024986267090
 processor time for thread           3  =   37.0774993896484
  Number of primes found:     5761455
  Number of 4n+1 primes found:     2880505
  Number of 4n-1 primes found:     2880950
 program wallclock time (DCLOCK()) =   37.0799999237061
  utilization for thread           0  =   49.3729745672960      %
  utilization for thread           1  =   75.4314977640349      %
  utilization for thread           2  =   90.3519382298868      %
  utilization for thread           3  =   99.9932563806290      %

TimeThis :  Command Line :  ompPrime1_dclock.exe
TimeThis :    Start Time :  Thu Jan 29 12:19:01 2015
TimeThis :      End Time :  Thu Jan 29 12:19:38 2015
TimeThis :  Elapsed Time :  00:00:37.132

C:\OpenMP>

Let us know if you have further questions.

Patrick

 

 

 

0 Kudos
Tue_B_
Novice
2,400 Views

No further questions, but thanks a lot Patrick!

I will implement dclock immediately in my code.

 

Cheers

Tue

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,400 Views

>>According to the OpenMP runtime developers, calling omp_get_wtime() does some environmental initialization of the runtime library. Environment initialization is supposed to be done only once, so setting KMP_AFFINITY after that has no effect.  IOW, this is the expected behavior.

Patrick,

Can you make sure this is strongly documented with KMP_AFFINITY in a block: When using  SETENVQQ("KMP_AFFINITY=... programically to set the environment variables this must be performed prior to the first parallel region or making any calls to omp_...runtime library routines.

Something to that effect.

Jim Dempsey

0 Kudos
pbkenned1
Employee
2,400 Views

Yes, we certainly need to document this behavior, I'll take the initiative to do that. 

I suspect there may a larger set of standard OpenMP functions besides omp_get_time() that will produce the same effect.  It would be helpful to know the complete set.  I've already discovered that some of the Intel-specific KMP* functions like KMP_SET_DEFAULTS() and KMP_SET_STACKSIZE_S() produce this effect as well. 

For example, this has no effect on the desired KMP_AFFINITY() settings:

call KMP_SET_STACKSIZE_S(1000000)
Lsuccess = SETENVQQ("KMP_AFFINITY=verbose,scatter")
ISuccess =GETENVQQ("KMP_AFFINITY", val)
print *,Val
!$OMP PARALLEL
!$OMP END PARALLEL

 

My new BKM is to set all the environment and runtime settings in one shot, before entering the first parallel region.  For example:

call KMP_SET_DEFAULTS("KMP_AFFINITY=verbose,scatter|KMP_STACKSIZE=1000000|KMP_SETTINGS=1")
!$OMP PARALLEL
< parallel work>
!$OMP END PARALLEL

 

Patrick
 

0 Kudos
pbkenned1
Employee
2,400 Views

I filed a report with the developers to document all of the OpenMP runtime library routines that initialize the runtime environment.  For all I know, that's all of them, so if you are setting the environment programmatically, I suggest you do that before calling any such routine. 

I know for a fact that calling any of these will initialize the runtime environment:  omp_get_wtime(), omp_get_wtick(), omp_set_num_threads(), KMP_SET_STACKSIZE_S(), KMP_SET_DEFAULTS().

Internal tracking ID:  DPD200365853

Patrick

0 Kudos
Reply