- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yesterday I was playing around with the affinity of our program and finally figured out that the reason why our affinity never got set in the first place, was that we called omp_get_wtime before, which apparently disables any attempt to set the affinity afterwards.
With this knowledge I set out to find a new timer to replace the omp_get_wtime with, and ended up using system_clock for the outer timer
My problem is that sometimes the subsection timers, which are calculated by omp_get_wtime, end up adding up to more than the total runtime!
And this isn't a question of miliseconds or anything like that, as can be seen from the following logfile:
Total runtime : 20447.11 s (timed with system_clock)
Forward and derivative calcs: 91.26 % (timed with omp_get_wtime)
Linear solves : 2.03 % (timed with omp_get_wtime)
Analysis : 0.07 % (timed with omp_get_wtime)
Input/Output : 10.84 % (timed with omp_get_wtime)
Others : -4.21 % (total runtime - runtime of all the other subsections)
So does anyone have any idea what could go wrong here? or just what timers I should try instead.
Thanks in advance
Tue
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I haven't seen this symptom (omp_get_wtime affecteing affinity).
Please present a simplified exemplar that demonstrates the problem. Include compiler switches and environment variable settings.
RE: disparity in runtime percents
Check to see if you are sampling the omp_get_wtime within a parallel region. There is nothing implicitly wrong with this provided you take into consideration of concurrency of threads. This may produce overlapped or gapped timing intervals.
Assuming your (timed with system_clock) time is obtained within your application using an API (as opposed to from a script/batch), then you can also insert calls to omp_get_wtime immediately after the sampling of the system_clock. Then compare the two times. If the two times are relatively the same, then this confirms an oversight of your tallies of times.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The following is run on Intel(R) Visual Fortran Compiler XE 15.0.1.148 [Intel(R) 64]
I'm not exactly sure how to include compiler switches or environment variable settings, I tried including the export of my visual studio environment variables, but I wasn't able to upload .vssettings files here. But if you need that let me know and I will figure out a way to get it on the forum tomorrow.
PROGRAM tmp use IFPORT use omp_lib implicit none integer :: i,N,m,j logical(4) :: Success real*8 :: RuntimeBegin,RunTimeEnd real*8, allocatable :: x(:),y(:) ! success = SETENVQQ("KMP_AFFINITY=verbose,granularity=fine,scatter") RuntimeBegin=omp_get_wtime() success = SETENVQQ("KMP_AFFINITY=verbose,granularity=fine,scatter") m=1000000 allocate(x(m)) allocate(y(m)) x=0 y=1 N=4 call omp_set_num_threads(N) !$OMP PARALLEL PRIVATE(i) DEFAULT(SHARED) !$OMP DO SCHEDULE(STATIC) do i=1,1000 do j=1,1000000 y(i)=y(i)+(x(j)*j)**0.99 end do end do !$OMP END DO !$OMP END PARALLEL END PROGRAM tmp
As you can see I can call SETENVQQ both before and after i call omp_get_wtime, if I call it before, then I get all the feedback from verbose back when I run it and the affinity get set just as it should. If I run it after, I get no feedback when I try to set it, and I can see that it fails to spread my parallel regions across the nodes.
In any case thanks for the reply!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It's true that calling omp_get_wtime() before trying to set KMP_AFFINITY with SETENVQQ() seems to have no effect on the desired affinity setting. Like Jim I've not seen this before; I'll make inquiries.
Calling DCLOCK() seems to work. Doesn't matter if you call it before or after the call to SETENVQQ("KMP_AFFINITY=verbose..."
#ifdef BEFOREOMP
success = SETENVQQ("KMP_AFFINITY=verbose,granularity=fine,scatter")
#endif
#ifdef OMPWTIME
RuntimeBegin=omp_get_wtime()
#else
RuntimeBegin= DCLOCK()
#endif
#ifndef BEFOREOMP
success = SETENVQQ("KMP_AFFINITY=verbose,granularity=fine,scatter")
#endif
...
!$OMP END DO
#ifdef OMPWTIME
RunTimeEnd=omp_get_wtime()
#else
RuntimeEnd= DCLOCK()
#endif
!$OMP END PARALLEL
#ifdef OMPWTIME
print *,' Used omp_get_wtime()'
#else
print *,' Used DCLOCK()'
#endif
print *,' Total parallel region runime = ',RunTimeEnd-RuntimeBegin
END PROGRAM tmp
***CALL DCLOCK() BEFORE SETTING AFFINITY***
C:\ISN_Forums\U539945>ifort -Qopenmp U539945.f90 -fpp -DBEFOREOMP
Intel(R) Visual Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.1.148 Build 20141023
Copyright (C) 1985-2014 Intel Corporation. All rights reserved.
C:\ISN_Forums\U539945>\bin\timethis.exe U539945.exe
TimeThis : Command Line : U539945.exe
TimeThis : Start Time : Thu Jan 29 11:31:30 2015
OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3}
OMP: Info #156: KMP_AFFINITY: 4 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 2 cores/pkg x 2 threads/core (2 total
cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1
OMP: Info #242: KMP_AFFINITY: pid 6836 thread 0 bound to OS proc set {0}
OMP: Info #242: KMP_AFFINITY: pid 6836 thread 1 bound to OS proc set {2}
OMP: Info #242: KMP_AFFINITY: pid 6836 thread 2 bound to OS proc set {1}
OMP: Info #242: KMP_AFFINITY: pid 6836 thread 3 bound to OS proc set {3}
Used DCLOCK()
Total parallel region runime = 2.22008895874023
TimeThis : Command Line : U539945.exe
TimeThis : Start Time : Thu Jan 29 11:31:30 2015
TimeThis : End Time : Thu Jan 29 11:31:33 2015
TimeThis : Elapsed Time : 00:00:02.295
***CALL DCLOCK() AFTER SETTING AFFINITY***
C:\ISN_Forums\U539945>ifort -Qopenmp U539945.f90 -fpp
Intel(R) Visual Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.1.148 Build 20141023
Copyright (C) 1985-2014 Intel Corporation. All rights reserved.
C:\ISN_Forums\U539945>\bin\timethis.exe U539945.exe
TimeThis : Command Line : U539945.exe
TimeThis : Start Time : Thu Jan 29 11:34:48 2015
OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3}
OMP: Info #156: KMP_AFFINITY: 4 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 2 cores/pkg x 2 threads/core (2 total
cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1
OMP: Info #242: KMP_AFFINITY: pid 5596 thread 0 bound to OS proc set {0}
OMP: Info #242: KMP_AFFINITY: pid 5596 thread 1 bound to OS proc set {2}
OMP: Info #242: KMP_AFFINITY: pid 5596 thread 2 bound to OS proc set {1}
OMP: Info #242: KMP_AFFINITY: pid 5596 thread 3 bound to OS proc set {3}
Used DCLOCK()
Total parallel region runime = 2.22508811950684
TimeThis : Command Line : U539945.exe
TimeThis : Start Time : Thu Jan 29 11:34:48 2015
TimeThis : End Time : Thu Jan 29 11:34:51 2015
TimeThis : Elapsed Time : 00:00:02.300
C:\ISN_Forums\U539945>
Patrick
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
According to the OpenMP runtime developers, calling omp_get_wtime() does some environmental initialization of the runtime library. Environment initialization is supposed to be done only once, so setting KMP_AFFINITY after that has no effect. IOW, this is the expected behavior.
Incidently, DLOCK() will accumulate wallclock on a per-thread basis in parallel regions. I find that useful to checking thread imbalance, among other things.
Here's an example of a classic prime number counting program using DCLOCK() showing thread imbalance. It hasn't been tuned to improve the balance by tweaking loop scheduling, or with other mechanisms.
C:\OpenMP>\bin\TIMETHIS.EXE ompPrime1_dclock.exe
TimeThis : Command Line : ompPrime1_dclock.exe
TimeThis : Start Time : Thu Jan 29 12:19:01 2015
Range to check for Primes: 1 100000000
We are using 4 thread(s)
processor time for thread 0 = 18.3074989318848
processor time for thread 1 = 27.9699993133545
processor time for thread 2 = 33.5024986267090
processor time for thread 3 = 37.0774993896484
Number of primes found: 5761455
Number of 4n+1 primes found: 2880505
Number of 4n-1 primes found: 2880950
program wallclock time (DCLOCK()) = 37.0799999237061
utilization for thread 0 = 49.3729745672960 %
utilization for thread 1 = 75.4314977640349 %
utilization for thread 2 = 90.3519382298868 %
utilization for thread 3 = 99.9932563806290 %
TimeThis : Command Line : ompPrime1_dclock.exe
TimeThis : Start Time : Thu Jan 29 12:19:01 2015
TimeThis : End Time : Thu Jan 29 12:19:38 2015
TimeThis : Elapsed Time : 00:00:37.132
C:\OpenMP>
Let us know if you have further questions.
Patrick
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
No further questions, but thanks a lot Patrick!
I will implement dclock immediately in my code.
Cheers
Tue
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
>>According to the OpenMP runtime developers, calling omp_get_wtime() does some environmental initialization of the runtime library. Environment initialization is supposed to be done only once, so setting KMP_AFFINITY after that has no effect. IOW, this is the expected behavior.
Patrick,
Can you make sure this is strongly documented with KMP_AFFINITY in a block: When using SETENVQQ("KMP_AFFINITY=... programically to set the environment variables this must be performed prior to the first parallel region or making any calls to omp_...runtime library routines.
Something to that effect.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yes, we certainly need to document this behavior, I'll take the initiative to do that.
I suspect there may a larger set of standard OpenMP functions besides omp_get_time() that will produce the same effect. It would be helpful to know the complete set. I've already discovered that some of the Intel-specific KMP* functions like KMP_SET_DEFAULTS() and KMP_SET_STACKSIZE_S() produce this effect as well.
For example, this has no effect on the desired KMP_AFFINITY() settings:
call KMP_SET_STACKSIZE_S(1000000)
Lsuccess = SETENVQQ("KMP_AFFINITY=verbose,scatter")
ISuccess =GETENVQQ("KMP_AFFINITY", val)
print *,Val
!$OMP PARALLEL
!$OMP END PARALLEL
My new BKM is to set all the environment and runtime settings in one shot, before entering the first parallel region. For example:
call KMP_SET_DEFAULTS("KMP_AFFINITY=verbose,scatter|KMP_STACKSIZE=1000000|KMP_SETTINGS=1")
!$OMP PARALLEL
< parallel work>
!$OMP END PARALLEL
Patrick
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I filed a report with the developers to document all of the OpenMP runtime library routines that initialize the runtime environment. For all I know, that's all of them, so if you are setting the environment programmatically, I suggest you do that before calling any such routine.
I know for a fact that calling any of these will initialize the runtime environment: omp_get_wtime(), omp_get_wtick(), omp_set_num_threads(), KMP_SET_STACKSIZE_S(), KMP_SET_DEFAULTS().
Internal tracking ID: DPD200365853
Patrick

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page