<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: QueryPerformanceCounter and OpenMP in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825661#M49856</link>
    <description>I believe I tracked down the problem--I forgot that all integers in Fortran are signed and that the 32nd bit of the lower 32 bits is the sign bit (I was using the LARGE_INTEGER type defined in ifwinty because the kernel32 module uses that type).  On a related topic, is ETIME() a reliable method for getting elapsed processor time on a multicore system?
&lt;BR /&gt;
By the way, below is the working version of the code that uses LARGE_INTEGER--it probably would be easier to skip kernel32 and pass an INTEGER(KIND=8) to QueryPerformanceCounter.
&lt;BR /&gt;
&lt;PRE&gt;
FUNCTION read_timer()
  USE kinds
  USE kernel32, ONLY: QueryPerformanceCounter,QueryPerformanceFrequency
  USE ifwinty

  INTEGER(i8b) :: read_timer
  TYPE(T_LARGE_INTEGER) :: freq, time_hack
  INTEGER(i8b) :: timer_freq
  INTEGER(BOOL) :: rc

  ! Always get the frequency because it can change
  rc = QueryPerformanceFrequency(freq)
  rc = QueryPerformanceCounter(time_hack)

  ! The LARGE_INTEGER type provides storage for a signed 64-bit integer and
  ! it is constructed using two 32 bit integers.  To convert the
  ! LARGE_INTEGER type into one 64 bit in a portable fashion
  ! we need to do the following:
  ! 1) Multiply HighPart of LARGE_INTEGER by 2 ^ 32, which  shifts it to the 
  !    left by 32 bits.
  read_timer = time_hack%HighPart * 4294967296_i8b

  ! 2) Add the lower 31 bits of LowPart to the sum by masking out the 
  !    sign bit (AND 0x7FFFFFFF).  we need to ignore bit 32 because 
  !    Fortran thinks it is the sign bit (all Integers are signed in Fortran).
  read_timer = read_timer + IAND(time_hack%LowPart,Z'7FFFFFFF')

  ! 3) Handle the sign bit of LowPart by checking to see if it is set.
  !    If it is, add 2^31 to the sum
  IF(BTEST(time_hack%LowPart,31)) &amp;amp;
       read_timer = read_timer + 2147483648_i8b

  timer_freq = freq%HighPart * 4294967296_i8b
  timer_freq = timer_freq + IAND(freq%LowPart,Z'7FFFFFFF')
  IF(BTEST(freq%LowPart,31)) &amp;amp;
       timer_freq = timer_freq + 2147483648_i8b

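  ! Convert the timer ticks into microseconds (hence the 1000000).
  ! Whole seconds and the remainder are scaled separately so the
  ! frequency is not truncated by integer division.
  read_timer = (read_timer / timer_freq) * 1000000_i8b + &amp;amp;
       (MOD(read_timer, timer_freq) * 1000000_i8b) / timer_freq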
END FUNCTION read_timer
&lt;/PRE&gt;</description>
    <pubDate>Mon, 26 Feb 2007 06:00:53 GMT</pubDate>
    <dc:creator>Dishaw__Jim</dc:creator>
    <dc:date>2007-02-26T06:00:53Z</dc:date>
    <item>
      <title>QueryPerformanceCounter and OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825656#M49851</link>
      <description>Based on Microsoft's documentation, I thought QueryPerformanceCounter should work in a multiprocessor environment.  When I have OpenMP enabled, I can't get a consistent time, e.g. I compute a negative elapsed time.
&lt;BR /&gt;
Has anyone used QueryPerformanceCounter with OpenMP enabled?  Any suggestions?</description>
      <pubDate>Sun, 25 Feb 2007 06:51:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825656#M49851</guid>
      <dc:creator>Dishaw__Jim</dc:creator>
      <dc:date>2007-02-25T06:51:57Z</dc:date>
    </item>
    <item>
      <title>Re: QueryPerformanceCounter and OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825657#M49852</link>
      <description>&lt;P&gt;James D,&lt;/P&gt;
&lt;P&gt;I have used QueryPerformanceCounter with no problems.&lt;/P&gt;
&lt;P&gt;QueryPerformanceCounter returns an integer(8) value. If the function that computes elapsed time from a snapshot of the start time and a snapshot of the end time uses less precision, then you may have a problem.&lt;/P&gt;
&lt;P&gt;Bad coding would convert the counts to reals first and then compute the elapsed time. Good coding would compute the delta from the difference of the integer(8) counts, then convert that difference (the elapsed ticks) to REAL(8), convert the ticks per second to REAL(8), and produce the runtime in REAL(8) as elapseTicks / TicksPerSecond.&lt;/P&gt;
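&lt;P&gt;A minimal sketch of that ordering, using hypothetical names (elapsed_seconds, t0, t1 and ticks_per_sec do not come from this thread): the integer(8) counts are subtracted first and only the difference is converted to REAL(8).&lt;/P&gt;
&lt;PRE&gt;
! Hypothetical helper: t0 and t1 are counts as returned by
! QueryPerformanceCounter, ticks_per_sec is the value returned by
! QueryPerformanceFrequency.
FUNCTION elapsed_seconds(t0, t1, ticks_per_sec)
  INTEGER(8), INTENT(IN) :: t0, t1, ticks_per_sec
  REAL(8) :: elapsed_seconds
  INTEGER(8) :: delta

  ! Take the difference while still in exact integer arithmetic
  delta = t1 - t0

  ! Only now convert to floating point
  elapsed_seconds = REAL(delta, 8) / REAL(ticks_per_sec, 8)
END FUNCTION elapsed_seconds
&lt;/PRE&gt;
&lt;P&gt;Converting each count to a real before subtracting can round away the low-order ticks, which is exactly where a short interval lives.&lt;/P&gt;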
&lt;P&gt;If you still have problems then check the intermediary values using the debugger. &lt;/P&gt;
&lt;P&gt;Also, consider using the OMP_GET_WTIME() function, which is platform independent and essentially does what you want. On Windows it calls QueryPerformanceCounter at some point. If you are timing very short intervals then consider using QueryPerformanceCounter, otherwise use the OpenMP library function.&lt;/P&gt;
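&lt;P&gt;As a rough illustration of the OpenMP route, the sketch below times a parallel loop with omp_get_wtime from the standard omp_lib module. The program name, loop body and variable names are made up for the example, and it assumes the code is compiled with OpenMP enabled.&lt;/P&gt;
&lt;PRE&gt;
PROGRAM wtime_demo
  USE omp_lib, ONLY: omp_get_wtime
  IMPLICIT NONE
  REAL(8) :: t_start, t_end, total
  INTEGER :: i

  ! Wall-clock time before the region of interest
  t_start = omp_get_wtime()

  total = 0.0D0
  !$OMP PARALLEL DO REDUCTION(+:total)
  DO i = 1, 1000000
     total = total + SQRT(REAL(i, 8))
  END DO
  !$OMP END PARALLEL DO

  ! Wall-clock time after the region of interest
  t_end = omp_get_wtime()

  PRINT *, 'sum =', total
  PRINT *, 'elapsed wall-clock seconds =', t_end - t_start
END PROGRAM wtime_demo
&lt;/PRE&gt;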
&lt;P&gt;Jim Dempsey&lt;/P&gt;
&lt;P&gt;(from one James D to another)&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 25 Feb 2007 19:38:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825657#M49852</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2007-02-25T19:38:16Z</dc:date>
    </item>
    <item>
      <title>Re: QueryPerformanceCounter and OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825658#M49853</link>
      <description>&lt;P&gt;I forgot to mention: on multi-processor platforms, attempts are made to keep the performance counters synchronized amongst the processors. The synchronization can drift depending on configuration issues, for example when the processor clock speed is altered for thermal reasons. Consider using code to assign the threads to run on specific processors; this is called setting processor affinity. If the threads do not move around, then the synchronization of the performance counters is not an issue. Note that if you are timing all threads together rather than each thread individually, then only the timing thread need be locked so that it observes one performance counter. Also note that if a synchronization occurs and the timing processor is affected, then the timing data is less accurate. Therefore make several runs, throw out the best and worst times, and average the rest.&lt;/P&gt;
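&lt;P&gt;The last suggestion (drop the best and worst run, average the rest) could be sketched as follows. The function name trimmed_mean and its arguments are hypothetical, and the fallback for fewer than three samples is an assumption made for the example.&lt;/P&gt;
&lt;PRE&gt;
! Hypothetical helper: mean of the timings after discarding the
! single smallest and single largest value.
FUNCTION trimmed_mean(times, n)
  INTEGER, INTENT(IN) :: n
  REAL(8), INTENT(IN) :: times(n)
  REAL(8) :: trimmed_mean

  IF (n .LE. 2) THEN
     ! Nothing would remain after discarding two samples,
     ! so fall back to the plain mean
     trimmed_mean = SUM(times) / REAL(MAX(n, 1), 8)
  ELSE
     trimmed_mean = (SUM(times) - MINVAL(times) - MAXVAL(times)) / REAL(n - 2, 8)
  END IF
END FUNCTION trimmed_mean
&lt;/PRE&gt;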
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Sun, 25 Feb 2007 19:50:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825658#M49853</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2007-02-25T19:50:31Z</dc:date>
    </item>
    <item>
      <title>Re: QueryPerformanceCounter and OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825659#M49854</link>
      <description>&lt;P&gt;__rdtsc() may give better resolution on shorter time intervals, subject to the same precautions which Jim has enumerated.&lt;/P&gt;
&lt;P&gt;Both __rdtsc() and QueryPerformanceCounter may fail when the rate of the underlying clock varies (e.g. for power saving). Current Intel platforms (beginning with Nocona) avoid this problem, since __rdtsc() actually is based on the front side bus clock, even though it appears to count CPU clock ticks. It should be possible to measure elapsed time intervals from 1e-7 seconds to several hours.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 25 Feb 2007 21:08:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825659#M49854</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2007-02-25T21:08:17Z</dc:date>
    </item>
    <item>
      <title>Re: QueryPerformanceCounter and OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825660#M49855</link>
      <description>&lt;P&gt;Thanks for the additional input Tim. &lt;/P&gt;
&lt;P&gt;The O/S has to enable __rdtsc() from Ring 3 in order to make it available to a user application without the requirement of causing a Trap to the O/S. I do not know if this is default behavior for each O/S on which your application runs.&lt;/P&gt;
&lt;P&gt;Say Tim, would you know if anything attached to the FSB can request a longer clock cycle, e.g. if you have weird memory, an ECC error, or an FSB device with unusual timing requirements?&lt;/P&gt;
&lt;P&gt;For multiple cores in a package it is expected that they will share the same FSB. And I think for now, Intel multi-socket SMP systems share one FSB. But this may not hold true for much longer. Once multiple FSB's are employed then you fall into the synchronization problem again. A motherboard could be designed to have a master clock for all FSB's, provided that nothing on the bus can stretch a clock cycle for one of the busses but not the other(s).&lt;/P&gt;
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Mon, 26 Feb 2007 05:44:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825660#M49855</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2007-02-26T05:44:01Z</dc:date>
    </item>
    <item>
      <title>Re: QueryPerformanceCounter and OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825661#M49856</link>
      <description>I believe I tracked down the problem--I forgot that all integers in Fortran are signed and that the 32nd bit of the lower 32 bits is the sign bit (I was using the LARGE_INTEGER type defined in ifwinty because the kernel32 module uses that type).  On a related topic, is ETIME() a reliable method for getting elapsed processor time on a multicore system?
&lt;BR /&gt;
By the way, below is the working version of the code that uses LARGE_INTEGER--it probably would be easier to skip kernel32 and pass an INTEGER(KIND=8) to QueryPerformanceCounter.
&lt;BR /&gt;
&lt;PRE&gt;
FUNCTION read_timer()
  USE kinds
  USE kernel32, ONLY: QueryPerformanceCounter,QueryPerformanceFrequency
  USE ifwinty

  INTEGER(i8b) :: read_timer
  TYPE(T_LARGE_INTEGER) :: freq, time_hack
  INTEGER(i8b) :: timer_freq
  INTEGER(BOOL) :: rc

  ! Always get the frequency because it can change
  rc = QueryPerformanceFrequency(freq)
  rc = QueryPerformanceCounter(time_hack)

  ! The LARGE_INTEGER type provides storage for a signed 64-bit integer and
  ! it is constructed using two 32 bit integers.  To convert the
  ! LARGE_INTEGER type into one 64 bit in a portable fashion
  ! we need to do the following:
  ! 1) Multiply HighPart of LARGE_INTEGER by 2 ^ 32, which  shifts it to the 
  !    left by 32 bits.
  read_timer = time_hack%HighPart * 4294967296_i8b

  ! 2) Add the lower 31 bits of LowPart to the sum by masking out the 
  !    sign bit (AND 0x7FFFFFFF).  we need to ignore bit 32 because 
  !    Fortran thinks it is the sign bit (all Integers are signed in Fortran).
  read_timer = read_timer + IAND(time_hack%LowPart,Z'7FFFFFFF')

  ! 3) Handle the sign bit of LowPart by checking to see if it is set.
  !    If it is, add 2^31 to the sum
  IF(BTEST(time_hack%LowPart,31)) &amp;amp;
       read_timer = read_timer + 2147483648_i8b

  timer_freq = freq%HighPart * 4294967296_i8b
  timer_freq = timer_freq + IAND(freq%LowPart,Z'7FFFFFFF')
  IF(BTEST(freq%LowPart,31)) &amp;amp;
       timer_freq = timer_freq + 2147483648_i8b

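  ! Convert the timer ticks into microseconds (hence the 1000000).
  ! Whole seconds and the remainder are scaled separately so the
  ! frequency is not truncated by integer division.
  read_timer = (read_timer / timer_freq) * 1000000_i8b + &amp;amp;
       (MOD(read_timer, timer_freq) * 1000000_i8b) / timer_freq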
END FUNCTION read_timer
&lt;/PRE&gt;</description>
      <pubDate>Mon, 26 Feb 2007 06:00:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825661#M49856</guid>
      <dc:creator>Dishaw__Jim</dc:creator>
      <dc:date>2007-02-26T06:00:53Z</dc:date>
    </item>
    <item>
      <title>Re: QueryPerformanceCounter and OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825662#M49857</link>
      <description>&lt;P&gt;ETIME, generally speaking, is made available only for legacy compatibility. On the most common compilers, over the last 10 years, it duplicates the functionality of CPU_TIME. So, it usually attempts to report CPU time, not elapsed time. The resolution, at best, would be the same as CPU_TIME. For my own use, I write a function based on __rdtsc() which has the same calling data types as CPU_TIME(), so it is easy to switch.&lt;/P&gt;
&lt;P&gt;Jim's recommendation to treat the 64-bit integers as plain 8-byte integers avoids the complication of treating them as pairs of 32-bit integers. Why use a compiler, if you aren't willing to let it do the work? It would be a long time before you would have to worry about signed vs unsigned 64-bit integers, except that the generated code for signed integers is likely to be more efficient. As Jim suggested, taking the required differences of 64-bit integers, then using double precision code for further calculations, gives you reasonable efficiency.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Feb 2007 14:08:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825662#M49857</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2007-02-26T14:08:20Z</dc:date>
    </item>
    <item>
      <title>Re: QueryPerformanceCounter and OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825663#M49858</link>
      <description>&lt;P&gt;Examine this code:&lt;/P&gt;
&lt;PRE&gt;
! PerformanceCounter.f90
module PerformanceCounter
  use kernel32

  ! Performance counter information
  type T_LARGE_INTEGER_OVERLAY
    union
      map
        type(T_LARGE_INTEGER) :: li
      end map
      map
        integer(8) :: i8 = 0
      end map
    end union
  end type T_LARGE_INTEGER_OVERLAY

  type(T_LARGE_INTEGER_OVERLAY) :: PerformanceCounterFrequency_LARGE_INTEGER
  real(8) :: PerformanceCounterFrequency_real8

  type T_PERFORMANCECOUNTER
    type(T_LARGE_INTEGER_OVERLAY) :: CountStart
    type(T_LARGE_INTEGER_OVERLAY) :: CountEnd
    real(8) :: RunTimeInSeconds = 0.
  end type T_PERFORMANCECOUNTER

contains

  ! PerformanceCounterInit
  ! Call once at program initialization
  ! Determine the Performance Counter Frequency
  ! This assumes all processors use the same frequency
  subroutine PerformanceCounterInit
    integer(BOOL) :: bTrash
    ! Get tick frequency as T_LARGE_INTEGER
    bTrash = QueryPerformanceFrequency(PerformanceCounterFrequency_LARGE_INTEGER.li)
    ! Convert to real(8)
    PerformanceCounterFrequency_real8 = dble(PerformanceCounterFrequency_LARGE_INTEGER.i8)
  end subroutine PerformanceCounterInit

  subroutine PerformanceCounterStart(PerformanceCounter)
    type(T_PERFORMANCECOUNTER) :: PerformanceCounter
    integer(BOOL) :: bTrash
    ! Reset RunTimeInSeconds to 0.
    PerformanceCounter.RunTimeInSeconds = 0.
    ! Read Performance Counter into PerformanceCountStart
    bTrash = QueryPerformanceCounter(PerformanceCounter.CountStart.li)
  end subroutine PerformanceCounterStart

  subroutine PerformanceCounterResume(PerformanceCounter)
    type(T_PERFORMANCECOUNTER) :: PerformanceCounter
    integer(BOOL) :: bTrash
    ! Read Performance Counter into PerformanceCountStart
    bTrash = QueryPerformanceCounter(PerformanceCounter.CountStart.li)
  end subroutine PerformanceCounterResume

  subroutine PerformanceCounterEnd(PerformanceCounter)
    type(T_PERFORMANCECOUNTER) :: PerformanceCounter
    integer(BOOL) :: bTrash
    bTrash = QueryPerformanceCounter(PerformanceCounter.CountEnd.li)
    ! compute and accumulate run time in seconds
    PerformanceCounter.RunTimeInSeconds = PerformanceCounter.RunTimeInSeconds &amp;amp;
      &amp;amp; + (dble(PerformanceCounter.CountEnd.i8 - PerformanceCounter.CountStart.i8) &amp;amp;
      &amp;amp; / PerformanceCounterFrequency_real8)
  end subroutine PerformanceCounterEnd

end module PerformanceCounter
&lt;/PRE&gt;
&lt;P&gt;---&lt;/P&gt;
&lt;P&gt;You may notice that the PerformanceCounterStart function zeros out what would ordinarily be the elapsed time. The purpose of doing it this way is to provide for PerformanceCounterResume.&lt;/P&gt;
&lt;P&gt;The functions let you pause counting time through a section of code that you do not wish to be included in the performance calculation.&lt;/P&gt;
&lt;P&gt;An example would be if you wanted to exclude the I/O time from the computational time.&lt;/P&gt;
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Mon, 26 Feb 2007 15:20:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825663#M49858</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2007-02-26T15:20:38Z</dc:date>
    </item>
    <item>
      <title>Re: QueryPerformanceCounter and OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825664#M49859</link>
      <description>Consider as an alternative using TRANSFER to "cast" the LARGE_INTEGER type to an INTEGER(8).</description>
      <pubDate>Mon, 26 Feb 2007 15:36:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825664#M49859</guid>
      <dc:creator>Steven_L_Intel1</dc:creator>
      <dc:date>2007-02-26T15:36:58Z</dc:date>
    </item>
    <item>
      <title>Re: QueryPerformanceCounter and OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825665#M49860</link>
      <description>&lt;P&gt;I considered the UNION approach, however, I was not sure how portable it is between compilers (IIRC it is not in the language specification). &lt;/P&gt;
&lt;P&gt;As for TRANSFER, I must admit I didn't realize that it even existed. The approach I ended up taking was defining an interface to QueryPerformanceCounter where an INTEGER(KIND=8) was passed (all the host platforms I am running on support KIND=8).&lt;/P&gt;
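&lt;P&gt;One possible shape for such an interface is sketched below. This is not the interface actually used in this thread: the module name qpc_int8 and the argument names are hypothetical, it relies on BIND(C) from ISO_C_BINDING, and it assumes a 64-bit Windows target (a 32-bit build would also need the STDCALL calling convention). It should not be combined with the kernel32 module's own declaration of the same name.&lt;/P&gt;
&lt;PRE&gt;
MODULE qpc_int8
  USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_INT, C_INT64_T
  IMPLICIT NONE
  INTERFACE
     ! Assumption: the Win32 routine writes a 64-bit count through
     ! the pointer it is given and returns a BOOL (a C int).
     FUNCTION QueryPerformanceCounter(ticks) &amp;amp;
          BIND(C, NAME='QueryPerformanceCounter') RESULT(ok)
       IMPORT :: C_INT, C_INT64_T
       INTEGER(C_INT64_T), INTENT(OUT) :: ticks
       INTEGER(C_INT) :: ok
     END FUNCTION QueryPerformanceCounter
  END INTERFACE
END MODULE qpc_int8
&lt;/PRE&gt;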
&lt;P&gt;I'm not quite sure how I can coax _rdtsc to give me elapsed cpu time. From my understanding tsc returns wall clock time.&lt;/P&gt;
&lt;P&gt;The reason for this whole endeavour is that my runtime (wall clock) is not scaling with the number of cores at the rate I would expect. My first cut at improving multiprocessor performance was to see what gains would be achieved through the Math Kernel Library (I have many BLAS calls and a linear system (a moderate case is an 8192x8192 sparse system) that is solved using the Direct Sparse Solver). As I change OMP_NUM_THREADS, the wall clock time stays constant even though I can see the work being distributed over the processors. I think what this is telling me is that the MKL calls constitute a small portion of the runtime.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Feb 2007 19:31:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825665#M49860</guid>
      <dc:creator>Dishaw__Jim</dc:creator>
      <dc:date>2007-02-26T19:31:32Z</dc:date>
    </item>
    <item>
      <title>Re: QueryPerformanceCounter and OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825666#M49861</link>
      <description>You should check out &lt;A href="http://www3.intel.com/cd/software/products/asmo-na/eng/threading/286749.htm"&gt;Intel Thread Profiler&lt;/A&gt;. It is designed for just this sort of problem - to see what your threads are actually doing and where time is being wasted.</description>
      <pubDate>Mon, 26 Feb 2007 20:02:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825666#M49861</guid>
      <dc:creator>Steven_L_Intel1</dc:creator>
      <dc:date>2007-02-26T20:02:08Z</dc:date>
    </item>
    <item>
      <title>Re: QueryPerformanceCounter and OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825667#M49862</link>
      <description>Just put in an order for it. Thanks for the tip</description>
      <pubDate>Mon, 26 Feb 2007 20:19:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825667#M49862</guid>
      <dc:creator>Dishaw__Jim</dc:creator>
      <dc:date>2007-02-26T20:19:15Z</dc:date>
    </item>
    <item>
      <title>Re: QueryPerformanceCounter and OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825668#M49863</link>
      <description>You're welcome. I had some training on Thread Profiler last year and I was impressed at the kind of information it could tease out of an application, including being able to take you directly to the source code of a call that was causing stalls. What often happens is that you may have multiple threads but a lot of the time is spent waiting for some event to occur, making the program effectively serial, or the other threads finished early, leaving the main thread to dominate the elapsed time.</description>
      <pubDate>Mon, 26 Feb 2007 20:36:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825668#M49863</guid>
      <dc:creator>Steven_L_Intel1</dc:creator>
      <dc:date>2007-02-26T20:36:01Z</dc:date>
    </item>
    <item>
      <title>Re: QueryPerformanceCounter and OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825669#M49864</link>
      <description>&lt;P&gt;There are problems and benefits with each method of implementation.&lt;/P&gt;
&lt;P&gt;TRANSFER is part of the language specification whereas UNION is an implementation feature. So TRANSFER will be better for portability.&lt;/P&gt;
&lt;P&gt;The disadvantage of TRANSFER is you must be cognizant of the transformation everywhere you use the intrinsic function. For example, just what does the mold mean when you supply 0.0 or 1. Which size real is it? Which size integer is it? &lt;/P&gt;
&lt;P&gt;UNION ties the transformation to the TYPE definition of the structure. Specify the type properly once, then everywhere the transformation (cast) is correct.&lt;/P&gt;
&lt;P&gt;Additionally, by definition in the Microsoft Platform SDK you know&lt;/P&gt;
&lt;PRE class="syntax"&gt;typedef union _LARGE_INTEGER {
  struct {
    DWORD LowPart;
    LONG HighPart;
  };
  struct {
    DWORD LowPart;
    LONG HighPart;
  } u;
  LONGLONG QuadPart;
} LARGE_INTEGER, *PLARGE_INTEGER;&lt;/PRE&gt;
&lt;P&gt;And the underlying problem is that the interface to the Win32 QueryPerformanceCounter is using T_LARGE_INTEGER (without the UNION), whereas in this case it would be more suitable to use T_LONGLONG.&lt;/P&gt;
&lt;P&gt;Using T_LARGE_INTEGER (without the UNION) is technically invalid. DWORD is not representable in FORTRAN, as FORTRAN does not comprehend the concept of unsigned integers.&lt;/P&gt;
&lt;P&gt;Therefore, requiring the use of TRANSFER also requires the use of an unsupported data type (DWORD).&lt;/P&gt;
&lt;P&gt;The use of TRANSFER(unknown, known) is no different than an obfuscated CAST.&lt;/P&gt;
&lt;P&gt;In the case of QueryPerformanceCounter the interface should be declared according to what it does (takes the address of an INTEGER(8)) as opposed to taking a pointer to a type that is unsuitable for use.&lt;/P&gt;
&lt;P&gt;Or alternately declare T_LARGE_INTEGER as INTEGER(8).&lt;/P&gt;
&lt;P&gt;---- (enough of my brow beating) ----&lt;/P&gt;
&lt;P&gt;Steve, is there anything planned by the standards committee to address issues such as unsigned integers and bit fields?&lt;/P&gt;
&lt;P&gt;It sure would be nice to bring Fortran up to the 1960's.&lt;/P&gt;
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Mon, 26 Feb 2007 20:48:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825669#M49864</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2007-02-26T20:48:38Z</dc:date>
    </item>
    <item>
      <title>Re: QueryPerformanceCounter and OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825670#M49865</link>
      <description>&lt;P&gt;In the particular case here, you know that the source is an 8-byte record that is in fact an integer(8). Given that the use of a Windows API limits portability somewhat, I see no better choice than TRANSFER. It has the advantage of making it obvious what is happening at the point of use, whereas a non-standard UNION does not.&lt;/P&gt;
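&lt;P&gt;A minimal sketch of the TRANSFER approach, assuming a little-endian target such as x86 Windows. The derived type large_int_t is a hypothetical stand-in for T_LARGE_INTEGER (the real definition comes from ifwinty), and the values are made up so that the low half has its top bit set:&lt;/P&gt;
&lt;PRE&gt;
program transfer_demo
  use iso_fortran_env, only: int32, int64
  implicit none

  ! Hypothetical stand-in for T_LARGE_INTEGER: two 32-bit halves, low half first.
  type :: large_int_t
    sequence
    integer(int32) :: LowPart
    integer(int32) :: HighPart
  end type large_int_t

  type(large_int_t) :: raw
  integer(int64)    :: ticks

  ! LowPart holds the bit pattern FFFFFFFF, which is -1 as a signed 32-bit value.
  raw%LowPart  = -1_int32
  raw%HighPart = 1_int32

  ! Copy the 8 bytes as-is into a 64-bit integer; no arithmetic on the halves.
  ticks = transfer(raw, ticks)

  print *, ticks   ! 8589934591 (1*2**32 + 4294967295) on a little-endian target
end program transfer_demo
&lt;/PRE&gt;
&lt;P&gt;Because TRANSFER copies the bit pattern rather than the numeric value, the sign bit of the low half needs no special masking.&lt;/P&gt;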
&lt;P&gt;The standards committee is working on a "bits" feature for F2008. I am not familiar with the details - there is some discussion lately in comp.lang.fortran where some observe that it isn't really all that useful. The committee continues to decline to add unsigned types to the language.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Feb 2007 21:36:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825670#M49865</guid>
      <dc:creator>Steven_L_Intel1</dc:creator>
      <dc:date>2007-02-26T21:36:53Z</dc:date>
    </item>
    <item>
      <title>Re: QueryPerformanceCounter and OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825671#M49866</link>
      <description>&lt;P&gt;&amp;gt;&amp;gt;The standards committee is working on a "bits" feature for F2008. I am not familiar with the details - there is some discussion lately in comp.lang.fortran where some observe that it isn't really all that useful. The committee continues to decline to add unsigned types to the language.&lt;/P&gt;
&lt;P&gt;And they must be experiencing a bad case of angst over a one-bit field, which (as signed) would have 0 and -1 as its only permitted values. Consider:&lt;/P&gt;
&lt;P&gt;IF(aBit .eq. 0) aBit = 1&lt;/P&gt;
&lt;P&gt;This would store -1 in aBit.&lt;/P&gt;
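&lt;P&gt;A small sketch of why that happens, assuming a two's-complement processor and a compiler with the Fortran 2008 SHIFTL and SHIFTA intrinsics. There is no real bit-field type here; the one-bit field is only simulated by sign-extending the lowest bit of a 32-bit integer, and the variable names are illustrative:&lt;/P&gt;
&lt;PRE&gt;
program one_bit_field
  use iso_fortran_env, only: int32
  implicit none
  integer(int32) :: stored, as_signed

  ! Keep only the lowest bit of the value 1, as a one-bit field would.
  stored = iand(1_int32, 1_int32)

  ! Sign-extend that single bit: move it to the sign position, then
  ! shift it back arithmetically so the sign bit is replicated.
  as_signed = shifta(shiftl(stored, 31), 31)

  print *, as_signed   ! -1
end program one_bit_field
&lt;/PRE&gt;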
&lt;P&gt;I expect bit fields to be deferred another 25 years.&lt;/P&gt;
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Mon, 26 Feb 2007 23:43:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/QueryPerformanceCounter-and-OpenMP/m-p/825671#M49866</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2007-02-26T23:43:10Z</dc:date>
    </item>
  </channel>
</rss>

