<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic gprof vs difftime in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/gprof-vs-difftime/m-p/946911#M5106</link>
    <description>Hi;&lt;BR /&gt;&lt;BR /&gt;I have been compiling a fairly large Fortran application with the Intel v8.1 compiler, using OMP directives for parallelism on dual-socket IA-32 machines.&lt;BR /&gt;&lt;BR /&gt;All subroutines (a few are in ANSI C) have been instrumented with the -pg compile switch so that I can run gprof later to check on performance. According to the "flat profile" in the gprof output, the code runs twice as fast when using two processors compared with a single processor.&lt;BR /&gt;&lt;BR /&gt;However, the code also calls difftime (from C) to give me the elapsed execution time for the software. Comparing the elapsed times gives me only a 30% improvement in speed. According to gprof, I should expect a 50% improvement.&lt;BR /&gt;&lt;BR /&gt;I don't understand why the two methods of clocking the parallelised software are so different.&lt;BR /&gt;&lt;BR /&gt;It does not seem likely that it is due to OMP overhead, because gprof measures the time spent in each subroutine. The overhead should be included in the gprof profile stats, unless I am misinterpreting the gprof method. Same thoughts about memory access. About 10% of the run seems to be spent on Fortran I/O, based on the output of the "top" utility (procs are "idle" or running "system calls"), but again, this should be included in the gprof stats, since the I/O is called from inside the profiled subroutines.&lt;BR /&gt;&lt;BR /&gt;I would be grateful if someone could comment on what might be causing the difference in clocking methods.&lt;BR /&gt;&lt;BR /&gt;Thanks.&lt;BR /&gt;JR</description>
    <pubDate>Wed, 08 Feb 2006 22:30:26 GMT</pubDate>
    <dc:creator>johnrayner</dc:creator>
    <dc:date>2006-02-08T22:30:26Z</dc:date>
    <item>
      <title>gprof vs difftime</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/gprof-vs-difftime/m-p/946911#M5106</link>
      <description>Hi;&lt;BR /&gt;&lt;BR /&gt;I have been compiling a fairly large Fortran application with the Intel v8.1 compiler, using OMP directives for parallelism on dual-socket IA-32 machines.&lt;BR /&gt;&lt;BR /&gt;All subroutines (a few are in ANSI C) have been instrumented with the -pg compile switch so that I can run gprof later to check on performance. According to the "flat profile" in the gprof output, the code runs twice as fast when using two processors compared with a single processor.&lt;BR /&gt;&lt;BR /&gt;However, the code also calls difftime (from C) to give me the elapsed execution time for the software. Comparing the elapsed times gives me only a 30% improvement in speed. According to gprof, I should expect a 50% improvement.&lt;BR /&gt;&lt;BR /&gt;I don't understand why the two methods of clocking the parallelised software are so different.&lt;BR /&gt;&lt;BR /&gt;It does not seem likely that it is due to OMP overhead, because gprof measures the time spent in each subroutine. The overhead should be included in the gprof profile stats, unless I am misinterpreting the gprof method. Same thoughts about memory access. About 10% of the run seems to be spent on Fortran I/O, based on the output of the "top" utility (procs are "idle" or running "system calls"), but again, this should be included in the gprof stats, since the I/O is called from inside the profiled subroutines.&lt;BR /&gt;&lt;BR /&gt;I would be grateful if someone could comment on what might be causing the difference in clocking methods.&lt;BR /&gt;&lt;BR /&gt;Thanks.&lt;BR /&gt;JR</description>
      <pubDate>Wed, 08 Feb 2006 22:30:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/gprof-vs-difftime/m-p/946911#M5106</guid>
      <dc:creator>johnrayner</dc:creator>
      <dc:date>2006-02-08T22:30:26Z</dc:date>
    </item>
    <item>
      <title>Re: gprof vs difftime</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/gprof-vs-difftime/m-p/946912#M5107</link>
      <description>gprof attempts to measure the CPU time spent in each reported subroutine. In the call graph profile, it attempts to show the time spent in a function plus the time spent in the functions it calls that are instrumented for gprof. Thus, the function calls inserted by OpenMP, and the Fortran run-time library functions, would show up only as separate entries, if you are lucky.</description>
      <pubDate>Thu, 09 Feb 2006 10:53:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/gprof-vs-difftime/m-p/946912#M5107</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2006-02-09T10:53:15Z</dc:date>
    </item>
  </channel>
</rss>

