<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic running time in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/running-time/m-p/741592#M1049</link>
    <description>Hello,&lt;BR /&gt;I am working on two different Mac machines, and the same code takes two to three times longer to run on the newer machine with the newer version of the compiler.&lt;BR /&gt;On both I use the same command. The complete (and very long, sorry) command line is:&lt;BR /&gt;macartney:kappa debora$ ifort -Wl,-stack_size,0x10000000 -O1 -shared-intel -o nameoftheprogram.f -L/Library/Frameworks/Intel_MKL.framework/Libraries/em64t-I/Library/Frameworks/Intel_MKL.framework/Headers /Library/Frameworks/Intel_MKL.framework/Libraries/em64t/libmkl_intel_lp64.a /Library/Frameworks/Intel_MKL.framework/Libraries/em64t/libmkl_intel_thread.a /Library/Frameworks/Intel_MKL.framework/Libraries/em64t/libmkl_core.a /Library/Frameworks/Intel_MKL.framework/Libraries/em64t/libmkl_intel_lp64.a /Library/Frameworks/Intel_MKL.framework/Libraries/em64t/libmkl_intel_thread.a /Library/Frameworks/Intel_MKL.framework/Libraries/em64t/libmkl_core.a -lguide -lpthread&lt;BR /&gt;&lt;BR /&gt;On the "old" Mac Pro (Mac OS X version 10.5.6, processor 2x3 GHz Quad-Core Intel Xeon, memory 16 GB, Intel Fortran version 10.5.8), my program runs in a reasonable time.&lt;BR /&gt;But on our new-generation Mac Pro (Mac OS X version 10.5.8, processor 2x2.93 GHz, memory 32 GB, compiler version 11.1.058), the same program runs about three times slower.&lt;BR /&gt;Could you please help me optimize the run time on the second machine?&lt;BR /&gt;Thanks!</description>
    <pubDate>Thu, 26 Nov 2009 14:46:59 GMT</pubDate>
    <dc:creator>princess_sophie</dc:creator>
    <dc:date>2009-11-26T14:46:59Z</dc:date>
    <item>
      <title>running time</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/running-time/m-p/741592#M1049</link>
      <description>Hello,&lt;BR /&gt;I am working on two different Mac machines, and the same code takes two to three times longer to run on the newer machine with the newer version of the compiler.&lt;BR /&gt;On both I use the same command. The complete (and very long, sorry) command line is:&lt;BR /&gt;macartney:kappa debora$ ifort -Wl,-stack_size,0x10000000 -O1 -shared-intel -o nameoftheprogram.f -L/Library/Frameworks/Intel_MKL.framework/Libraries/em64t-I/Library/Frameworks/Intel_MKL.framework/Headers /Library/Frameworks/Intel_MKL.framework/Libraries/em64t/libmkl_intel_lp64.a /Library/Frameworks/Intel_MKL.framework/Libraries/em64t/libmkl_intel_thread.a /Library/Frameworks/Intel_MKL.framework/Libraries/em64t/libmkl_core.a /Library/Frameworks/Intel_MKL.framework/Libraries/em64t/libmkl_intel_lp64.a /Library/Frameworks/Intel_MKL.framework/Libraries/em64t/libmkl_intel_thread.a /Library/Frameworks/Intel_MKL.framework/Libraries/em64t/libmkl_core.a -lguide -lpthread&lt;BR /&gt;&lt;BR /&gt;On the "old" Mac Pro (Mac OS X version 10.5.6, processor 2x3 GHz Quad-Core Intel Xeon, memory 16 GB, Intel Fortran version 10.5.8), my program runs in a reasonable time.&lt;BR /&gt;But on our new-generation Mac Pro (Mac OS X version 10.5.8, processor 2x2.93 GHz, memory 32 GB, compiler version 11.1.058), the same program runs about three times slower.&lt;BR /&gt;Could you please help me optimize the run time on the second machine?&lt;BR /&gt;Thanks!</description>
      <pubDate>Thu, 26 Nov 2009 14:46:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/running-time/m-p/741592#M1049</guid>
      <dc:creator>princess_sophie</dc:creator>
      <dc:date>2009-11-26T14:46:59Z</dc:date>
    </item>
    <item>
      <title>Re: running time</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/running-time/m-p/741593#M1050</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
It looks like you are comparing a uniform-memory dual-socket machine with a recent non-uniform memory (NUMA) machine. The default BIOS setting on the latter usually alternates memory accesses between local and remote memory. To optimize, select the NUMA BIOS setting and pay closer attention to avoiding false sharing and affinity issues. You may also need to pay attention to "first-touch" allocation: the first time your program initializes its data arrays, it should do so with the same OpenMP schedule and affinity used for the bulk of the later accesses.&lt;BR /&gt;</description>
      <pubDate>Thu, 26 Nov 2009 15:48:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/running-time/m-p/741593#M1050</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-11-26T15:48:30Z</dc:date>
    </item>
    <item>
      <title>Re: running time</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/running-time/m-p/741594#M1051</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/367365"&gt;tim18&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;It looks like you are comparing a uniform-memory dual-socket machine with a recent non-uniform memory (NUMA) machine. The default BIOS setting on the latter usually alternates memory accesses between local and remote memory. To optimize, select the NUMA BIOS setting and pay closer attention to avoiding false sharing and affinity issues. You may also need to pay attention to "first-touch" allocation: the first time your program initializes its data arrays, it should do so with the same OpenMP schedule and affinity used for the bulk of the later accesses.&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Dear tim18,&lt;BR /&gt;Thank you for your fast response. However, I am new to these machines and operating systems, and I do not have a computer science background, so could you please explain your answer or give more details? Sorry for bothering you; I am just trying to get the results of my code much faster.</description>
      <pubDate>Thu, 26 Nov 2009 16:24:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/running-time/m-p/741594#M1051</guid>
      <dc:creator>princess_sophie</dc:creator>
      <dc:date>2009-11-26T16:24:32Z</dc:date>
    </item>
    <item>
      <title>Re: running time</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/running-time/m-p/741595#M1052</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;Princess,&lt;BR /&gt;&lt;BR /&gt;Can you provide the processor numbers for the two machines? Please also include memory information (number of sticks, speed, etc.).&lt;BR /&gt;&lt;BR /&gt;NUMA configuration considerations would not alter performance by 2x to 3x.&lt;BR /&gt;Cache size and type, combined with your application's memory usage, could produce a difference of that size.&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey&lt;BR /&gt;</description>
      <pubDate>Thu, 26 Nov 2009 16:34:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/running-time/m-p/741595#M1052</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2009-11-26T16:34:39Z</dc:date>
    </item>
    <item>
      <title>Re: running time</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/running-time/m-p/741596#M1053</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/442456"&gt;princess_sophie&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;Dear tim18,&lt;BR /&gt;Thank you for your fast response. However, I am new to these machines and operating systems, and I do not have a computer science background, so could you please explain your answer or give more details? Sorry for bothering you; I am just trying to get the results of my code much faster.&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
If the bulk of your time is spent in MKL, it may be sufficient to set the BIOS NUMA mode and set the KMP_AFFINITY environment variable. As MKL doesn't normally use HyperThreads, the process may be simplified by disabling HT in the BIOS. It's made more complicated by the lack of a uniform scheme for BIOS numbering of cores and hyperthreads, so, unfortunately, it does involve some investigation on your part. If you set KMP_AFFINITY=compact,0,verbose and show us the resulting screen output, we could tell you whether that appears to be working.&lt;BR /&gt;Also, if the concern is primarily MKL performance, the MKL forum would be a good resource.&lt;BR /&gt;</description>
      <pubDate>Thu, 26 Nov 2009 16:36:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/running-time/m-p/741596#M1053</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-11-26T16:36:35Z</dc:date>
    </item>
    <item>
      <title>Re: running time</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/running-time/m-p/741597#M1054</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;BR /&gt;Hello Jim,&lt;BR /&gt;Thank you very much for your help. I have no idea what NUMA is, so I had better first give you some details about the machines; hopefully this information will be enough to work out how to optimize the program's execution.&lt;BR /&gt;For the "faster" computer I have the following information:&lt;BR /&gt;Model Name: Mac Pro&lt;BR /&gt;Model Identifier: MacPro2,1&lt;BR /&gt;Processor Name: Quad-Core Intel Xeon&lt;BR /&gt;Processor Speed: 3 GHz&lt;BR /&gt;Number Of Processors: 2&lt;BR /&gt;Total Number Of Cores: 8&lt;BR /&gt;L2 Cache (per processor): 8 MB&lt;BR /&gt;Memory: 16 GB&lt;BR /&gt;Bus Speed: 1.33 GHz&lt;BR /&gt;Boot ROM Version: MP21.007F.B06&lt;BR /&gt;Memory:&lt;BR /&gt;Four of these:&lt;BR /&gt;DIMM Riser A/DIMM 1:&lt;BR /&gt;&lt;BR /&gt;Size: 2 GB&lt;BR /&gt;Type: DDR2 FB-DIMM&lt;BR /&gt;Speed: 667 MHz&lt;BR /&gt;Status: OK&lt;BR /&gt;&lt;BR /&gt;And four of these:&lt;BR /&gt;DIMM Riser B/DIMM 1:&lt;BR /&gt;&lt;BR /&gt;Size: 2 GB&lt;BR /&gt;Type: DDR2 FB-DIMM&lt;BR /&gt;Speed: 667 MHz&lt;BR /&gt;Status: OK&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Whereas for the "slow" machine I found:&lt;BR /&gt;&lt;BR /&gt;Model Name: Mac Pro&lt;BR /&gt;Model Identifier: MacPro4,1&lt;BR /&gt;Processor Name: Quad-Core Intel Xeon&lt;BR /&gt;Processor Speed: 2.93 GHz&lt;BR /&gt;Number Of Processors: 2&lt;BR /&gt;Total Number Of Cores: 8&lt;BR /&gt;L2 Cache (per core): 256 KB&lt;BR /&gt;L3 Cache (per processor): 8 MB&lt;BR /&gt;Memory: 32 GB&lt;BR /&gt;Processor Interconnect Speed: 6.4 GT/s&lt;BR /&gt;Boot ROM Version: MP41.0081.B04&lt;BR /&gt;SMC Version (system): 1.39f5&lt;BR /&gt;SMC Version (processor tray): 1.39f5&lt;BR /&gt;&lt;BR /&gt;With 8 memory slots like this:&lt;BR /&gt;Memory Slots:&lt;BR /&gt;&lt;BR /&gt;ECC: Enabled&lt;BR /&gt;&lt;BR /&gt;DIMM 1:&lt;BR /&gt;&lt;BR /&gt;Size: 4 GB&lt;BR /&gt;Type: DDR3 ECC&lt;BR /&gt;Speed: 1066 MHz&lt;BR /&gt;Status: OK&lt;BR /&gt;&lt;BR /&gt;Is this enough information to determine why one is 3 times slower? And would it be possible to help me choose the best compilation parameters for each computer?&lt;BR /&gt;Thank you all!&lt;BR /&gt;&lt;BR /&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/99850"&gt;jimdempseyatthecove&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;&lt;BR /&gt;Princess,&lt;BR /&gt;&lt;BR /&gt;Can you provide the processor numbers for the two machines? Please also include memory information (number of sticks, speed, etc.).&lt;BR /&gt;&lt;BR /&gt;NUMA configuration considerations would not alter performance by 2x to 3x.&lt;BR /&gt;Cache size and type, combined with your application's memory usage, could produce a difference of that size.&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;</description>
      <pubDate>Tue, 01 Dec 2009 15:13:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/running-time/m-p/741597#M1054</guid>
      <dc:creator>princess_sophie</dc:creator>
      <dc:date>2009-12-01T15:13:34Z</dc:date>
    </item>
    <item>
      <title>Re: running time</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/running-time/m-p/741598#M1055</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
You have two variables, the machine and the compiler, which gives four combinations. Compile the code with -i-static on each machine. Copy the executable from each machine to the other machine. Run both executables on each machine. This will determine whether the compiler or the computer is responsible.&lt;BR /&gt;&lt;BR /&gt;ron&lt;BR /&gt;</description>
      <pubDate>Tue, 01 Dec 2009 15:21:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/running-time/m-p/741598#M1055</guid>
      <dc:creator>Ron_Green</dc:creator>
      <dc:date>2009-12-01T15:21:02Z</dc:date>
    </item>
  </channel>
</rss>

