<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic JB D in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946607#M91307</link>
    <description>&lt;P&gt;JB D&lt;/P&gt;
&lt;P&gt;In looking at your stream(f) function it essentially rotates sections of an array. This is memory bandwidth heavy. I cannot see the outer levels of your program, so I will throw something out for you to consider.&lt;/P&gt;
&lt;P&gt;Rotation can be accomplished by using modular arithmetic on the indices.&lt;/P&gt;
&lt;P&gt;[fortran]&lt;BR /&gt;xBase = xBase + 1 ! rotate in +x&lt;BR /&gt;yBase = yBase + 1 ! rotate in +y&lt;BR /&gt;do yRing = 1, yDim&lt;BR /&gt;&amp;nbsp; do xRing = 1, xDim&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x = MOD(xBase + xRing - 1, xDim) + 1&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; y = MOD(yBase + yRing - 1, yDim) + 1&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ! use x and y as indices as before&lt;BR /&gt;&amp;nbsp; end do&lt;BR /&gt;end do&lt;BR /&gt;[/fortran]&lt;/P&gt;
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
    <pubDate>Sat, 04 May 2013 12:29:10 GMT</pubDate>
    <dc:creator>jimdempseyatthecove</dc:creator>
    <dc:date>2013-05-04T12:29:10Z</dc:date>
    <item>
      <title>Use of only 25% of CPU with Auto-Parallelization</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946588#M91288</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I'm using Intel Visual Fortran Compiler Pro 11.1 to compile my code on an Intel Core i5 architecture.&lt;/P&gt;
&lt;P&gt;Because I would like to parallelize the execution of the program, I use the "-c /Qparallel" option at the compilation step, and the "/Qpar-report" option reports that almost all the loops have been parallelized.&lt;/P&gt;
&lt;P&gt;But when I execute my program, only 25% of the total CPU resource is allocated to the corresponding process, even though all the processors seem to work simultaneously. I've tried to&amp;nbsp;set the priority of the process to "/high" when I execute the program, with no effect, and the affinity is set by default to all 4 processors.&lt;/P&gt;
&lt;P&gt;I don't know what is going wrong, thanks in advance for any help.&lt;/P&gt;
&lt;P&gt;JB&lt;/P&gt;</description>
      <pubDate>Tue, 30 Apr 2013 09:12:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946588#M91288</guid>
      <dc:creator>JB_D_</dc:creator>
      <dc:date>2013-04-30T09:12:20Z</dc:date>
    </item>
    <item>
      <title>Did you examine with /Qpar</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946589#M91289</link>
      <description>&lt;P&gt;Did you examine with /Qpar-report to see whether the important parts of your program are parallelized, or get diagnostics on why not?&lt;/P&gt;
&lt;P&gt;If your objective is simply to max out your multiple thread meter, you might add /Qpar-threshold0. This asserts you want to maximize parallelism at the expense of performance.&lt;/P&gt;</description>
      <pubDate>Tue, 30 Apr 2013 11:47:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946589#M91289</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2013-04-30T11:47:46Z</dc:date>
    </item>
    <item>
      <title>Thank you for your answer,</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946590#M91290</link>
      <description>&lt;P&gt;Thank you for your answer,&lt;/P&gt;
&lt;P&gt;I actually tried the threshold0 option to ensure that all the loops are parallelized, but it doesn't change the CPU usage, even though all the loops are parallelized according to /Qpar-report.&lt;/P&gt;
&lt;P&gt;It is as if everything were calculated on a single core: no processor is fully used and the computation seems spread out over the 4 processors, but with a maximum use of 25% of the total CPU capacity...&lt;/P&gt;
&lt;P&gt;Many thanks for your help !&lt;/P&gt;</description>
      <pubDate>Tue, 30 Apr 2013 13:51:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946590#M91290</guid>
      <dc:creator>JB_D_</dc:creator>
      <dc:date>2013-04-30T13:51:00Z</dc:date>
    </item>
    <item>
      <title>What percentage of your</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946591#M91291</link>
      <description>&lt;P&gt;What percentage of your program is spent in the loops? There could be memory bottlenecks or other issues preventing your program from fully utilizing each core.&lt;/P&gt;
&lt;P&gt;Annalee&lt;/P&gt;</description>
      <pubDate>Tue, 30 Apr 2013 14:25:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946591#M91291</guid>
      <dc:creator>Anonymous66</dc:creator>
      <dc:date>2013-04-30T14:25:24Z</dc:date>
    </item>
    <item>
      <title>The program is a sequence of</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946592#M91292</link>
      <description>&lt;P&gt;The program is a sequence of nested loops (at least 5 steps of 2-level loops). I guess this scheme is well suited to auto-parallelization, isn't it?&lt;/P&gt;
&lt;P&gt;Do you think that using OpenMP may greatly increase the efficiency of the parallelization? What is weird is that the CPU allocation of my process is always stuck at precisely 25%!&lt;/P&gt;</description>
      <pubDate>Tue, 30 Apr 2013 14:52:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946592#M91292</guid>
      <dc:creator>JB_D_</dc:creator>
      <dc:date>2013-04-30T14:52:00Z</dc:date>
    </item>
    <item>
      <title>Identify a process intensive</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946593#M91293</link>
      <description>&lt;P&gt;Identify a compute-intensive loop that has been reported as parallelized. Run in Debug mode, place a breakpoint in the loop, and run to the breakpoint. Open the Debug window for Threads: how many threads are listed?&lt;/P&gt;
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Tue, 30 Apr 2013 15:33:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946593#M91293</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2013-04-30T15:33:06Z</dc:date>
    </item>
    <item>
      <title>Applying OpenMP may give you</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946594#M91294</link>
      <description>&lt;P&gt;Applying OpenMP may give you more insight; among other things you can check the number of threads assigned within a parallel region, and see whether your loops can be successfully parallelized without hidden transformations used by -Qparallel.&lt;/P&gt;
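&lt;P&gt;A minimal sketch of that thread-count check, assuming the program is built with OpenMP enabled (/Qopenmp with Intel Fortran on Windows). The program name count_threads and the variable nThreads are illustrative only and do not come from this thread:&lt;/P&gt;
&lt;P&gt;[fortran]&lt;BR /&gt;program count_threads&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; use omp_lib&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; implicit none&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; integer :: nThreads ! hypothetical name&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; nThreads = 1&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !$OMP PARALLEL SHARED(nThreads)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !$OMP MASTER&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ! only the master thread records the team size&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; nThreads = omp_get_num_threads()&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !$OMP END MASTER&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !$OMP END PARALLEL&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; print *, 'threads in parallel region: ', nThreads&lt;BR /&gt;end program count_threads&lt;BR /&gt;[/fortran]&lt;/P&gt;
&lt;P&gt;If this reports 1 on a 4-core machine, likely causes are a build without OpenMP enabled or an OMP_NUM_THREADS setting of 1.&lt;/P&gt;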
&lt;P&gt;I suspect you must set /O explicitly along with /Qparallel for it to operate in debug build.&lt;/P&gt;</description>
      <pubDate>Tue, 30 Apr 2013 16:09:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946594#M91294</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2013-04-30T16:09:11Z</dc:date>
    </item>
    <item>
      <title>Thank you for your answer, I</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946595#M91295</link>
      <description>&lt;P&gt;Thank you for your answer, I'm going to check that.&lt;/P&gt;</description>
      <pubDate>Tue, 30 Apr 2013 16:35:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946595#M91295</guid>
      <dc:creator>JB_D_</dc:creator>
      <dc:date>2013-04-30T16:35:00Z</dc:date>
    </item>
    <item>
      <title>Hi all,</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946597#M91297</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;
&lt;P&gt;First I would like to thank Jim and iliyapolak very much: the debugger and xperf helped me find that there was no parallelization in my code. I found in this forum that I had to check data dependencies in my loops before using /Qparallel blindly :), and I realized that there's no magic tool for parallelization.&lt;/P&gt;
&lt;P&gt;Because my code is fairly light, I tried using OpenMP directives in my code, mostly to parallelize independent implicit loops in a subroutine.&amp;nbsp; The parallelization works fine, but my program is slower than before. Here is the code of this routine:&lt;/P&gt;
&lt;P&gt;[fortran]&lt;/P&gt;
&lt;P&gt;!&amp;nbsp;&amp;nbsp;&amp;nbsp; ========================================================&lt;BR /&gt;!&amp;nbsp;&amp;nbsp;&amp;nbsp; Streaming step: the population functions are shifted&lt;BR /&gt;!&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; one site along their corresponding lattice direction&lt;BR /&gt;!&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (no temporary memory is needed)&lt;BR /&gt;!&amp;nbsp;&amp;nbsp;&amp;nbsp; ========================================================&lt;BR /&gt;SUBROUTINE stream(f)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; USE simParam&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; implicit none&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; double precision, INTENT(INOUT):: f(yDim,xDim,0:8)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; double precision:: periodicHor(yDim), periodicVert(xDim)&lt;/P&gt;
&lt;P&gt;!$OMP PARALLEL SHARED(f,xDim,yDim) PRIVATE(periodicHor,periodicVert)&lt;BR /&gt;&amp;nbsp;!$OMP SECTIONS&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !$OMP SECTION&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !&amp;nbsp;&amp;nbsp;&amp;nbsp; -------------------------------------&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !&amp;nbsp;&amp;nbsp;&amp;nbsp; right direction&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; periodicHor&amp;nbsp;&amp;nbsp; = f(:,xDim,1)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(:,2:xDim,1) = f(:,1:xDim-1,1)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(:,1,1)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = periodicHor&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !$OMP SECTION&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !&amp;nbsp;&amp;nbsp;&amp;nbsp; -------------------------------------&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !&amp;nbsp;&amp;nbsp;&amp;nbsp; up direction&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; periodicVert&amp;nbsp;&amp;nbsp;&amp;nbsp; = f(1,:,2)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(1:yDim-1,:,2) = f(2:yDim,:,2)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(yDim,:,2)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = periodicVert&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !$OMP SECTION&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !&amp;nbsp;&amp;nbsp;&amp;nbsp; -------------------------------------&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !&amp;nbsp;&amp;nbsp;&amp;nbsp; left direction&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; periodicHor&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = f(:,1,3)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(:,1:xDim-1,3) = f(:,2:xDim,3)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(:,xDim,3)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = periodicHor&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !$OMP SECTION&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !&amp;nbsp;&amp;nbsp;&amp;nbsp; 
-------------------------------------&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !&amp;nbsp;&amp;nbsp;&amp;nbsp; down direction&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; periodicVert&amp;nbsp; = f(yDim,:,4)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(2:yDim,:,4) = f(1:yDim-1,:,4)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(1,:,4)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = periodicVert&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !$OMP SECTION&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !&amp;nbsp;&amp;nbsp;&amp;nbsp; -------------------------------------&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !&amp;nbsp;&amp;nbsp;&amp;nbsp; up-right direction&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; periodicVert&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = f(1,:,5)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; periodicHor&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = f(:,xDim,5)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(1:yDim-1,2:xDim,5) = f(2:yDim,1:xDim-1,5)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(yDim,2:xDim,5)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = periodicVert(1:xDim-1)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(yDim,1,5)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = periodicVert(xDim)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(1:yDim-1,1,5)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = periodicHor(2:yDim)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !$OMP SECTION&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !&amp;nbsp;&amp;nbsp;&amp;nbsp; -------------------------------------&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !&amp;nbsp;&amp;nbsp;&amp;nbsp; up-left direction&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; periodicVert&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = f(1,:,6)&lt;BR 
/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; periodicHor&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = f(:,1,6)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(1:yDim-1,1:xDim-1,6) = f(2:yDim,2:xDim,6)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(yDim,1:xDim-1,6)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = periodicVert(2:xDim)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(yDim,xDim,6)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = periodicVert(1)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(1:yDim-1,xDim,6)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = periodicHor(2:yDim)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !$OMP SECTION&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !&amp;nbsp;&amp;nbsp;&amp;nbsp; -------------------------------------&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !&amp;nbsp;&amp;nbsp;&amp;nbsp; down-left direction&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; periodicVert&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = f(yDim,:,7)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; periodicHor&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = f(:,1,7)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(2:yDim,1:xDim-1,7) = f(1:yDim-1,2:xDim,7)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(1,1:xDim-1,7)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = periodicVert(2:xDim)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(1,xDim,7)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = periodicVert(1)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(2:yDim,xDim,7)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = periodicHor(1:yDim-1)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !$OMP SECTION&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !&amp;nbsp;&amp;nbsp;&amp;nbsp; 
-------------------------------------&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; !&amp;nbsp;&amp;nbsp;&amp;nbsp; down-right direction&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; periodicVert&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = f(yDim,:,8)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; periodicHor&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = f(:,xDim,8)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(2:yDim,2:xDim,8) = f(1:yDim-1,1:xDim-1,8)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(1,2:xDim,8)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = periodicVert(1:xDim-1)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(1,1,8)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = periodicVert(xDim)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; f(2:yDim,1,8)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = periodicHor(1:yDim-1)&lt;/P&gt;
&lt;P&gt;&amp;nbsp; !$OMP END SECTIONS NOWAIT&lt;BR /&gt;!$OMP END PARALLEL&lt;/P&gt;
&lt;P&gt;END SUBROUTINE stream&lt;BR /&gt;[/fortran]&lt;/P&gt;
&lt;P&gt;I think this must be caused by a scheduling issue, but I don't know which directive would really be efficient in this case. Thank you so much for your help!&lt;/P&gt;
&lt;P&gt;JB&lt;/P&gt;</description>
      <pubDate>Thu, 02 May 2013 12:33:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946597#M91297</guid>
      <dc:creator>JB_D_</dc:creator>
      <dc:date>2013-05-02T12:33:00Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;&gt;It is like every thing was</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946598#M91298</link>
      <description>&lt;P&gt;&amp;gt;&amp;gt;&amp;gt;It is as if everything were calculated on a single core: no processor is fully used and the computation seems spread out over the 4 processors, but with a maximum use of 25% of the total CPU capacity...&amp;gt;&amp;gt;&amp;gt;&lt;/P&gt;
&lt;P&gt;What load was reported by Xperf? Was the Idle thread consuming the remaining 75% of CPU time?&lt;/P&gt;</description>
      <pubDate>Thu, 02 May 2013 19:32:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946598#M91298</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-05-02T19:32:53Z</dc:date>
    </item>
    <item>
      <title>Hello everybody,</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946599#M91299</link>
      <description>&lt;P&gt;&lt;BR /&gt;Hello everybody,&lt;/P&gt;
&lt;P&gt;Sorry, I guess I messed up: my first post wasn't released immediately, so I posted a new one. That's why there are two conversations on this topic.&lt;/P&gt;
&lt;P&gt;@Annalee:&lt;BR /&gt;&amp;gt;&amp;gt;&amp;gt;If your code sections are small, the overhead&amp;nbsp; involved in running in parallel may be higher than the performance gains&amp;gt;&amp;gt;&amp;gt;&lt;/P&gt;
&lt;P&gt;I think you must be right; this routine is just one of the 8 steps within a main loop. But I assumed that this step was the heaviest because there are nested implicit loops and xDim and yDim are almost equal to 1000. By the way, is there a specific directive for this kind of array operation? Will OMP_NESTED=.TRUE. improve this kind of loop?&lt;/P&gt;
&lt;P&gt;@TimP:&lt;BR /&gt;I think the tasks are quite well balanced because there is only 1 heavy operation in each section, for instance: f(2:ny,2:nx,8) = f(1:ny-1,1:nx-1,8). So according to you KMP_AFFINITY may help, but I think I should know my processor architecture better to use this parameter efficiently, shouldn't I? I tried OMP_SCHEDULE without any improvement.&lt;/P&gt;
&lt;P&gt;@iliyapolak:&lt;BR /&gt;I'm at work at the moment and I still don't have access to XPerf, although I asked my IT department to install it. I tried on my own PC and noticed that, as you said, all the remaining CPU usage (75%) is taken by the Idle process, so my process isn't constrained by any other process.&lt;/P&gt;
&lt;P&gt;To better see how parallelization slows my execution, I varied OMP_THREAD_LIMIT from 4 down to 1 and noticed that speed decreases linearly as the number of threads increases.&lt;/P&gt;
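&lt;P&gt;One way to measure this effect in isolation is to time a single parallel loop for each thread count. A rough sketch, assuming OpenMP is enabled at compile time; the array size n, the test loop and all names are illustrative and are not taken from the program discussed here:&lt;/P&gt;
&lt;P&gt;[fortran]&lt;BR /&gt;program time_threads&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; use omp_lib&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; implicit none&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; integer, parameter :: n = 1000 ! hypothetical problem size&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; double precision, allocatable :: g(:,:)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; double precision :: t0, t1&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; integer :: nThreads, x&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; allocate(g(n,n))&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; call random_number(g)&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; do nThreads = 1, 4&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; call omp_set_num_threads(nThreads)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t0 = omp_get_wtime()&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ! columns are independent, so they can be shared among threads&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; !$OMP PARALLEL DO&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do x = 1, n&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; g(:,x) = 2.0d0 * g(:,x)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end do&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; !$OMP END PARALLEL DO&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t1 = omp_get_wtime()&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; print *, nThreads, ' thread(s): ', t1 - t0, ' s'&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end do&lt;BR /&gt;end program time_threads&lt;BR /&gt;[/fortran]&lt;/P&gt;
&lt;P&gt;A loop this simple mostly moves data, so the timings may not improve with more threads; that by itself would suggest a memory-bound rather than a compute-bound step.&lt;/P&gt;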
&lt;P&gt;Many thanks. I am asking more and more questions not really related to the first topic; may I begin a new conversation?&lt;/P&gt;</description>
      <pubDate>Fri, 03 May 2013 10:16:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946599#M91299</guid>
      <dc:creator>JB_D_</dc:creator>
      <dc:date>2013-05-03T10:16:34Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...But when i execute my</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946600#M91300</link>
      <description>&amp;gt;&amp;gt;...But when I execute my program, only 25% of the total CPU resource is allocated to the corresponding process, even though all the
&amp;gt;&amp;gt;processors seem to work simultaneously...
&amp;gt;&amp;gt;
&amp;gt;&amp;gt;... there was no parallelization in my code...

Did you check with Task Manager (I assume you use Windows) how many threads are used? Another question: are there any I/O operations with the file system during processing?</description>
      <pubDate>Fri, 03 May 2013 13:33:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946600#M91300</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-05-03T13:33:07Z</dc:date>
    </item>
    <item>
      <title>Hi Sergey,</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946601#M91301</link>
      <description>&lt;P&gt;Hi Sergey,&lt;/P&gt;
&lt;P&gt;I managed to see that there was only one thread running thanks to the debugger, but I don't know how to check it with the Task Manager. Anyway, I'm now working with OpenMP directives, and the Task Manager clearly shows me that the 4 cores are running.&lt;/P&gt;
&lt;P&gt;Second, your question about I/O is interesting. I actually write data to a file at each global&amp;nbsp;iteration (my code is a main loop of 8 steps, at the heart of which there are nested loops). Does it influence parallelization? The step in which my program writes data to a file is not enclosed in any parallelization directive.&lt;/P&gt;
&lt;P&gt;Thank you so much for your help!&lt;/P&gt;
&lt;P&gt;JB&lt;/P&gt;</description>
      <pubDate>Fri, 03 May 2013 14:20:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946601#M91301</guid>
      <dc:creator>JB_D_</dc:creator>
      <dc:date>2013-05-03T14:20:26Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;... I don't know how to</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946602#M91302</link>
      <description>&amp;gt;&amp;gt;... I don't know how to check it with the Task Manager...

- Start &lt;STRONG&gt;Task Manager&lt;/STRONG&gt;
- Select &lt;STRONG&gt;Processes&lt;/STRONG&gt; property page
- Select &lt;STRONG&gt;View&lt;/STRONG&gt; in main menu
- Select &lt;STRONG&gt;Select Columns...&lt;/STRONG&gt; and check on &lt;STRONG&gt;Thread Count&lt;/STRONG&gt;

&amp;gt;&amp;gt;...I actually write data to a file at each global iteration (my code is a main loop of 8 steps, at the heart of which there are
&amp;gt;&amp;gt;nested loops). Does it influence parallelization?

In that case I would simply comment out that part of the code, rebuild, and repeat all tests / verifications.</description>
      <pubDate>Fri, 03 May 2013 14:31:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946602#M91302</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-05-03T14:31:59Z</dc:date>
    </item>
    <item>
      <title>Bravo ! Auto-Parallelisation</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946603#M91303</link>
      <description>&lt;P&gt;Bravo! Auto-parallelization works fine when I comment out the output step!!&lt;/P&gt;
&lt;P&gt;So how can I keep the output step and still get auto-parallelization working?&lt;/P&gt;
&lt;P&gt;Another question: why is the execution not faster (it is even a little slower than single-threaded execution)?&lt;/P&gt;</description>
      <pubDate>Fri, 03 May 2013 14:58:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946603#M91303</guid>
      <dc:creator>JB_D_</dc:creator>
      <dc:date>2013-05-03T14:58:26Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;&gt;I'm at work at the moment</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946604#M91304</link>
      <description>&lt;P&gt;&amp;gt;&amp;gt;&amp;gt;I'm at work at the moment and I still don't have access to XPerf, although I asked my IT department to install it. I tried on my own PC and noticed that, as you said, all the remaining CPU usage (75%) is taken by the Idle process, so my process isn't constrained by any other process.&amp;gt;&amp;gt;&amp;gt;&lt;/P&gt;
&lt;P&gt;Can you post the screenshot from your PC (when you executed Xperf)?&lt;/P&gt;
&lt;P&gt;I would not recommend looking at the percentage figure for CPU load. Xperf and Process Explorer provide better and clearer information about the CPU load of your thread(s). This is done by counting CPU cycles instead of sampling at the timer interval (~10 ms).&lt;/P&gt;</description>
      <pubDate>Fri, 03 May 2013 16:29:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946604#M91304</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-05-03T16:29:10Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;&gt;don't know how to check it</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946605#M91305</link>
      <description>&lt;P&gt;&amp;gt;&amp;gt;&amp;gt;don't know how to check it with the Task Manager. Anyway, I'm now working with OpenMP directives, and the Task Manager clearly shows me that the 4 cores are running.&amp;gt;&amp;gt;&amp;gt;&lt;/P&gt;
&lt;P&gt;If you want to ensure that the running threads belong to your application, you can also use Process Explorer with its detailed view (including per-thread call stacks); more advanced information can be obtained with the debugger.&lt;/P&gt;</description>
      <pubDate>Fri, 03 May 2013 16:36:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946605#M91305</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-05-03T16:36:53Z</dc:date>
    </item>
    <item>
      <title>Hi JB,</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946606#M91306</link>
      <description>Hi JB,

&amp;gt;&amp;gt;Bravo! Auto-parallelization works fine when I comment out the output step!!
&amp;gt;&amp;gt;
&amp;gt;&amp;gt;So how can I keep the output step and still get auto-parallelization working?
&amp;gt;&amp;gt;
&amp;gt;&amp;gt;Another question: why is the execution not faster (it is even a little slower than single-threaded execution)?

Thanks for the update; it looks like there is light at the end of the tunnel.

Regarding the performance problems, I won't comment because there are &lt;STRONG&gt;too&lt;/STRONG&gt; many unknowns for me; a check with a performance utility, like Intel &lt;STRONG&gt;VTune&lt;/STRONG&gt; or &lt;STRONG&gt;Inspector&lt;/STRONG&gt;, could show you why it happens.

&lt;STRONG&gt;Note:&lt;/STRONG&gt; Is it possible to do a couple of tests with smaller data sets?</description>
      <pubDate>Sat, 04 May 2013 01:32:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946606#M91306</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-05-04T01:32:00Z</dc:date>
    </item>
    <item>
      <title>JB D</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946607#M91307</link>
      <description>&lt;P&gt;JB D&lt;/P&gt;
&lt;P&gt;In looking at your stream(f) function it essentially rotates sections of an array. This is memory bandwidth heavy. I cannot see the outer levels of your program, so I will throw something out for you to consider.&lt;/P&gt;
&lt;P&gt;Rotation can be accomplished by using modular arithmetic on the indices.&lt;/P&gt;
&lt;P&gt;[fortran]&lt;BR /&gt;xBase = xBase + 1 ! rotate in +x&lt;BR /&gt;yBase = yBase + 1 ! rotate in +y&lt;BR /&gt;do yRing = 1, yDim&lt;BR /&gt;&amp;nbsp; do xRing = 1, xDim&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x = MOD(xBase + xRing - 1, xDim) + 1&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; y = MOD(yBase + yRing - 1, yDim) + 1&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ! use x and y as indices as before&lt;BR /&gt;&amp;nbsp; end do&lt;BR /&gt;end do&lt;BR /&gt;[/fortran]&lt;/P&gt;
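&lt;P&gt;A small self-contained sketch of the same idea in one dimension, assuming a plain integer array a (hypothetical, not taken from the stream routine): reading through the rotated index should give the same view as physically rotating the data with CSHIFT, without moving any data.&lt;/P&gt;
&lt;P&gt;[fortran]&lt;BR /&gt;program ring_index&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; implicit none&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; integer, parameter :: xDim = 5&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; integer :: a(xDim), view(xDim) ! illustrative arrays&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; integer :: xBase, xRing, x&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; a = (/ 10, 20, 30, 40, 50 /)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; xBase = 0&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; xBase = xBase + 1 ! rotate by one site: only the base index changes&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; do xRing = 1, xDim&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ! physical index for logical position xRing&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; x = MOD(xBase + xRing - 1, xDim) + 1&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; view(xRing) = a(x)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end do&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ! compare with a physical circular shift of the data&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; print *, 'matches CSHIFT: ', all(view == cshift(a, xBase))&lt;BR /&gt;end program ring_index&lt;BR /&gt;[/fortran]&lt;/P&gt;
&lt;P&gt;The trade-off is an extra MOD per access in exchange for not copying the array at every streaming step.&lt;/P&gt;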
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Sat, 04 May 2013 12:29:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Use-of-only-25-of-CPU-with-Auto-Parallelization/m-p/946607#M91307</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2013-05-04T12:29:10Z</dc:date>
    </item>
  </channel>
</rss>

