<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic A caution with !dir$ loop in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/any-limit-to-nested-loops-that-ifort-analyses-for-parallelism/m-p/1135667#M135677</link>
    <description>&lt;P&gt;A caution with !dir$ loop count:&amp;nbsp;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;If you omit the max, min, or avg clause, it is likely to reduce performance for any count other than those specified.&amp;nbsp; So, you should use one of those clauses when the loop count varies.&amp;nbsp; For example, if the loop count varies from 1 to 99, !dir$ loop count avg=50 should work.&amp;nbsp; Without the directive, such a short loop (if vectorized) might be unrolled excessively.&lt;/P&gt;

&lt;P&gt;I'm not aware of how this works with auto-parallel, but Jim's suggestion is good.&amp;nbsp; For example, if a loop is asserted to execute at most 5 times, the auto-parallel should avoid making that the only parallel loop. If you use OpenMP, you would need to specify the outer parallel loop and collapse parameter yourself, but the loop count directive might still help.&lt;/P&gt;</description>
    <pubDate>Sun, 07 Oct 2018 07:18:29 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2018-10-07T07:18:29Z</dc:date>
    <item>
      <title>any limit to #nested loops that ifort analyses for parallelism?</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/any-limit-to-nested-loops-that-ifort-analyses-for-parallelism/m-p/1135660#M135670</link>
      <description>&lt;P&gt;I have a rather long FORTRAN code whose kernel comprises over 8 levels of nested DO loops, yet I only get information in the optrpt files for the innermost loops (about 4 levels). The outer loops are not referenced at all, and if I use the 'annotated' (HTML) version there are no embedded comments for the outer loops. I've tried -qopt-report5 but had no joy.&lt;/P&gt;

&lt;P&gt;The code is sensitive so I cannot simply share it, but if really needed I can try to reproduce the same lack of info from the compiler with another kernel.&lt;/P&gt;
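
&lt;P&gt;A stand-in kernel for such a reproducer might look roughly like the sketch below. Everything in it (the program name nest8, the array a, the extent n, and the compile line in the comment) is hypothetical and not taken from the real code; it only mimics an 8-level DO nest so that the report for the outer levels can be inspected.&lt;/P&gt;

&lt;PRE class="language-fortran"&gt;&lt;CODE class="language-fortran"&gt;! Hypothetical stand-in kernel, not the real code: an 8-level DO nest.
! One possible compile line (adjust to your own setup):
!   ifort -parallel -qopt-report=5 -qopt-report-phase=par,loop,vec nest8.f90
program nest8
  implicit none
  integer, parameter :: n = 4
  real :: a(n, n, n, n)
  integer :: i1, i2, i3, i4, i5, i6, i7, i8

  a = 0.0
  ! The four outer loops (i5 to i8) all update the same elements of a,
  ! so each of them carries a dependence through the accumulation.
  do i8 = 1, n
    do i7 = 1, n
      do i6 = 1, n
        do i5 = 1, n
          do i4 = 1, n
            do i3 = 1, n
              do i2 = 1, n
                do i1 = 1, n
                  a(i1, i2, i3, i4) = a(i1, i2, i3, i4) + real(i5 + i6 + i7 + i8)
                end do
              end do
            end do
          end do
        end do
      end do
    end do
  end do

  ! Print a checksum so the loops are not optimized away.
  print *, sum(a)
end program nest8&lt;/CODE&gt;&lt;/PRE&gt;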

&lt;P&gt;Hints welcome, M&lt;/P&gt;</description>
      <pubDate>Wed, 26 Sep 2018 14:28:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/any-limit-to-nested-loops-that-ifort-analyses-for-parallelism/m-p/1135660#M135670</guid>
      <dc:creator>high_end_c_</dc:creator>
      <dc:date>2018-09-26T14:28:33Z</dc:date>
    </item>
    <item>
      <title>hi all, somebody suggested</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/any-limit-to-nested-loops-that-ifort-analyses-for-parallelism/m-p/1135661#M135671</link>
      <description>&lt;P&gt;hi all, somebody suggested that the compiler may focus on the 'leaves' (innermost loops) rather than the 'root' (outermost enclosing DO loop). Whilst I can see that applies when optimising variables for registers and data for vectorisation, my presumption is any OpenMP compiler would rather implement coarse grained than fine grained and thus need to consider outermost loops as candidates.&lt;/P&gt;

&lt;P&gt;I am likely barking up the wrong tree, but maybe there is a "max time" or "max amount of work" that the compiler does before "giving up". If it starts innermost (to check vectorisation options), it may run out of steam before it can examine the outer loops to determine whether any dependencies prevent parallelism. But how would I know this? And is there a flag (or flags) to tell the compiler to keep going?&lt;/P&gt;

&lt;P&gt;cheers, michael&lt;BR /&gt;
	&lt;SPAN style="font-size: 1em;"&gt;&lt;A href="https://highendcompute.co.uk" target="_blank"&gt;https://highendcompute.co.uk&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;

</description>
      <pubDate>Thu, 04 Oct 2018 17:43:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/any-limit-to-nested-loops-that-ifort-analyses-for-parallelism/m-p/1135661#M135671</guid>
      <dc:creator>high_end_c_</dc:creator>
      <dc:date>2018-10-04T17:43:10Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt; my presumption is any</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/any-limit-to-nested-loops-that-ifort-analyses-for-parallelism/m-p/1135662#M135672</link>
      <description>&lt;P&gt;&amp;gt;&amp;gt;&amp;nbsp;&lt;EM&gt;my presumption is any OpenMP compiler would rather implement coarse grained than fine grained and thus need to consider outermost loops as candidates.&lt;/EM&gt;&lt;/P&gt;

&lt;P&gt;Are you confusing OpenMP programmer directive programming with auto-parallelism?&lt;/P&gt;

&lt;P&gt;You, as the programmer, are responsible for selecting the appropriate level at which to inject parallelism into your application, or electing not to parallelize when counterproductive. Use VTune or other profiling means to assess the practicality of parallelizing as well as where to apply your directives.&lt;/P&gt;

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Fri, 05 Oct 2018 12:45:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/any-limit-to-nested-loops-that-ifort-analyses-for-parallelism/m-p/1135662#M135672</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2018-10-05T12:45:28Z</dc:date>
    </item>
    <item>
      <title>I would not expect auto</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/any-limit-to-nested-loops-that-ifort-analyses-for-parallelism/m-p/1135663#M135673</link>
      <description>&lt;P&gt;I would not expect auto-parallel to consistently analyze more levels of loops than are required for the SPEC benchmarks.&amp;nbsp; Many of those are already set up for OpenMP with the requirement that OpenMP is disabled for benchmarking, so the loop nesting is already reasonable in most cases.&lt;/P&gt;

&lt;P&gt;As Jim suggested, an application of any complexity is better handled by explicit OpenMP directives. There are directives to guide auto-parallel, but that isn't so popular where OpenMP is no more difficult.&amp;nbsp; ifort is intended to work with both auto-parallel and OpenMP in the same application.&amp;nbsp; My expectation is that an OpenMP directive over-rides auto-parallel within its scope.&amp;nbsp; Of course, separate compilation of procedures could overcome that limit.&lt;/P&gt;
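
&lt;P&gt;As a rough, untested sketch of what explicit OpenMP on a loop nest could look like (the routine scale3d and its arguments are made up for illustration, and it would need to be built with OpenMP enabled, e.g. -qopenmp): the directive goes on the outermost loop, and the inner loops are left to the compiler.&lt;/P&gt;

&lt;PRE class="language-fortran"&gt;&lt;CODE class="language-fortran"&gt;! Sketch only: scale3d, a, nx, ny and nz are hypothetical names.
subroutine scale3d(a, nx, ny, nz)
  implicit none
  integer, intent(in) :: nx, ny, nz
  real, intent(inout) :: a(nx, ny, nz)
  integer :: i, j, k

  ! Explicit OpenMP on the outermost loop only. The inner loops are
  ! left to the compiler's own loop and vector optimizations.
  !$omp parallel do private(i, j)
  do k = 1, nz
    do j = 1, ny
      do i = 1, nx
        a(i, j, k) = 2.0 * a(i, j, k)
      end do
    end do
  end do
  !$omp end parallel do
end subroutine scale3d&lt;/CODE&gt;&lt;/PRE&gt;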

&lt;P&gt;It used to be that OpenMP directives would disable multi-level loop optimizations, although the newer compilers should perform those on inner loops, leaving the outer loops under control of OpenMP.&amp;nbsp; This situation argues against specifying an excessive number of loops in a collapse clause.&amp;nbsp; In my experience, ifort is more versatile than other compilers in applying the simd clause to outer loops, which requires some degree of multi-level optimization.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Oct 2018 13:11:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/any-limit-to-nested-loops-that-ifort-analyses-for-parallelism/m-p/1135663#M135673</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2018-10-05T13:11:00Z</dc:date>
    </item>
    <item>
      <title>Quote:jimdempseyatthecove</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/any-limit-to-nested-loops-that-ifort-analyses-for-parallelism/m-p/1135664#M135674</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;jimdempseyatthecove wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;&amp;gt;&amp;gt;&amp;nbsp;&lt;EM&gt;my presumption is any OpenMP compiler would rather implement coarse grained than fine grained and thus need to consider outermost loops as candidates.&lt;/EM&gt;&lt;/P&gt;

&lt;P&gt;Are you confusing OpenMP programmer directive programming with auto-parallelism?&lt;/P&gt;

&lt;P&gt;You, as the programmer, are responsible for selecting the appropriate level at which to inject parallelism into your application, or electing not to parallelize when counterproductive. Use VTune or other profiling means to assess the practicality of parallelizing as well as where to apply your directives.&lt;/P&gt;

&lt;P&gt;Jim Dempsey&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;


&lt;P&gt;Thanks Jim. I do appreciate the difference, but doesn't the compiler's auto-parallelisation think in a similar way to an OpenMP programmer? And with -parallel and -opt-report one can *generally* see which loops have been considered. But my question is why I do not get anything in the optrpt for the outer loops when I have ~8 levels of nested loops: ONLY the innermost ones have such optrpt comments. Sorry if I was unclear in the original Q. Yrs, M&lt;/P&gt;</description>
      <pubDate>Sat, 06 Oct 2018 08:30:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/any-limit-to-nested-loops-that-ifort-analyses-for-parallelism/m-p/1135664#M135674</guid>
      <dc:creator>high_end_c_</dc:creator>
      <dc:date>2018-10-06T08:30:56Z</dc:date>
    </item>
    <item>
      <title>Quote:Tim P. wrote:</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/any-limit-to-nested-loops-that-ifort-analyses-for-parallelism/m-p/1135665#M135675</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Tim P. wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;I would not expect auto-parallel to consistently analyze more levels of loops than are required for the SPEC benchmarks.&amp;nbsp; Many of those are already set up for OpenMP with the requirement that OpenMP is disabled for benchmarking, so the loop nesting is already reasonable in most cases.&lt;/P&gt;

&lt;P&gt;As Jim suggested, an application of any complexity is better handled by explicit OpenMP directives. There are directives to guide auto-parallel, but that isn't so popular where OpenMP is no more difficult.&amp;nbsp; ifort is intended to work with both auto-parallel and OpenMP in the same application.&amp;nbsp; My expectation is that an OpenMP directive over-rides auto-parallel within its scope.&amp;nbsp; Of course, separate compilation of procedures could overcome that limit.&lt;/P&gt;

&lt;P&gt;It used to be that OpenMP directives would disable multi-level loop optimizations, although the newer compilers should perform those on inner loops, leaving the outer loops under control of OpenMP.&amp;nbsp; This situation argues against specifying an excessive number of loops in a collapse clause.&amp;nbsp; In my experience, ifort is more versatile than other compilers in applying the simd clause to outer loops, which requires some degree of multi-level optimization.&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;&lt;BR /&gt;
	Thanks Tim. Further to my reply to Jim, I do prefer to do OMP myself. I was just expecting some dependency analysis in the par report for loops all the way out to the outermost loop. I've seen and used this many times to guide me as to why I may (or may not) be able to parallelise some loops (or what, as an OMP programmer, I'd have to address), and sometimes to help me ensure I have my OpenMP data clauses correct. Hence this part of ifort has been an invaluable tool.&lt;/P&gt;

&lt;P&gt;With this current code, which has a much larger number of nested DO levels, I suddenly (it seems) get no such report for the outer levels. Why I am not getting this info is what I am asking. Are you saying that Intel just looks at how many levels there are in SPEC and works on that number of levels to ensure really good benchmark stats, and that for deeper nests (more levels) the compiler works from the innermost loop (as one expects) until it hits that number and then does NOT even try to look at the remaining outer loops? That also sounds as if there is a parameter (e.g. set to #levels-in-SPEC) that one could amend. If that's possible, that's what I'd like to try.&lt;/P&gt;

&lt;P&gt;For now, I'll get back to those in the code's discipline to discuss the natural parallelism they believe inherent in their problem to attack this problem from a higher angle in order to expose parallelism (as well as a long print-out to manually determine data dependencies)&lt;/P&gt;

&lt;P&gt;Best wishes, Michael&lt;/P&gt;</description>
      <pubDate>Sat, 06 Oct 2018 08:39:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/any-limit-to-nested-loops-that-ifort-analyses-for-parallelism/m-p/1135665#M135675</guid>
      <dc:creator>high_end_c_</dc:creator>
      <dc:date>2018-10-06T08:39:06Z</dc:date>
    </item>
    <item>
      <title>A common problem for the</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/any-limit-to-nested-loops-that-ifort-analyses-for-parallelism/m-p/1135666#M135676</link>
      <description>&lt;P&gt;A common obstacle to the compiler placing auto-parallelization optimally is that it cannot determine the loop iteration counts, or the size of the thread pool, at compile time. You do have available to you:&lt;/P&gt;

&lt;H1 class="topictitle1"&gt;LOOP COUNT&lt;/H1&gt;


&lt;DIV&gt;
	&lt;P class="shortdesc"&gt;&lt;STRONG&gt;General Compiler Directive:&lt;/STRONG&gt; Specifies the iterations (typical trip count) for a DO loop.&lt;/P&gt;

	&lt;DIV class="section" id="GUID-49FE20D4-4119-4586-A6E6-E50AF9DB2D34"&gt;
		&lt;P class="dlsyntaxpara"&gt;&lt;SPAN class="kwd"&gt;!DIR$ LOOP COUNT&lt;/SPAN&gt;&lt;SPAN class="delim"&gt; (&lt;/SPAN&gt;&lt;SPAN class="var"&gt;n1&lt;/SPAN&gt;&lt;SPAN class="delim"&gt;[&lt;/SPAN&gt;&lt;SPAN class="sep"&gt;,&lt;/SPAN&gt;&lt;SPAN class="var"&gt;n2&lt;/SPAN&gt;&lt;SPAN class="delim"&gt;]&lt;/SPAN&gt;&lt;SPAN class="sep"&gt;...&lt;/SPAN&gt;&lt;SPAN class="delim"&gt;)&lt;/SPAN&gt;&lt;/P&gt;

		&lt;P class="dlsyntaxpara"&gt;&lt;SPAN class="kwd"&gt;!DIR$ LOOP COUNT&lt;/SPAN&gt;&lt;SPAN class="oper"&gt;= &lt;/SPAN&gt;&lt;SPAN class="var"&gt;n1&lt;/SPAN&gt;&lt;SPAN class="delim"&gt;[&lt;/SPAN&gt;&lt;SPAN class="sep"&gt;,&lt;/SPAN&gt;&lt;SPAN class="var"&gt;n2&lt;/SPAN&gt;&lt;SPAN class="delim"&gt;]&lt;/SPAN&gt;&lt;SPAN class="sep"&gt;...&lt;/SPAN&gt;&lt;/P&gt;

		&lt;P class="dlsyntaxpara"&gt;&lt;SPAN class="kwd"&gt;!DIR$ LOOP COUNT MAX&lt;/SPAN&gt;&lt;SPAN class="delim"&gt;(&lt;/SPAN&gt;&lt;SPAN class="var"&gt;n1&lt;/SPAN&gt;&lt;SPAN class="delim"&gt;)&lt;/SPAN&gt;&lt;SPAN class="sep"&gt;, &lt;/SPAN&gt;&lt;SPAN class="kwd"&gt;MIN&lt;/SPAN&gt;&lt;SPAN class="delim"&gt;(&lt;/SPAN&gt;&lt;SPAN class="var"&gt;n1&lt;/SPAN&gt;&lt;SPAN class="delim"&gt;)&lt;/SPAN&gt;&lt;SPAN class="sep"&gt;, &lt;/SPAN&gt;&lt;SPAN class="kwd"&gt;AVG&lt;/SPAN&gt;&lt;SPAN class="delim"&gt;(&lt;/SPAN&gt;&lt;SPAN class="var"&gt;n1&lt;/SPAN&gt;&lt;SPAN class="delim"&gt;)&lt;/SPAN&gt;&lt;/P&gt;

		&lt;P class="dlsyntaxpara"&gt;&lt;SPAN class="kwd"&gt;!DIR$ LOOP COUNT MAX&lt;/SPAN&gt;&lt;SPAN class="oper"&gt;=&lt;/SPAN&gt;&lt;SPAN class="var"&gt;n1&lt;/SPAN&gt;&lt;SPAN class="sep"&gt;, &lt;/SPAN&gt;&lt;SPAN class="kwd"&gt;MIN&lt;/SPAN&gt;&lt;SPAN class="oper"&gt;=&lt;/SPAN&gt;&lt;SPAN class="var"&gt;n1&lt;/SPAN&gt;&lt;SPAN class="sep"&gt;, &lt;/SPAN&gt;&lt;SPAN class="kwd"&gt;AVG&lt;/SPAN&gt;&lt;SPAN class="oper"&gt;=&lt;/SPAN&gt;&lt;SPAN class="var"&gt;n1&lt;/SPAN&gt;&lt;/P&gt;

		&lt;DL class="dlsyntax"&gt;&lt;/DL&gt;

		&lt;TABLE width="90%" style="border-collapse: collapse; border-spacing: 0;" border="0" cellspacing="0" cellpadding="4"&gt;
			&lt;TBODY&gt;
				&lt;TR&gt;
					&lt;TD width="30%" class="noborder" valign="top"&gt;
						&lt;P&gt;&lt;SPAN class="parmname"&gt;n1&lt;/SPAN&gt;, &lt;SPAN class="parmname"&gt;n2&lt;/SPAN&gt;&lt;/P&gt;
					&lt;/TD&gt;
					&lt;TD class="noborder" valign="top"&gt;
						&lt;P class="syntaxnote"&gt;Is a non-negative integer constant.&lt;/P&gt;
					&lt;/TD&gt;
				&lt;/TR&gt;
			&lt;/TBODY&gt;
		&lt;/TABLE&gt;
	&lt;/DIV&gt;

	&lt;DIV class="section" id="GUID-6EA6B6FB-FCB7-4A71-B36D-B72C848DDEBA"&gt;
		&lt;P&gt;The value of the loop count affects heuristics used in software pipelining, vectorization, and loop-transformations.&lt;/P&gt;

		&lt;DIV class="tablenoborder"&gt;
			&lt;TABLE width="100%" id="GUID-0227333B-A9C7-4618-89D2-55967EA951D5" border="1" rules="all" frame="hsides" cellpadding="4" summary=""&gt;
				&lt;THEAD align="left"&gt;
					&lt;TR&gt;
						&lt;TH width="50%" class="cellrowborder" id="d222482e172" valign="top"&gt;
							&lt;P&gt;Argument Form&lt;/P&gt;
						&lt;/TH&gt;
						&lt;TH width="50%" class="row-nocellborder" id="d222482e175" valign="top"&gt;
							&lt;P&gt;Description&lt;/P&gt;
						&lt;/TH&gt;
					&lt;/TR&gt;
				&lt;/THEAD&gt;
				&lt;TBODY&gt;
					&lt;TR&gt;
						&lt;TD width="50%" class="cellrowborder" valign="top" headers="d222482e172 "&gt;
							&lt;P&gt;&lt;VAR&gt;n1&lt;/VAR&gt; [, &lt;VAR&gt;n2&lt;/VAR&gt;]&lt;/P&gt;
						&lt;/TD&gt;
						&lt;TD width="50%" class="row-nocellborder" valign="top" headers="d222482e175 "&gt;
							&lt;P&gt;Indicates that the next DO loop will iterate &lt;VAR&gt;n1&lt;/VAR&gt;, &lt;VAR&gt;n2&lt;/VAR&gt;, or some other number of times.&lt;/P&gt;
						&lt;/TD&gt;
					&lt;/TR&gt;
					&lt;TR&gt;
						&lt;TD width="50%" class="cellrowborder" valign="top" headers="d222482e172 "&gt;
							&lt;P&gt;MAX, MIN, and AVG&lt;/P&gt;
						&lt;/TD&gt;
						&lt;TD width="50%" class="row-nocellborder" valign="top" headers="d222482e175 "&gt;
							&lt;P&gt;Indicates that the next DO loop has the specified maximum, minimum, and average number (&lt;VAR&gt;n1&lt;/VAR&gt;) of iterations.&lt;/P&gt;
						&lt;/TD&gt;
					&lt;/TR&gt;
				&lt;/TBODY&gt;
			&lt;/TABLE&gt;
		&lt;/DIV&gt;
	&lt;/DIV&gt;

	&lt;DIV class="section" id="GUID-8094ECF6-AF86-4002-B3A2-3A562A7A8DE7"&gt;
		&lt;H2 class="sectiontitle"&gt;Example&lt;/H2&gt;

		&lt;P&gt;Consider the following:&lt;/P&gt;

		&lt;PRE class="language-fortran"&gt;&lt;CODE class="language-fortran"&gt;&lt;SPAN class="token omp"&gt;!DIR$&lt;/SPAN&gt; LOOP COUNT &lt;SPAN class="token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token number"&gt;10000&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;)&lt;/SPAN&gt;
&lt;SPAN class="token keyword"&gt;do&lt;/SPAN&gt; i &lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token number"&gt;1&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;,&lt;/SPAN&gt;m 
b&lt;SPAN class="token punctuation"&gt;(&lt;/SPAN&gt;i&lt;SPAN class="token punctuation"&gt;)&lt;/SPAN&gt; &lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt; a&lt;SPAN class="token punctuation"&gt;(&lt;/SPAN&gt;i&lt;SPAN class="token punctuation"&gt;)&lt;/SPAN&gt; &lt;SPAN class="token number"&gt;+1&lt;/SPAN&gt; &lt;SPAN class="token comment" spellcheck="true"&gt;! This is likely to enable the loop to get software-pipelined &lt;/SPAN&gt;
&lt;SPAN class="token keyword"&gt;enddo&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;

		&lt;P&gt;Note that you can specify more than one LOOP COUNT directive for a DO loop. For example, the following directives are valid:&lt;/P&gt;

		&lt;PRE class="language-fortran"&gt;&lt;CODE class="language-fortran"&gt;&lt;SPAN class="token omp"&gt;!DIR$&lt;/SPAN&gt; LOOP COUNT &lt;SPAN class="token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token number"&gt;10&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token number"&gt;20&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token number"&gt;30&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;)&lt;/SPAN&gt; 
&lt;SPAN class="token omp"&gt;!DIR$&lt;/SPAN&gt; LOOP COUNT MAX&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token number"&gt;100&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;,&lt;/SPAN&gt; MIN&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token number"&gt;3&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;,&lt;/SPAN&gt; AVG&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token number"&gt;17&lt;/SPAN&gt; 
&lt;SPAN class="token keyword"&gt;DO&lt;/SPAN&gt; 
...&lt;/CODE&gt;&lt;/PRE&gt;
	&lt;/DIV&gt;
&lt;/DIV&gt;
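
&lt;P&gt;As an untested sketch of how this might be applied to a nest whose inner trip count is small (the routine add_one_2d, its arguments, and the counts 10000 and 5 are assumptions for illustration only), each directive is placed immediately before the DO loop it describes:&lt;/P&gt;

&lt;PRE class="language-fortran"&gt;&lt;CODE class="language-fortran"&gt;! Sketch only: add_one_2d, a, b, m and k are hypothetical names.
subroutine add_one_2d(a, b, m, k)
  implicit none
  integer, intent(in) :: m, k
  real, intent(in) :: a(k, m)
  real, intent(out) :: b(k, m)
  integer :: i, j

  ! Hint: the outer loop typically runs about 10000 times.
  !DIR$ LOOP COUNT (10000)
  do i = 1, m
    ! Hint: the inner loop is assumed to run at most 5 times.
    !DIR$ LOOP COUNT MAX(5)
    do j = 1, k
      b(j, i) = a(j, i) + 1.0
    end do
  end do
end subroutine add_one_2d&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Compilers that do not recognize !DIR$ treat these lines as ordinary comments, so the hints do not change the result of the code.&lt;/P&gt;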

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Sat, 06 Oct 2018 15:30:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/any-limit-to-nested-loops-that-ifort-analyses-for-parallelism/m-p/1135666#M135676</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2018-10-06T15:30:37Z</dc:date>
    </item>
    <item>
      <title>A caution with !dir$ loop</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/any-limit-to-nested-loops-that-ifort-analyses-for-parallelism/m-p/1135667#M135677</link>
      <description>&lt;P&gt;A caution with !dir$ loop count:&amp;nbsp;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;If you omit the max, min, or avg clause, it is likely to reduce performance for any count other than those specified.&amp;nbsp; So, you should use one of those clauses when the loop count varies.&amp;nbsp; For example, if the loop count varies from 1 to 99, !dir$ loop count avg=50 should work.&amp;nbsp; Without the directive, such a short loop (if vectorized) might be unrolled excessively.&lt;/P&gt;
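
&lt;P&gt;A minimal sketch of that 1-to-99 case, using the MAX=, MIN=, AVG= form from the documentation Jim quoted (the routine scale_short and its arguments are made-up names, and whether it helps in a given code would have to be measured):&lt;/P&gt;

&lt;PRE class="language-fortran"&gt;&lt;CODE class="language-fortran"&gt;! Sketch only: scale_short, x and n are hypothetical names.
subroutine scale_short(x, n)
  implicit none
  integer, intent(in) :: n      ! assumed to vary between 1 and 99
  real, intent(inout) :: x(n)
  integer :: i

  ! Describe the range of trip counts instead of one fixed count.
  !DIR$ LOOP COUNT MAX=99, MIN=1, AVG=50
  do i = 1, n
    x(i) = 0.5 * x(i)
  end do
end subroutine scale_short&lt;/CODE&gt;&lt;/PRE&gt;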

&lt;P&gt;I'm not aware of how this works with auto-parallel, but Jim's suggestion is good.&amp;nbsp; For example, if a loop is asserted to execute at most 5 times, the auto-parallel should avoid making that the only parallel loop. If you use OpenMP, you would need to specify the outer parallel loop and collapse parameter yourself, but the loop count directive might still help.&lt;/P&gt;</description>
      <pubDate>Sun, 07 Oct 2018 07:18:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/any-limit-to-nested-loops-that-ifort-analyses-for-parallelism/m-p/1135667#M135677</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2018-10-07T07:18:29Z</dc:date>
    </item>
    <item>
      <title>Useful reminder of "!dir$</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/any-limit-to-nested-loops-that-ifort-analyses-for-parallelism/m-p/1135668#M135678</link>
      <description>&lt;P&gt;Useful reminder of "!dir$ loop count", many thanks.&lt;/P&gt;

&lt;P&gt;But my main question is why I am getting nothing back in the optrpt for these outer loops, whereas I do for inner loops. If the compiler would tell me there are dependencies or too little work for said loops (like it does for inner ones) then I'd know it was at least checking, but it seems strange there is nothing in optrpt relating to the line numbers of the outer few loops...&lt;/P&gt;

&lt;P&gt;hope that's a useful new angle? m&lt;/P&gt;

</description>
      <pubDate>Tue, 09 Oct 2018 19:00:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/any-limit-to-nested-loops-that-ifort-analyses-for-parallelism/m-p/1135668#M135678</guid>
      <dc:creator>high_end_c_</dc:creator>
      <dc:date>2018-10-09T19:00:15Z</dc:date>
    </item>
  </channel>
</rss>

