<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Although the compiler has the in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Array-attribution-with-all-elements/m-p/1102186#M127017</link>
    <description>&lt;P&gt;Although the compiler can auto-parallelize loops (when that option is enabled), it is often much better to parallelize explicitly with OpenMP (which must also be enabled with a compiler option).&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;!$OMP PARALLEL DO
Do i=1, N

    A(i)=2*B(i)
    C(1,i)=2*B(i)

End do
&lt;/PRE&gt;

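&lt;P&gt;Note: in the loop above, consecutive C(1,i) elements are not adjacent in memory; the stride equals the extent of the first dimension of C (N if C is N by N). This access pattern is inefficient when used as above. It would be more efficient to swap the indices (in the allocation and everywhere else the array is used) so that the innermost (or only) loop runs over the leftmost index. This is the reverse of C/C++.&lt;/P&gt;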

&lt;P&gt;Note 2: if you change the index order, the C(i,1)=... store can be vectorized with contiguous stores, with no need for scatter instructions (which the strided form would rely on, and only if your CPU supports scatter).&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;!$OMP PARALLEL DO SIMD
Do i=1, N

    A(i)=2*B(i)
    C(i,1)=2*B(i) ! requires change in index order

End do
&lt;/PRE&gt;

&lt;P&gt;Additional note on the above: the loop does very little computation relative to its memory fetches and stores, so it is likely to be limited by memory bandwidth. For loops like this, you may find it more efficient to restrict the number of threads to a fairly small number. The most effective number will vary from system to system, and for this loop it would likely depend on the number of memory channels available.&lt;/P&gt;

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
    <pubDate>Thu, 01 Dec 2016 20:23:31 GMT</pubDate>
    <dc:creator>jimdempseyatthecove</dc:creator>
    <dc:date>2016-12-01T20:23:31Z</dc:date>
    <item>
      <title>Array attribution with all elements</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Array-attribution-with-all-elements/m-p/1102185#M127016</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;One way to assign values to all elements of an array is:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;Do i=1, N

    A(i)=2*B(i)
    C(1,i)=2*B(i)

End do&lt;/PRE&gt;

&lt;P&gt;Another possibility is to use this:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;A(:)=2*B(:)

C(1,:)=2*B(:)&lt;/PRE&gt;

&lt;P&gt;Which is the best way to assign values to an entire array? And which is best when the code runs in parallel?&lt;/P&gt;

&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Thu, 01 Dec 2016 15:12:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Array-attribution-with-all-elements/m-p/1102185#M127016</guid>
      <dc:creator>Nicholas_S_</dc:creator>
      <dc:date>2016-12-01T15:12:31Z</dc:date>
    </item>
    <item>
      <title>Although the compiler has the</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Array-attribution-with-all-elements/m-p/1102186#M127017</link>
      <description>&lt;P&gt;Although the compiler can auto-parallelize loops (when that option is enabled), it is often much better to parallelize explicitly with OpenMP (which must also be enabled with a compiler option).&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;!$OMP PARALLEL DO
Do i=1, N

    A(i)=2*B(i)
    C(1,i)=2*B(i)

End do
&lt;/PRE&gt;

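&lt;P&gt;Note: in the loop above, consecutive C(1,i) elements are not adjacent in memory; the stride equals the extent of the first dimension of C (N if C is N by N). This access pattern is inefficient when used as above. It would be more efficient to swap the indices (in the allocation and everywhere else the array is used) so that the innermost (or only) loop runs over the leftmost index. This is the reverse of C/C++.&lt;/P&gt;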
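
&lt;P&gt;As a minimal sketch of why the index order matters: Fortran stores arrays in column-major order, so writing C(1,i) touches only every m-th element of memory, where m is the extent of the first dimension. The array shape and the names stride_demo, m, n and flat below are illustrative assumptions, not taken from the original code. Copying the array out in element order makes the gaps visible.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;program stride_demo
    implicit none
    integer, parameter :: m = 4, n = 3   ! hypothetical extents
    integer :: c(m, n), flat(m*n), i

    c = 0
    Do i=1, n
        c(1,i)=i                         ! same access pattern as C(1,i) above
    End do

    ! reshape takes elements in array element order,
    ! which is the order they are stored in memory
    flat = reshape(c, [m*n])
    print '(12i2)', flat
end program stride_demo
&lt;/PRE&gt;

&lt;P&gt;With m = 4 and n = 3 this prints 1 0 0 0 2 0 0 0 3 0 0 0, so consecutive values of i land m elements apart. With the indices swapped (c(n,m) and c(i,1)), the three values would sit next to each other.&lt;/P&gt;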

&lt;P&gt;Note 2: if you change the index order, the C(i,1)=... store can be vectorized with contiguous stores, with no need for scatter instructions (which the strided form would rely on, and only if your CPU supports scatter).&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;!$OMP PARALLEL DO SIMD
Do i=1, N

    A(i)=2*B(i)
    C(i,1)=2*B(i) ! requires change in index order

End do
&lt;/PRE&gt;

&lt;P&gt;Additional note on the above: the loop does very little computation relative to its memory fetches and stores, so it is likely to be limited by memory bandwidth. For loops like this, you may find it more efficient to restrict the number of threads to a fairly small number. The most effective number will vary from system to system, and for this loop it would likely depend on the number of memory channels available.&lt;/P&gt;
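
&lt;P&gt;As a rough sketch of capping the thread count for this one loop, one option is the OpenMP num_threads clause. The value 4, the array size, and the program and variable names below are placeholders chosen for illustration, not recommendations; the best value has to be measured on the target system.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;program limit_threads
    implicit none
    integer, parameter :: n = 1000000    ! hypothetical problem size
    integer, parameter :: nthreads = 4   ! assumption: tune per system
    real, allocatable :: a(:), b(:), c(:,:)
    integer :: i

    allocate(a(n), b(n), c(n,1))         ! loop index is the leftmost index of c
    b = 1.0

    ! num_threads limits the team size for this parallel region only
    !$OMP PARALLEL DO SIMD NUM_THREADS(nthreads)
    Do i=1, n
        a(i)=2*b(i)
        c(i,1)=2*b(i)
    End do
    !$OMP END PARALLEL DO SIMD

    print *, a(1), c(n,1)
end program limit_threads
&lt;/PRE&gt;

&lt;P&gt;Setting the OMP_NUM_THREADS environment variable instead changes the default team size for every parallel region in the program, whereas the clause affects only this loop.&lt;/P&gt;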

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Thu, 01 Dec 2016 20:23:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Array-attribution-with-all-elements/m-p/1102186#M127017</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2016-12-01T20:23:31Z</dc:date>
    </item>
  </channel>
</rss>

