<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic At full optimization, ifort in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Vectorizing-3D-Arrays-Product-in-Fortran/m-p/1147459#M7792</link>
    <description>&lt;P&gt;At full optimization, ifort would perform an optimization equivalent to #2 if possible.&amp;nbsp; #3 is prone to generation of temporary arrays.&amp;nbsp; ifort optimization reports should show when either of those happens.&lt;/P&gt;</description>
    <pubDate>Thu, 26 Oct 2017 17:41:43 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2017-10-26T17:41:43Z</dc:date>
    <item>
      <title>Vectorizing 3D Arrays Product in Fortran</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Vectorizing-3D-Arrays-Product-in-Fortran/m-p/1147457#M7790</link>
      <description>&lt;DIV style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;
	&lt;DIV&gt;&lt;SPAN style="font-size: 12.8px;"&gt;Consider the element-wise product of arrays of size &lt;/SPAN&gt;nx*ny*nz&lt;SPAN style="font-size: 12.8px;"&gt;&amp;nbsp;in Fortran. The following cases all produce the same result:&lt;/SPAN&gt;&lt;/DIV&gt;

	&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;
	&lt;B&gt;#1&lt;/B&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;
	&lt;PRE class="brush:fortran;"&gt;do k=1,nz
 do j=1,ny
  do i=1,nx
   A(i,j,k)=A(i,j,k)*B(i,j,k)*0.5
  enddo
 enddo
enddo&lt;/PRE&gt;

	&lt;P&gt;&amp;nbsp;&lt;/P&gt;

	&lt;P&gt;&lt;B style="font-size: 12.8px;"&gt;#2&lt;/B&gt;&lt;/P&gt;
&lt;/DIV&gt;

&lt;DIV style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;
	&lt;DIV&gt;
		&lt;PRE class="brush:fortran;"&gt;do ijk=1,nx*ny*nz
 A(ijk,1,1)=A(ijk,1,1)*B(ijk,1,1)*0.5
enddo&lt;/PRE&gt;

		&lt;P&gt;&lt;BR /&gt;
			&lt;BR /&gt;
			&lt;B&gt;#3&amp;nbsp;&lt;/B&gt;&lt;/P&gt;

		&lt;PRE class="brush:fortran;"&gt;A(:,:,:)=A(:,:,:)*B(:,:,:)*0.5&lt;/PRE&gt;

		&lt;P&gt;&lt;B&gt;#4&amp;nbsp;&lt;/B&gt;&lt;/P&gt;

		&lt;PRE class="brush:fortran;"&gt;A=A*B*0.5&lt;/PRE&gt;

		&lt;P&gt;&lt;BR /&gt;
			&amp;nbsp;&lt;/P&gt;
	&lt;/DIV&gt;

	&lt;DIV&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;Question: which case (#1, #2, #3 or #4) is the best in terms of vectorization and performance and why?&lt;/SPAN&gt;&lt;/DIV&gt;

	&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;

	&lt;DIV&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;Can someone recommend literature or a website where I can learn more about this?&lt;/SPAN&gt;&lt;/DIV&gt;

	&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;

	&lt;DIV&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;Thank you,&lt;/SPAN&gt;&lt;/DIV&gt;

	&lt;DIV&gt;&lt;SPAN style="font-family: arial, helvetica, sans-serif;"&gt;Ricardo&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Thu, 26 Oct 2017 14:48:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Vectorizing-3D-Arrays-Product-in-Fortran/m-p/1147457#M7790</guid>
      <dc:creator>Ricardo_F_</dc:creator>
      <dc:date>2017-10-26T14:48:43Z</dc:date>
    </item>
    <item>
      <title>#4 "should" be best and</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Vectorizing-3D-Arrays-Product-in-Fortran/m-p/1147458#M7791</link>
      <description>&lt;P&gt;#4 "should" be best and equivalent to #1&lt;/P&gt;

&lt;P&gt;To improve performance, you would want to align the arrays (A and B) to a cache line boundary:&lt;/P&gt;

&lt;P&gt;!DIR$ ATTRIBUTES ALIGN: 64:: A, B&lt;/P&gt;

&lt;P&gt;#2 won't work if array bounds checking is enabled, because the first index runs past nx. If the dimensions of the aligned arrays do not make each row (1st index) a multiple of the cache line size, the loops in #1 can be collapsed for the example above (see the IVF index).&lt;/P&gt;
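
&lt;P&gt;A minimal sketch of where the alignment directive might go, assuming allocatable single-precision arrays. The program name align_demo and the values of nx, ny and nz are made up for illustration. The directive is specific to Intel Fortran; other compilers should treat that line as a comment.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;program align_demo
  implicit none
  integer, parameter :: nx = 16, ny = 8, nz = 4   ! illustrative sizes only
  real, allocatable :: A(:,:,:), B(:,:,:)
  ! Request 64-byte (cache line) alignment for the array data
  !DIR$ ATTRIBUTES ALIGN: 64 :: A, B

  allocate(A(nx,ny,nz), B(nx,ny,nz))
  A = 2.0
  B = 3.0

  ! Case #4 from the question
  A = A*B*0.5

  print *, A(1,1,1), A(nx,ny,nz)   ! both elements are 3.0
end program align_demo&lt;/PRE&gt;

&lt;P&gt;With nx = 16 and 4-byte reals, each row is exactly 64 bytes, so every row starts on a cache line boundary here. That will not hold for arbitrary nx.&lt;/P&gt;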

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Thu, 26 Oct 2017 17:10:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Vectorizing-3D-Arrays-Product-in-Fortran/m-p/1147458#M7791</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2017-10-26T17:10:00Z</dc:date>
    </item>
    <item>
      <title>At full optimization, ifort</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Vectorizing-3D-Arrays-Product-in-Fortran/m-p/1147459#M7792</link>
      <description>&lt;P&gt;At full optimization, ifort would perform an optimization equivalent to #2 if possible.&amp;nbsp; #3 is prone to generation of temporary arrays.&amp;nbsp; ifort optimization reports should show when either of those happens.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Oct 2017 17:41:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Vectorizing-3D-Arrays-Product-in-Fortran/m-p/1147459#M7792</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2017-10-26T17:41:43Z</dc:date>
    </item>
    <item>
      <title>FWIW I use a similar</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Vectorizing-3D-Arrays-Product-in-Fortran/m-p/1147460#M7793</link>
      <description>&lt;P&gt;FWIW I use a similar technique to #2. In a module I have:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; subroutine ArrayDivide6x6(a,d)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; implicit none
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; real(i8), intent(inout) :: a(6*6)
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; real(i8), intent(in) :: d
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; a = a / d
&amp;nbsp;&amp;nbsp;&amp;nbsp; end subroutine ArrayDivide6x6
...
&lt;FONT color="#0000ff" face="Consolas" size="2"&gt;&lt;FONT color="#0000ff" face="Consolas" size="2"&gt;&lt;FONT color="#0000ff" face="Consolas" size="2"&gt;REAL&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Consolas" size="2"&gt;&lt;FONT face="Consolas" size="2"&gt;(i8),&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT color="#0000ff" face="Consolas" size="2"&gt;&lt;FONT color="#0000ff" face="Consolas" size="2"&gt;&lt;FONT color="#0000ff" face="Consolas" size="2"&gt;DIMENSION&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Consolas" size="2"&gt;&lt;FONT face="Consolas" size="2"&gt;(6,6), &lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT color="#0000ff" face="Consolas" size="2"&gt;&lt;FONT color="#0000ff" face="Consolas" size="2"&gt;&lt;FONT color="#0000ff" face="Consolas" size="2"&gt;INTENT&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Consolas" size="2"&gt;&lt;FONT face="Consolas" size="2"&gt;(&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT color="#0000ff" face="Consolas" size="2"&gt;&lt;FONT color="#0000ff" face="Consolas" size="2"&gt;&lt;FONT color="#0000ff" face="Consolas" size="2"&gt;OUT&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Consolas" size="2"&gt;&lt;FONT face="Consolas" size="2"&gt;) :: qq_Transpose
...
&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT color="#0000ff" face="Consolas" size="2"&gt;&lt;FONT color="#0000ff" face="Consolas" size="2"&gt;&lt;FONT color="#0000ff" face="Consolas" size="2"&gt;call&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Consolas" size="2"&gt;&lt;FONT face="Consolas" size="2"&gt; ArrayDivide6x6(qq_Transpose, &lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT color="#0000ff" face="Consolas" size="2"&gt;&lt;FONT color="#0000ff" face="Consolas" size="2"&gt;&lt;FONT color="#0000ff" face="Consolas" size="2"&gt;exp&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Consolas" size="2"&gt;&lt;FONT face="Consolas" size="2"&gt;(xsi*z))

&lt;/FONT&gt;&lt;/FONT&gt;
&lt;/PRE&gt;

&lt;P&gt;You should be able to use a similar technique with your example code.&lt;BR /&gt;
	YMMV&lt;/P&gt;

&lt;P&gt;I noticed in IVF V17u1 an optimization&amp;nbsp;issue with&lt;BR /&gt;
	&amp;nbsp; qq_Transpose = qq_Transpose / exp(xsi*z)&lt;/P&gt;
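
&lt;P&gt;For the A and B arrays in the original question, the same approach might look like the sketch below. The names flat_ops, scaled_product and dp are hypothetical, and double precision is assumed. The 3D arrays are passed to explicit-shape 1D dummy arguments, so the routine sees one flat, contiguous sequence of nx*ny*nz elements and stays within bounds even with bounds checking enabled.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;module flat_ops
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! assumed working precision
contains
  ! Explicit-shape 1-D dummies: a whole 3-D array passed as the actual
  ! argument is associated element by element in storage order.
  subroutine scaled_product(a, b, n)
    integer, intent(in) :: n
    real(dp), intent(inout) :: a(n)
    real(dp), intent(in) :: b(n)
    a = a * b * 0.5_dp
  end subroutine scaled_product
end module flat_ops

program flat_demo
  use flat_ops
  implicit none
  integer, parameter :: nx = 4, ny = 3, nz = 2   ! illustrative sizes only
  real(dp) :: A(nx,ny,nz), B(nx,ny,nz)

  A = 2.0_dp
  B = 3.0_dp
  call scaled_product(A, B, size(A))

  print *, A(1,1,1), A(nx,ny,nz)   ! both elements are 3.0
end program flat_demo&lt;/PRE&gt;

&lt;P&gt;Whether this is faster than a plain A=A*B*0.5 depends on the compiler version and options, so check the optimization report for both forms.&lt;/P&gt;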

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Fri, 27 Oct 2017 00:22:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Vectorizing-3D-Arrays-Product-in-Fortran/m-p/1147460#M7793</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2017-10-27T00:22:30Z</dc:date>
    </item>
  </channel>
</rss>

