<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Inline array syntax speedup or slowdown? in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752381#M8415</link>
    <description>First, you should get the vec-report before modification.&lt;BR /&gt;Comparing after modification, you would see whether the directives accomplished anything.&lt;BR /&gt;&lt;BR /&gt;Then, for each array assignment where you know all the arrays are 16-byte aligned, you could add a preceding&lt;BR /&gt;!dir$ vector aligned&lt;BR /&gt;It looks like this could work only if ncolx, ncoly, ncolz are all multiples of 2.&lt;BR /&gt;&lt;BR /&gt;If you would replace /2.0_wp by *0.5_wp, you would avoid a dependence on the option -no-prec-div (which apparently you are using, by default). I don't know why this transformation is forbidden by the preferable option -prec-div.</description>
    <pubDate>Tue, 18 Jan 2011 16:26:58 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2011-01-18T16:26:58Z</dc:date>
    <item>
      <title>Inline array syntax speedup or slowdown?</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752373#M8407</link>
      <description>I have eliminated all the major loops in my program in favor of the Fortran 90&lt;BR /&gt;inline array syntax, e.g.&lt;BR /&gt;&lt;BR /&gt;do is = 1, 2&lt;BR /&gt; pout(:,:,:,is) = pout(:,:,:,is) - dbmasq(:,:,:,2)*pswk(:,:,:,is)&lt;BR /&gt;end do&lt;BR /&gt;&lt;BR /&gt;where pout and pswk are complex. Previously, I had loops over ix,iy,iz (there are quite a&lt;BR /&gt;few loops, like a hundred). After doing this I actually see a slowdown. The test run I was&lt;BR /&gt;using went from 23 minutes to 27 minutes (both runs repeated many times with nothing&lt;BR /&gt;else running). I presumed that this conversion would result in some speedup, since the compiler&lt;BR /&gt;could convert these to more efficient loops. Is my thinking wrong? Is there a better way&lt;BR /&gt;to optimize the above loop?&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Sammy</description>
      <pubDate>Mon, 17 Jan 2011 13:52:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752373#M8407</guid>
      <dc:creator>umar</dc:creator>
      <dc:date>2011-01-17T13:52:16Z</dc:date>
    </item>
    <item>
      <title>Inline array syntax speedup or slowdown?</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752374#M8408</link>
      <description>Comparison of the vec-report outputs should help locate differences resulting from the change of syntax. In my experience, the most common performance problem would be the case where you have multiple assignments in a single DO loop which is vectorized effectively, but when you split the loop into f90 array syntax assignments, fusion (or lack of it) doesn't produce as efficient code. There are also bugs in recent compilers where DOT_PRODUCT and the like don't vectorize when they should.&lt;BR /&gt;In the loop you have quoted, my guess would be that you should write the inner loop as a rank 1 stride 1 assignment&lt;BR /&gt;pout(:,i,j,is) = pout(:,i,j,is) - dbmasq(:,i,j,2)*pswk(:,i,j,is)&lt;BR /&gt;in case the compiler hasn't sorted out on which subscript it can vectorize efficiently. I doubt that a conclusive answer could be given without at least the vec-report2 result.</description>
      <pubDate>Mon, 17 Jan 2011 14:57:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752374#M8408</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2011-01-17T14:57:43Z</dc:date>
    </item>
    <item>
      <title>Inline array syntax speedup or slowdown?</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752375#M8409</link>
      <description>The best you can hope for from array syntax is that it is as fast as an explicit loop. I have never seen it be faster, but often slower.&lt;BR /&gt;&lt;BR /&gt;One side issue of array syntax arises when you pass non-contiguous array chunks to subroutines. It looks good in the source code but causes the compiler to pass the data through hidden temporary arrays, with the associated copy overhead.&lt;BR /&gt;&lt;BR /&gt;So for critical code sections I tend to avoid array syntax, whereas in uncritical sections it certainly improves readability.</description>
      <pubDate>Mon, 17 Jan 2011 16:01:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752375#M8409</guid>
      <dc:creator>mriedman</dc:creator>
      <dc:date>2011-01-17T16:01:09Z</dc:date>
    </item>
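    <!-- The copy-in/copy-out cost mriedman describes can be sketched in a few lines (hypothetical names; a minimal illustration, not code from this thread). Passing a strided array section to an explicit-shape dummy argument forces the compiler to pack it into a contiguous hidden temporary around the call:

    ```fortran
    program temp_array_demo
      implicit none
      integer, parameter :: n = 1000
      real :: a(n,n)
      a = 1.0
      ! a(1,:) is a row: stride n, non-contiguous. The explicit-shape dummy
      ! below assumes contiguous storage, so the compiler copies the section
      ! into a hidden temporary, makes the call, and copies back (intent(inout)).
      call scale_vec(n, a(1,:))
      ! a(:,1) is a column: stride 1, contiguous. No temporary is needed.
      call scale_vec(n, a(:,1))
    contains
      subroutine scale_vec(m, v)
        integer, intent(in) :: m
        real, intent(inout) :: v(m)   ! explicit-shape dummy
        v = 0.5*v
      end subroutine scale_vec
    end program temp_array_demo
    ```
    -->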
    <item>
      <title>Inline array syntax speedup or slowdown?</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752376#M8410</link>
      <description>&lt;DIV id="tiny_quote"&gt;
                &lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=467307" class="basic" href="https://community.intel.com/en-us/profile/467307/"&gt;mriedman&lt;/A&gt;&lt;/DIV&gt;
                &lt;DIV style="background-color: #e5e5e5; padding: 5px; border: 1px inset; margin-left: 2px; margin-right: 2px;"&gt;&lt;I&gt;&lt;BR /&gt;One side issue of array syntax arises when you pass non-contiguous array chunksto subroutines. It looks good in the source code butcauses the compiler to pass the data throughhidden temporary arrays with the associated copy overhead.&lt;BR /&gt;&lt;BR /&gt;&lt;/I&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&lt;/P&gt;That's a good point. ifort uses a temporary array not only for cases of non-contiguous data, but whenever it fails to resolve issues of possible overlap between right and left hand sides, even when most other Fortran vectorizing compilers find no difficulty. Even matching arrays on both sides might be a problem with the multiple rank assignment.&lt;BR /&gt;</description>
      <pubDate>Mon, 17 Jan 2011 17:48:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752376#M8410</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2011-01-17T17:48:16Z</dc:date>
    </item>
    <item>
      <title>Inline array syntax speedup or slowdown?</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752377#M8411</link>
      <description>For what it's worth, our lab tried to do the same thing (remove explicit loops) and found the result to be much slower, even without the temporary arrays being made. This was true on all compilers we tested it on, except the older style Cray vector machines.&lt;BR /&gt;&lt;BR /&gt;Tim</description>
      <pubDate>Mon, 17 Jan 2011 20:29:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752377#M8411</guid>
      <dc:creator>Tim_Gallagher</dc:creator>
      <dc:date>2011-01-17T20:29:38Z</dc:date>
    </item>
    <item>
      <title>Inline array syntax speedup or slowdown?</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752378#M8412</link>
      <description>Wouldn't it make sense, then, for compiler writers to work on "fixing" this problem? Since Fortran&lt;BR /&gt;now has this syntax as standard, people should not be forced to write loops where they could&lt;BR /&gt;perfectly well use this shorthand. It should at least give the same timing as the looped&lt;BR /&gt;version if the original was perfectly written; otherwise there should be a speedup.&lt;BR /&gt;&lt;BR /&gt;I have used v77to90 from VAST to translate loops into inline syntax, so if that translation can&lt;BR /&gt;be done, the reverse should be possible too. In my opinion this is a bug.&lt;BR /&gt;</description>
      <pubDate>Mon, 17 Jan 2011 21:46:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752378#M8412</guid>
      <dc:creator>umar</dc:creator>
      <dc:date>2011-01-17T21:46:23Z</dc:date>
    </item>
    <item>
      <title>Inline array syntax speedup or slowdown?</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752379#M8413</link>
      <description>Many of us have spent a lot of time advocating improvements in performance of f90 array syntax. ifort has special directives which may be useful, and apply to array assignments as well as DO loops, such as&lt;BR /&gt;!dir$ vector aligned&lt;BR /&gt;(asserts the assignment begins with aligned data)&lt;BR /&gt;and&lt;BR /&gt;!dir$ distribute point&lt;BR /&gt;to prevent fusion of array assignments which involve incompatible alignments.&lt;BR /&gt;The usual problem is for the compiler to recognize opportunities for useful fusion of array assignments. You must set -O3 to tell it to attempt this, and set those distribute point directives where you want to prevent it.</description>
      <pubDate>Tue, 18 Jan 2011 00:51:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752379#M8413</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2011-01-18T00:51:51Z</dc:date>
    </item>
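    <!-- As a rough sketch of where those ifort directives would sit around array assignments (hypothetical routine and array names; it is the programmer's responsibility to guarantee the arrays really are 16-byte aligned before asserting it):

    ```fortran
    subroutine update(n, a, b, c)
      implicit none
      integer, intent(in)    :: n
      real(8), intent(in)    :: b(n), c(n)
      real(8), intent(inout) :: a(n)
      !dir$ vector aligned      ! assert this assignment starts on aligned data
      a = a + b*c
      !dir$ distribute point    ! discourage fusion with the next assignment
      a = a*c
    end subroutine update
    ```
    -->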
    <item>
      <title>Inline array syntax speedup or slowdown?</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752380#M8414</link>
      <description>I am not an expert on alignment, but I do use -O3 -xHost when compiling. I am attaching one of my&lt;BR /&gt;subroutines, the one that takes the most time. Where would you put those directives?&lt;BR /&gt;Thanks&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;subroutine hpsi(ncolx, ncoly, ncolz, upotq, bmassq, xkinpq, hsigmaq, cqq, dcqq, bmunuq,&amp;amp;&lt;BR /&gt; dbmuq, dbmasq, dxknpq, der1x, der1y, der1z, der2x, der2y, der2z, eshift,&amp;amp;&lt;BR /&gt; pinn, pout, pswk, itimrev)&lt;BR /&gt;!-----------------------------------------------------------------------&lt;BR /&gt;! hpsi: program to form the product of h on the&lt;BR /&gt;! state vector pinn and return the result in pout&lt;BR /&gt;!-----------------------------------------------------------------------&lt;BR /&gt; implicit none&lt;BR /&gt; integer, parameter :: wp = kind(1.0D0)&lt;BR /&gt;!-----------------------------------------------&lt;BR /&gt;! A r g u m e n t s&lt;BR /&gt;!-----------------------------------------------&lt;BR /&gt; integer , intent(in) :: ncolx&lt;BR /&gt; integer , intent(in) :: ncoly&lt;BR /&gt; integer , intent(in) :: ncolz&lt;BR /&gt; integer , intent(in) :: itimrev&lt;BR /&gt; real(wp) , intent(in) :: eshift&lt;BR /&gt; real(wp) , intent(in) :: upotq(ncolx,ncoly,ncolz)&lt;BR /&gt; real(wp) , intent(in) :: bmassq(ncolx,ncoly,ncolz)&lt;BR /&gt; real(wp) , intent(in) :: xkinpq(ncolx,ncoly,ncolz,3)&lt;BR /&gt; real(wp) , intent(in) :: hsigmaq(ncolx,ncoly,ncolz,3)&lt;BR /&gt; real(wp) , intent(in) :: cqq(ncolx,ncoly,ncolz,3)&lt;BR /&gt; real(wp) , intent(in) :: dcqq(ncolx,ncoly,ncolz,3,3)&lt;BR /&gt; real(wp) , intent(in) :: bmunuq(ncolx,ncoly,ncolz,3,3)&lt;BR /&gt; real(wp) , intent(in) :: dbmuq(ncolx,ncoly,ncolz,3)&lt;BR /&gt; real(wp) , intent(in) :: dbmasq(ncolx,ncoly,ncolz,3)&lt;BR /&gt; real(wp) , intent(in) :: dxknpq(ncolx,ncoly,ncolz)&lt;BR /&gt; real(wp) , intent(in) :: der1x(ncolx,ncolx)&lt;BR /&gt; real(wp) , intent(in) :: der1y(ncoly,ncoly)&lt;BR /&gt; 
real(wp) , intent(in) :: der1z(ncolz,ncolz)&lt;BR /&gt; real(wp) , intent(in) :: der2x(ncolx,ncolx)&lt;BR /&gt; real(wp) , intent(in) :: der2y(ncoly,ncoly)&lt;BR /&gt; real(wp) , intent(in) :: der2z(ncolz,ncolz)&lt;BR /&gt; complex(wp) , intent(inout) :: pout(ncolx,ncoly,ncolz,2)&lt;BR /&gt; complex(wp) :: pinn(ncolx,ncoly,ncolz,2)&lt;BR /&gt; complex(wp) :: pswk(ncolx,ncoly,ncolz,2)&lt;BR /&gt;!-----------------------------------------------&lt;BR /&gt;! L o c a l P a r a m e t e r s&lt;BR /&gt;!-----------------------------------------------&lt;BR /&gt; complex(wp), parameter :: eye = (0.0_wp,1.0_wp)&lt;BR /&gt;!-----------------------------------------------------------------------&lt;BR /&gt;! diagonal part of h times psi (U_q part)&lt;BR /&gt;!-----------------------------------------------------------------------&lt;BR /&gt; pout(:,:,:,1) = (upotq(:,:,:) - eshift)*pinn(:,:,:,1)&lt;BR /&gt; pout(:,:,:,2) = (upotq(:,:,:) - eshift)*pinn(:,:,:,2)&lt;BR /&gt;! diagonal part of I_q term ( -i/2[Del.I_q]psi ) (time-odd)&lt;BR /&gt; if(itimrev == 0) then&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) - eye/2.0_wp*dxknpq(:,:,:)*pinn(:,:,:,1)&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) - eye/2.0_wp*dxknpq(:,:,:)*pinn(:,:,:,2)&lt;BR /&gt;! this is the big sigma_q term (time-odd)&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) + (hsigmaq(:,:,:,1)*pinn(:,:,:,2) - &amp;amp;&lt;BR /&gt; eye*hsigmaq(:,:,:,2)*pinn(:,:,:,2) + &amp;amp;&lt;BR /&gt; hsigmaq(:,:,:,3)*pinn(:,:,:,1))&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) + (hsigmaq(:,:,:,1)*pinn(:,:,:,1) + &amp;amp;&lt;BR /&gt; eye*hsigmaq(:,:,:,2)*pinn(:,:,:,1) - &amp;amp;&lt;BR /&gt; hsigmaq(:,:,:,3)*pinn(:,:,:,2))&lt;BR /&gt; endif&lt;BR /&gt;! 
this is the -i/2[Del.B_q]psi (tensor) part of the h*psi&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) - eye/2.0_wp*(dbmuq(:,:,:,1)*pinn(:,:,:,2) - &amp;amp;&lt;BR /&gt; eye*dbmuq(:,:,:,2)*pinn(:,:,:,2) + &amp;amp;&lt;BR /&gt; dbmuq(:,:,:,3)*pinn(:,:,:,1))&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) - eye/2.0_wp*(dbmuq(:,:,:,1)*pinn(:,:,:,1) + &amp;amp;&lt;BR /&gt; eye*dbmuq(:,:,:,2)*pinn(:,:,:,1) - &amp;amp;&lt;BR /&gt; dbmuq(:,:,:,3)*pinn(:,:,:,2))&lt;BR /&gt;!-----------------------------------------------------------------------&lt;BR /&gt;! terms which require d/dx psi&lt;BR /&gt;!-----------------------------------------------------------------------&lt;BR /&gt; call cmulx (ncolx, ncoly, ncolz, der1x, pinn, pswk, 0)&lt;BR /&gt;! part of the effective mass term&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) - dbmasq(:,:,:,1)*pswk(:,:,:,1)&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) - dbmasq(:,:,:,1)*pswk(:,:,:,2)&lt;BR /&gt;! the other part of the I_q term ( -eye*I_q*Dpsi term ) (time-odd)&lt;BR /&gt; if(itimrev == 0) then&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) - eye*xkinpq(:,:,:,1)*pswk(:,:,:,1)&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) - eye*xkinpq(:,:,:,1)*pswk(:,:,:,2)&lt;BR /&gt; endif&lt;BR /&gt;! this is the -eye*B_q.Dsigma (tensor) part of h*psi&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) - eye*(bmunuq(:,:,:,1,1)*pswk(:,:,:,2) - &amp;amp;&lt;BR /&gt; eye*bmunuq(:,:,:,1,2)*pswk(:,:,:,2) + &amp;amp;&lt;BR /&gt; bmunuq(:,:,:,1,3)*pswk(:,:,:,1))&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) - eye*(bmunuq(:,:,:,1,1)*pswk(:,:,:,1) + &amp;amp;&lt;BR /&gt; eye*bmunuq(:,:,:,1,2)*pswk(:,:,:,1) - &amp;amp;&lt;BR /&gt; bmunuq(:,:,:,1,3)*pswk(:,:,:,2))&lt;BR /&gt;! 
this is the [Del.sigma.C_q]Del psi part of the C_q term (time-odd)&lt;BR /&gt; if(itimrev == 0) then&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) + (dcqq(:,:,:,1,1) - eye*dcqq(:,:,:,1,2))*pswk(:,:,:,2) + &amp;amp;&lt;BR /&gt; dcqq(:,:,:,1,3)*pswk(:,:,:,1)&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) + (dcqq(:,:,:,1,1) + eye*dcqq(:,:,:,1,2))*pswk(:,:,:,1) - &amp;amp;&lt;BR /&gt; dcqq(:,:,:,1,3)*pswk(:,:,:,2)&lt;BR /&gt; endif&lt;BR /&gt;!-----------------------------------------------------------------------&lt;BR /&gt;! terms which require d/dy psi&lt;BR /&gt;!-----------------------------------------------------------------------&lt;BR /&gt; call cmuly (ncolx, ncoly, ncolz, der1y, pinn, pswk, 0)&lt;BR /&gt;! part of the effective mass term&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) - dbmasq(:,:,:,2)*pswk(:,:,:,1)&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) - dbmasq(:,:,:,2)*pswk(:,:,:,2)&lt;BR /&gt;! part of -eye*I_q*Dpsi term (time-odd)&lt;BR /&gt; if(itimrev == 0) then&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) - eye*xkinpq(:,:,:,2)*pswk(:,:,:,1)&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) - eye*xkinpq(:,:,:,2)*pswk(:,:,:,2)&lt;BR /&gt; endif&lt;BR /&gt;! this is the -eye*B_q.Dsigma (tensor) part of h*psi&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) - eye*(bmunuq(:,:,:,2,1)*pswk(:,:,:,2) - &amp;amp;&lt;BR /&gt; eye*bmunuq(:,:,:,2,2)*pswk(:,:,:,2) + &amp;amp;&lt;BR /&gt; bmunuq(:,:,:,2,3)*pswk(:,:,:,1))&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) - eye*(bmunuq(:,:,:,2,1)*pswk(:,:,:,1) + &amp;amp;&lt;BR /&gt; eye*bmunuq(:,:,:,2,2)*pswk(:,:,:,1) - &amp;amp;&lt;BR /&gt; bmunuq(:,:,:,2,3)*pswk(:,:,:,2))&lt;BR /&gt;! 
this is the [Del.sigma.C_q]Del psi part of the C_q term (time-odd)&lt;BR /&gt; if(itimrev == 0) then&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) + (dcqq(:,:,:,2,1) - eye*dcqq(:,:,:,2,2))*pswk(:,:,:,2) + &amp;amp;&lt;BR /&gt; dcqq(:,:,:,2,3)*pswk(:,:,:,1)&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) + (dcqq(:,:,:,2,1) + eye*dcqq(:,:,:,2,2))*pswk(:,:,:,1) - &amp;amp;&lt;BR /&gt; dcqq(:,:,:,2,3)*pswk(:,:,:,2)&lt;BR /&gt; endif&lt;BR /&gt;!-----------------------------------------------------------------------&lt;BR /&gt;! terms which require d/dz psi&lt;BR /&gt;!-----------------------------------------------------------------------&lt;BR /&gt; call cmulz (ncolx, ncoly, ncolz, der1z, pinn, pswk, 0)&lt;BR /&gt;! part of the effective mass term&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) - dbmasq(:,:,:,3)*pswk(:,:,:,1)&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) - dbmasq(:,:,:,3)*pswk(:,:,:,2)&lt;BR /&gt;! part of the -eye*I_q*Dpsi term (time-odd)&lt;BR /&gt; if(itimrev == 0) then&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) - eye*xkinpq(:,:,:,3)*pswk(:,:,:,1)&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) - eye*xkinpq(:,:,:,3)*pswk(:,:,:,2)&lt;BR /&gt; endif&lt;BR /&gt;! this is the -eye*B_q.Dsigma (tensor) part of h*psi&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) - eye*(bmunuq(:,:,:,3,1)*pswk(:,:,:,2) - &amp;amp;&lt;BR /&gt; eye*bmunuq(:,:,:,3,2)*pswk(:,:,:,2) + &amp;amp;&lt;BR /&gt; bmunuq(:,:,:,3,3)*pswk(:,:,:,1))&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) - eye*(bmunuq(:,:,:,3,1)*pswk(:,:,:,1) + &amp;amp;&lt;BR /&gt; eye*bmunuq(:,:,:,3,2)*pswk(:,:,:,1) - &amp;amp;&lt;BR /&gt; bmunuq(:,:,:,3,3)*pswk(:,:,:,2))&lt;BR /&gt;! 
this is the [Del.sigma.C_q]Del psi part of the C_q term (time-odd)&lt;BR /&gt; if(itimrev == 0) then&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) + (dcqq(:,:,:,3,1) - eye*dcqq(:,:,:,3,2))*pswk(:,:,:,2) + &amp;amp;&lt;BR /&gt; dcqq(:,:,:,3,3)*pswk(:,:,:,1)&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) + (dcqq(:,:,:,3,1) + eye*dcqq(:,:,:,3,2))*pswk(:,:,:,1) - &amp;amp;&lt;BR /&gt; dcqq(:,:,:,3,3)*pswk(:,:,:,2)&lt;BR /&gt; endif&lt;BR /&gt;!-----------------------------------------------------------------------&lt;BR /&gt;! terms which require (d/dx)**2 psi&lt;BR /&gt;!-----------------------------------------------------------------------&lt;BR /&gt; call cmulx (ncolx, ncoly, ncolz, der2x, pinn, pswk, 0)&lt;BR /&gt;! Effective mass, kinetic energy term&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) - bmassq(:,:,:)*pswk(:,:,:,1)&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) - bmassq(:,:,:)*pswk(:,:,:,2)&lt;BR /&gt;! This is the second part of C_q term (time-odd)&lt;BR /&gt; if(itimrev == 0) then&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) + (cqq(:,:,:,1) - eye*cqq(:,:,:,2))*pswk(:,:,:,2) + &amp;amp;&lt;BR /&gt; cqq(:,:,:,3)*pswk(:,:,:,1)&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) + (cqq(:,:,:,1) + eye*cqq(:,:,:,2))*pswk(:,:,:,1) - &amp;amp;&lt;BR /&gt; cqq(:,:,:,3)*pswk(:,:,:,2)&lt;BR /&gt; endif&lt;BR /&gt;!-----------------------------------------------------------------------&lt;BR /&gt;! terms which require (d/dy)**2 psi&lt;BR /&gt;!-----------------------------------------------------------------------&lt;BR /&gt; call cmuly (ncolx, ncoly, ncolz, der2y, pinn, pswk, 0)&lt;BR /&gt;! Effective mass, kinetic energy term&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) - bmassq(:,:,:)*pswk(:,:,:,1)&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) - bmassq(:,:,:)*pswk(:,:,:,2)&lt;BR /&gt;! 
This is the second part of C_q term (time-odd)&lt;BR /&gt; if(itimrev == 0) then&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) + (cqq(:,:,:,1) - eye*cqq(:,:,:,2))*pswk(:,:,:,2) + &amp;amp;&lt;BR /&gt; cqq(:,:,:,3)*pswk(:,:,:,1)&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) + (cqq(:,:,:,1) + eye*cqq(:,:,:,2))*pswk(:,:,:,1) - &amp;amp;&lt;BR /&gt; cqq(:,:,:,3)*pswk(:,:,:,2)&lt;BR /&gt; endif&lt;BR /&gt;!-----------------------------------------------------------------------&lt;BR /&gt;! terms which require (d/dz)**2 psi&lt;BR /&gt;!-----------------------------------------------------------------------&lt;BR /&gt; call cmulz (ncolx, ncoly, ncolz, der2z, pinn, pswk, 0)&lt;BR /&gt;! Effective mass, kinetic energy term&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) - bmassq(:,:,:)*pswk(:,:,:,1)&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) - bmassq(:,:,:)*pswk(:,:,:,2)&lt;BR /&gt;! This is the second part of C_q term (time-odd)&lt;BR /&gt; if(itimrev == 0) then&lt;BR /&gt; pout(:,:,:,1) = pout(:,:,:,1) + (cqq(:,:,:,1) - eye*cqq(:,:,:,2))*pswk(:,:,:,2) + &amp;amp;&lt;BR /&gt; cqq(:,:,:,3)*pswk(:,:,:,1)&lt;BR /&gt; pout(:,:,:,2) = pout(:,:,:,2) + (cqq(:,:,:,1) + eye*cqq(:,:,:,2))*pswk(:,:,:,1) - &amp;amp;&lt;BR /&gt; cqq(:,:,:,3)*pswk(:,:,:,2)&lt;BR /&gt; endif&lt;BR /&gt;!&lt;BR /&gt; return&lt;BR /&gt; end subroutine hpsi&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 18 Jan 2011 14:10:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752380#M8414</guid>
      <dc:creator>umar</dc:creator>
      <dc:date>2011-01-18T14:10:34Z</dc:date>
    </item>
    <item>
      <title>Inline array syntax speedup or slowdown?</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752381#M8415</link>
      <description>First, you should get the vec-report before modification.&lt;BR /&gt;Comparing after modification, you would see whether the directives accomplished anything.&lt;BR /&gt;&lt;BR /&gt;Then, for each array assignment where you know all the arrays are 16-byte aligned, you could add a preceding&lt;BR /&gt;!dir$ vector aligned&lt;BR /&gt;It looks like this could work only if ncolx, ncoly, ncolz are all multiples of 2.&lt;BR /&gt;&lt;BR /&gt;If you would replace /2.0_wp by *0.5_wp, you would avoid a dependence on the option -no-prec-div (which apparently you are using, by default). I don't know why this transformation is forbidden by the preferable option -prec-div.</description>
      <pubDate>Tue, 18 Jan 2011 16:26:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752381#M8415</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2011-01-18T16:26:58Z</dc:date>
    </item>
    <item>
      <title>Inline array syntax speedup or slowdown?</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752382#M8416</link>
      <description>After examining the vectorization reports I have come to these conclusions:&lt;BR /&gt;&lt;BR /&gt;1. Expressions involving complex arrays do not vectorize (most of this routine).&lt;BR /&gt;2. The simple speed gain when things were explicit loops (not array inline syntax) must&lt;BR /&gt; be coming from loop unrolling or things of that nature.&lt;BR /&gt;3. If I do the painstaking job of breaking real and imaginary parts up, things do vectorize.&lt;BR /&gt;&lt;BR /&gt;I will try to do (3) and report the speedup.</description>
      <pubDate>Thu, 20 Jan 2011 17:30:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752382#M8416</guid>
      <dc:creator>umar</dc:creator>
      <dc:date>2011-01-20T17:30:35Z</dc:date>
    </item>
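    <!-- Splitting real and imaginary parts, as proposed in point 3, might look like this for a term of the form p = p - eye*w*s (a hypothetical sketch, not taken from the actual routine):

    ```fortran
    ! Update p = p - (0,1)*w*s with p and s stored as separate
    ! real/imaginary arrays: -i*w*(sr + i*si) = w*si - i*w*sr.
    subroutine split_update(n, pr, pi, w, sr, si)
      implicit none
      integer, intent(in)    :: n
      real(8), intent(in)    :: w(n), sr(n), si(n)
      real(8), intent(inout) :: pr(n), pi(n)
      pr = pr + w*si   ! real part
      pi = pi - w*sr   ! imaginary part
    end subroutine split_update
    ```
    -->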
    <item>
      <title>Inline array syntax speedup or slowdown?</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752383#M8417</link>
      <description>Vectorization of complex requires, at a minimum, -msse3. That should be inherent in -xhost. I could believe that multi-rank operations with complex might not optimize.&lt;BR /&gt;&lt;BR /&gt;I've never seen a difference in default unrolling between array syntax and DO loops. Maybe you mean with the usage of multi-rank assignments, where I could believe your finding.&lt;BR /&gt;Sometimes, DO loops will take e.g. !dir$ unroll(4) to specify the amount of unrolling.&lt;BR /&gt;&lt;BR /&gt;You would have to compare your asm code to see what the differences are, if it's not simply a question of vectorization and distribution.</description>
      <pubDate>Thu, 20 Jan 2011 17:51:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Inline-array-syntax-speedup-or-slowdown/m-p/752383#M8417</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2011-01-20T17:51:06Z</dc:date>
    </item>
  </channel>
</rss>

