<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Computation time in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Computation-time/m-p/901455#M80858</link>
    <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
In release mode the compiler's optimiser should work out that test_2 effectively does nothing - the value assigned to 'c' isn't used, so it won't bother with that calculation. I suspect it would eliminate most of the code associated with that routine. &lt;BR /&gt;&lt;BR /&gt;By the way, your subscript triplets ("i:(n-i1)*n +i:n" etc) look problematic. When you run your tests under debug mode with "Check array and string bounds" on (/check:bounds on the command line), what happens?&lt;BR /&gt;</description>
    <pubDate>Wed, 19 Aug 2009 05:56:01 GMT</pubDate>
    <dc:creator>IanH</dc:creator>
    <dc:date>2009-08-19T05:56:01Z</dc:date>
    <item>
      <title>Computation time</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Computation-time/m-p/901452#M80855</link>
      <description>Dear All,&lt;BR /&gt;&lt;BR /&gt;I wrote two nearly identical Fortran routines, test_1 and test_2. Only line 29 differs:&lt;BR /&gt;&lt;BR /&gt;test_1 line 29 is : a(i1) = sum(v(1:n)*b((j-1)*n +1:j*n))&lt;BR /&gt;&lt;BR /&gt;test_2 line 29 is :     c = sum(v(1:n)*b((j-1)*n +1:j*n))&lt;BR /&gt;&lt;BR /&gt;However, the computation times are very different. Here are the CPU times in seconds for different values of n.&lt;BR /&gt;&lt;BR /&gt;            test_1 time (s)    test_2 time (s)&lt;BR /&gt;&lt;BR /&gt;n =  500        0.905          0.016&lt;BR /&gt;n = 1000        7.207          0.016&lt;BR /&gt;n = 1500       24.523          0.047&lt;BR /&gt;n = 2000       58.641          0.109&lt;BR /&gt;&lt;BR /&gt;I need to keep the array a. How can I change test_1 to make it as fast as test_2?&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;&lt;BR /&gt;Didace &lt;BR /&gt;&lt;BR /&gt;PS: see the source code below.&lt;BR /&gt;&lt;BR /&gt;----------------------------------------------------------&lt;BR /&gt;&lt;BR /&gt;subroutine test_1 (a,b,n) &lt;BR /&gt;&lt;BR /&gt;use const_m&lt;BR /&gt;&lt;BR /&gt;implicit none&lt;BR /&gt;&lt;BR /&gt;integer, intent(in) :: n&lt;BR /&gt;&lt;BR /&gt;integer :: i, j, p&lt;BR /&gt;&lt;BR /&gt;complex*16 :: c&lt;BR /&gt;complex*16, dimension(n), intent(inout) :: a&lt;BR /&gt;complex*16, dimension(n), intent(in   ) :: b&lt;BR /&gt;&lt;BR /&gt;complex*16, dimension(:), allocatable   :: v&lt;BR /&gt;&lt;BR /&gt;allocate(v(n))&lt;BR /&gt;&lt;BR /&gt;do i=1,n&lt;BR /&gt;&lt;BR /&gt; v(:) = a(i:(n-i1)*n +i:n)&lt;BR /&gt;&lt;BR /&gt; p = i -n&lt;BR /&gt;&lt;BR /&gt; do j=1,n&lt;BR /&gt;&lt;BR /&gt; p = p +n&lt;BR /&gt;&lt;BR /&gt; a(i1) = sum(v(1:n)*b((j-1)*n +1:j*n))&lt;BR /&gt;&lt;BR /&gt; enddo&lt;BR /&gt;&lt;BR /&gt;enddo&lt;BR /&gt;&lt;BR /&gt;deallocate(v)&lt;BR /&gt;&lt;BR /&gt;end subroutine test_1&lt;BR /&gt;&lt;BR /&gt;----------------------------------------------------------&lt;BR /&gt;&lt;BR /&gt;subroutine test_2 (a,b,n) &lt;BR /&gt;&lt;BR /&gt;use const_m&lt;BR /&gt;&lt;BR /&gt;implicit none&lt;BR /&gt;&lt;BR /&gt;integer, intent(in) :: n&lt;BR /&gt;&lt;BR /&gt;integer :: i, j, p&lt;BR /&gt;&lt;BR /&gt;complex*16 :: c&lt;BR /&gt;complex*16, dimension(n), intent(inout) :: a&lt;BR /&gt;complex*16, dimension(n), intent(in   ) :: b&lt;BR /&gt;&lt;BR /&gt;complex*16, dimension(:), allocatable   :: v&lt;BR /&gt;&lt;BR /&gt;allocate(v(n))&lt;BR /&gt;&lt;BR /&gt;do i=1,n&lt;BR /&gt;&lt;BR /&gt; v(:) = a(i:(n-i1)*n +i:n)&lt;BR /&gt;&lt;BR /&gt; p = i -n&lt;BR /&gt;&lt;BR /&gt; do j=1,n&lt;BR /&gt;&lt;BR /&gt; p = p +n&lt;BR /&gt;&lt;BR /&gt; c = sum(v(1:n)*b((j-1)*n +1:j*n))&lt;BR /&gt;&lt;BR /&gt; enddo&lt;BR /&gt;&lt;BR /&gt;enddo&lt;BR /&gt;&lt;BR /&gt;deallocate(v)&lt;BR /&gt;&lt;BR /&gt;end subroutine test_2&lt;BR /&gt;&lt;BR /&gt;-----------------------------------------------------------</description>
      <pubDate>Thu, 06 Aug 2009 12:36:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Computation-time/m-p/901452#M80855</guid>
      <dc:creator>ekeom</dc:creator>
      <dc:date>2009-08-06T12:36:56Z</dc:date>
    </item>
    <item>
      <title>Re: Computation time</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Computation-time/m-p/901453#M80856</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;For the test, did you compile in debug or release mode? In release mode the compiler will vectorise the loop and optimise the memory-handling for the array a(i), but in debug mode this doesn't happen.&lt;BR /&gt;&lt;BR /&gt;Stephen.</description>
      <pubDate>Thu, 06 Aug 2009 19:29:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Computation-time/m-p/901453#M80856</guid>
      <dc:creator>eos_pengwern</dc:creator>
      <dc:date>2009-08-06T19:29:13Z</dc:date>
    </item>
    <item>
      <title>Re: Computation time</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Computation-time/m-p/901454#M80857</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/283612"&gt;eos pengwern&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; &lt;BR /&gt;For the test, did you compile in debug or release mode? In release mode the compiler will vectorise the loop and optimise the memory-handling for the array a(i), but in debug mode this doesn't happen.&lt;BR /&gt;&lt;BR /&gt;Stephen.&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
Thank you for your answer, Stephen.&lt;BR /&gt;&lt;BR /&gt;I used release mode.&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;&lt;BR /&gt;Didace&lt;BR /&gt;</description>
      <pubDate>Wed, 19 Aug 2009 04:27:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Computation-time/m-p/901454#M80857</guid>
      <dc:creator>ekeom</dc:creator>
      <dc:date>2009-08-19T04:27:51Z</dc:date>
    </item>
    <item>
      <title>Re: Computation time</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Computation-time/m-p/901455#M80858</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
In release mode the compiler's optimiser should work out that test_2 effectively does nothing - the value assigned to 'c' isn't used, so it won't bother with that calculation. I suspect it would eliminate most of the code associated with that routine. &lt;BR /&gt;&lt;BR /&gt;By the way, your subscript triplets ("i:(n-i1)*n +i:n" etc) look problematic. When you run your tests under debug mode with "Check array and string bounds" on (/check:bounds on the command line), what happens?&lt;BR /&gt;</description>
      <pubDate>Wed, 19 Aug 2009 05:56:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Computation-time/m-p/901455#M80858</guid>
      <dc:creator>IanH</dc:creator>
      <dc:date>2009-08-19T05:56:01Z</dc:date>
    </item>
    <item>
      <title>Re: Computation time</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Computation-time/m-p/901456#M80859</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/212570"&gt;IanH&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; In release mode the compiler's optimiser should work out that test_2 effectively does nothing - the value assigned to 'c' isn't used, so it won't bother with that calculation. I suspect it would eliminate most of the code associated with that routine. &lt;BR /&gt;&lt;BR /&gt;By the way, your subscript triplets ("i:(n-i1)*n +i:n" etc) look problematic. When you run your tests under debug mode with "Check array and string bounds" on (/check:bounds on the command line), what happens?&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Ian is right. &lt;BR /&gt;&lt;BR /&gt;(Note: I assume i1 does not change inside the loop.)&lt;BR /&gt;&lt;BR /&gt;What you tested/showed is called loop-invariant motion. The store instructions are removed in test 2. Here is a copy of an example, with an explanation of why it is slow.&lt;BR /&gt;&lt;BR /&gt;&lt;EM&gt;A pointer variable is used inside the loop. The target value changes but the value of the pointer itself does not change inside the loop. Using a loop invariant pointer results in the execution of redundant memory load and store operations.&lt;BR /&gt;&lt;/EM&gt;Note: In Fortran, pointers are used to reference dummy arguments.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;subroutine xmpl03(a,n,b)&lt;BR /&gt;integer n&lt;BR /&gt;integer a(n),b&lt;BR /&gt;integer lim&lt;BR /&gt;&lt;BR /&gt;lim = n&lt;BR /&gt;&lt;BR /&gt;do 10 i=1,lim&lt;BR /&gt;&lt;BR /&gt;a(1)=a(1)+b&lt;BR /&gt;&lt;BR /&gt;! An array variable a(1) whose index does not change is used for computation inside the loop.&lt;BR /&gt;! Both a and b are dummy arguments which are referenced indirectly using pointers.&lt;BR /&gt;! &lt;STRONG&gt;Redundant stores are executed for the loop invariant array variable&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;10 continue&lt;BR /&gt;end &lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;A.&lt;BR /&gt;</description>
      <pubDate>Wed, 19 Aug 2009 06:06:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Computation-time/m-p/901456#M80859</guid>
      <dc:creator>ArturGuzik</dc:creator>
      <dc:date>2009-08-19T06:06:11Z</dc:date>
    </item>
  </channel>
</rss>

