<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Dear Fiona, thank you for in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Question-using-vdmul/m-p/1089238#M23133</link>
    <description>&lt;P&gt;Dear Fiona, thank you for your answer.&lt;/P&gt;

&lt;P&gt;The code I'm implementing is the following:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;program MAIN
        
    implicit none
    
    integer, parameter :: Na = 250, Nx = 100, NS = Na*Nx    
    
    integer            :: rA, cA, rB, cB, icA, icB, ic
    real(8)            :: t_in, t_out
  
    real(8)            :: A(NS,Na), B(NS,Nx), KK(NS,NS)
    
    A = 1.0D0
    B = 2.0D0 
    
    rA = size(A,1)
    cA = size(A,2)
    rB = size(B,1)
    cB = size(B,2)    

    KK = 0.0D0

    call cpu_time(t_in)    
    do icA = 1, cA        
        do icB = 1, cB            
            ic = (icA - 1)*cB + icB       
            KK(:,ic) = A(:,icA)*B(:,icB)
        end do        
    end do
    call cpu_time(t_out)
    print *, '("Computing KK without vdmul takes = ",f2.6," seconds.")', t_out - t_in 

    call cpu_time(t_in)
    do icA = 1, cA
        do icB = 1, cB
            ic = (icA - 1)*cB + icB
            call vdmul( rA, A(:,icA), B(:,icB), KK(:,ic) )
        end do
    end do
    call cpu_time(t_out)
    print *, '("Computing KK with vdmul takes = ",f2.6," seconds.")', t_out - t_in
   
end program MAIN&lt;/PRE&gt;

&lt;P&gt;The output I'm getting is as follows:&amp;nbsp;&lt;/P&gt;


&lt;P class="p1"&gt;("Computing KK without vdmul takes = ",f2.6," seconds.") &amp;nbsp; 1.02616200000000&lt;/P&gt;

&lt;P class="p1"&gt;("Computing KK with vdmul takes = ",f2.6," seconds.") &amp;nbsp; 1.16865000000000&lt;/P&gt;


&lt;P style="font-size: 13.008px;"&gt;I have tried various sizes for Na and Nx: there is usually no significant difference when using vdmul, and if anything the vdmul version is slower. The only thing that makes a significant difference to the speed of the code is whether the following line is commented out or not:&lt;/P&gt;

&lt;PRE class="brush:fortran;" style="font-size: 13.008px;"&gt;    KK = 0.0D0
&lt;/PRE&gt;

&lt;P style="font-size: 13.008px;"&gt;&lt;SPAN style="font-size: 13.008px;"&gt;I'm using OS X 10.11.6.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.008px;"&gt;Thank you for your help, and best regards,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.008px;"&gt;Axelle&lt;/P&gt;</description>
    <pubDate>Tue, 07 Feb 2017 13:11:19 GMT</pubDate>
    <dc:creator>Ferriere__Axelle</dc:creator>
    <dc:date>2017-02-07T13:11:19Z</dc:date>
    <item>
      <title>Question using vdmul</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Question-using-vdmul/m-p/1089236#M23131</link>
      <description>&lt;P&gt;On my Intel-based Mac, using vdmul does not make the multiplication of two matrices faster than the usual A*B multiplication.&lt;/P&gt;

&lt;P&gt;Is this normal? Or does it mean that maybe I did not install MKL properly?&lt;/P&gt;

&lt;P&gt;Thank you very much in advance for your help&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Feb 2017 15:55:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Question-using-vdmul/m-p/1089236#M23131</guid>
      <dc:creator>Ferriere__Axelle</dc:creator>
      <dc:date>2017-02-06T15:55:00Z</dc:date>
    </item>
    <item>
      <title>Dear customer,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Question-using-vdmul/m-p/1089237#M23132</link>
      <description>&lt;P&gt;Dear customer,&lt;/P&gt;

&lt;P&gt;Performance depends on the length of the vectors you are operating on, on how your program is parallelized, and on your system environment (CPU, OS, compiler, ...). Could you please provide more information and a test sample showing how you call vdmul and what you compared it against, so that I can take a look? Thanks.&lt;/P&gt;

&lt;P&gt;Best regards,&lt;BR /&gt;
	Fiona&lt;/P&gt;</description>
      <pubDate>Tue, 07 Feb 2017 01:54:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Question-using-vdmul/m-p/1089237#M23132</guid>
      <dc:creator>Zhen_Z_Intel</dc:creator>
      <dc:date>2017-02-07T01:54:40Z</dc:date>
    </item>
    <item>
      <title>Dear Fiona, thank you for</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Question-using-vdmul/m-p/1089238#M23133</link>
      <description>&lt;P&gt;Dear Fiona, thank you for your answer.&lt;/P&gt;

&lt;P&gt;The code I'm implementing is the following:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;program MAIN
        
    implicit none
    
    integer, parameter :: Na = 250, Nx = 100, NS = Na*Nx    
    
    integer            :: rA, cA, rB, cB, icA, icB, ic
    real(8)            :: t_in, t_out
  
    real(8)            :: A(NS,Na), B(NS,Nx), KK(NS,NS)
    
    A = 1.0D0
    B = 2.0D0 
    
    rA = size(A,1)
    cA = size(A,2)
    rB = size(B,1)
    cB = size(B,2)    

    KK = 0.0D0

    call cpu_time(t_in)    
    do icA = 1, cA        
        do icB = 1, cB            
            ic = (icA - 1)*cB + icB       
            KK(:,ic) = A(:,icA)*B(:,icB)
        end do        
    end do
    call cpu_time(t_out)
    print *, '("Computing KK without vdmul takes = ",f2.6," seconds.")', t_out - t_in 

    call cpu_time(t_in)
    do icA = 1, cA
        do icB = 1, cB
            ic = (icA - 1)*cB + icB
            call vdmul( rA, A(:,icA), B(:,icB), KK(:,ic) )
        end do
    end do
    call cpu_time(t_out)
    print *, '("Computing KK with vdmul takes = ",f2.6," seconds.")', t_out - t_in
   
end program MAIN&lt;/PRE&gt;

&lt;P&gt;The output I'm getting is as follows:&amp;nbsp;&lt;/P&gt;


&lt;P class="p1"&gt;("Computing KK without vdmul takes = ",f2.6," seconds.") &amp;nbsp; 1.02616200000000&lt;/P&gt;

&lt;P class="p1"&gt;("Computing KK with vdmul takes = ",f2.6," seconds.") &amp;nbsp; 1.16865000000000&lt;/P&gt;


&lt;P style="font-size: 13.008px;"&gt;I have tried various sizes for Na and Nx: there is usually no significant difference when using vdmul, and if anything the vdmul version is slower. The only thing that makes a significant difference to the speed of the code is whether the following line is commented out or not:&lt;/P&gt;

&lt;PRE class="brush:fortran;" style="font-size: 13.008px;"&gt;    KK = 0.0D0
&lt;/PRE&gt;

&lt;P style="font-size: 13.008px;"&gt;&lt;SPAN style="font-size: 13.008px;"&gt;I'm using OS X 10.11.6.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.008px;"&gt;Thank you for your help, and best regards,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.008px;"&gt;Axelle&lt;/P&gt;</description>
      <pubDate>Tue, 07 Feb 2017 13:11:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Question-using-vdmul/m-p/1089238#M23133</guid>
      <dc:creator>Ferriere__Axelle</dc:creator>
      <dc:date>2017-02-07T13:11:19Z</dc:date>
    </item>
    <item>
      <title>Auto vectorization of your</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Question-using-vdmul/m-p/1089239#M23134</link>
      <description>Auto-vectorization of your source code ought to achieve full performance. If there is further performance to be gained by threading, it may be done better outside the inner loop.
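&lt;P&gt;A minimal sketch of threading outside the inner loop, assuming the compiler's OpenMP option is enabled (without it the directives are treated as comments and the loop runs serially). The program name, the reduced array sizes, and the use of system_clock for wall-clock timing are illustrative choices, not part of the original code:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;program outer_loop_timing

    use, intrinsic :: iso_fortran_env, only: int64

    implicit none

    integer, parameter :: dp = kind(1.0d0)
    ! Sizes reduced from the original post only to keep memory use small.
    integer, parameter :: Na = 50, Nx = 20, NS = Na*Nx

    integer(int64)        :: c_in, c_out, c_rate
    integer               :: icA, icB, ic
    real(dp), allocatable :: A(:,:), B(:,:), KK(:,:)

    allocate(A(NS,Na), B(NS,Nx), KK(NS,NS))

    A  = 1.0_dp
    B  = 2.0_dp
    KK = 0.0_dp    ! touch the memory before the timed region

    ! system_clock returns wall-clock ticks; c_rate is ticks per second.
    call system_clock(c_in, c_rate)

    ! Thread the outer loop; each iteration writes a distinct set of columns of KK.
    !$omp parallel do private(icB, ic)
    do icA = 1, Na
        do icB = 1, Nx
            ic = (icA - 1)*Nx + icB
            KK(:,ic) = A(:,icA)*B(:,icB)
        end do
    end do
    !$omp end parallel do

    call system_clock(c_out)

    print '("Elapsed time = ",f12.6," seconds.")', real(c_out - c_in, dp)/real(c_rate, dp)

end program outer_loop_timing&lt;/PRE&gt;
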
Anyway, threading would at best reduce elapsed time, not CPU time, so if vdmul is threaded you may want to time with system_clock instead of cpu_time.</description>
      <pubDate>Tue, 07 Feb 2017 15:37:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Question-using-vdmul/m-p/1089239#M23134</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2017-02-07T15:37:00Z</dc:date>
    </item>
    <item>
      <title>Dear Tim,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Question-using-vdmul/m-p/1089240#M23135</link>
      <description>&lt;P&gt;Dear Tim,&lt;/P&gt;

&lt;P&gt;I see, thanks a lot for your quick reply!&lt;/P&gt;

&lt;P&gt;Best,&lt;/P&gt;

&lt;P&gt;Axelle&lt;/P&gt;</description>
      <pubDate>Tue, 07 Feb 2017 16:38:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Question-using-vdmul/m-p/1089240#M23135</guid>
      <dc:creator>Ferriere__Axelle</dc:creator>
      <dc:date>2017-02-07T16:38:31Z</dc:date>
    </item>
  </channel>
</rss>

