<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic This is because round-off in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/inconsistency-between-mkl-and-matmul/m-p/1060420#M21671</link>
    <description>&lt;P&gt;This is because round-off errors in floating point computation accumulate differently in different implementations. It's not uncommon to see different results of the same algorithm when using different math libraries. Even with the same library, running on different hardware, or the same hardware but different number of threads, can lead to different results because the algorithm may be parallelized in different ways, hence the completion order of the operation sequences may be different.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 27 Jan 2015 22:43:38 GMT</pubDate>
    <dc:creator>Zhang_Z_Intel</dc:creator>
    <dc:date>2015-01-27T22:43:38Z</dc:date>
    <item>
      <title>inconsistency between mkl and matmul</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/inconsistency-between-mkl-and-matmul/m-p/1060419#M21670</link>
      <description>&lt;P&gt;We have compared the results for matrix-vector recurrence relations of the type v_i = A*v_(i-1), as calculated by matmul and mkl routines.&lt;/P&gt;

&lt;P&gt;After some iterations the calculations seem to diverge exponentially in most of the cases, but the outcome is machine dependent.&lt;/P&gt;

&lt;P&gt;We have make comparisons for dgemv, dgemm, zgemm, and dzgemv. The problem occurs more frequently for the transpose multiplication case.&lt;/P&gt;

&lt;P&gt;The tests were made on the following code.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;program test_mkl

&amp;nbsp;implicit none

&amp;nbsp;real*8, allocatable :: v(:,:), vv(:,:)
&amp;nbsp;real*8, parameter&amp;nbsp;&amp;nbsp; :: zero = 0.d0, &amp;amp;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; one&amp;nbsp; = 1.d0, &amp;amp;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; two&amp;nbsp; = 2.d0
&amp;nbsp;real*8,&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; allocatable :: A(:,:)
&amp;nbsp;integer&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; :: n, order, i
&amp;nbsp;integer, dimension(4)&amp;nbsp;&amp;nbsp; :: seed = (/0,0,0,1/)

&amp;nbsp;n = 3000
&amp;nbsp;order = 10

&amp;nbsp;allocate( v(n,order) , vv(n,order) )
&amp;nbsp;allocate( A(n,n) )

&amp;nbsp;! generate random numbers in the interval [0,,1]
&amp;nbsp;call dlarnv( 1, seed, n*n, A )
&amp;nbsp;call dlarnv( 1, seed, n*order, vv )
&amp;nbsp;v = vv

&amp;nbsp;write(*,*) "diff increases exponentially for conjugate multiplication"
&amp;nbsp;do i = 2, order

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; vv(:,i) = matmul( vv(:,i-1), A )

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; call dgemv( 'T', n, n, one, A, n, vv(:,i-1), 1, zero, v(:,i), 1 )
&amp;nbsp;!&amp;nbsp;&amp;nbsp;&amp;nbsp; call dzgemv( 'T', n, n, z_two, A, n, vv(:,i-1), 1, z_zero, v(:), 1 )
&amp;nbsp;!&amp;nbsp;&amp;nbsp;&amp;nbsp; call dzgemm( 'T', 'N', n, 1, n, z_two, A, n, vv(:,i-1), n, z_zero, v(:), n )
&amp;nbsp;!&amp;nbsp;&amp;nbsp;&amp;nbsp; call zgemm( 'T', 'N', n, 1, n, z_two, A, n, v(:,i-1), n, z_zero, v(:,i), n )
&amp;nbsp;!&amp;nbsp;&amp;nbsp;&amp;nbsp; call dgemm( 'T', 'N', n, 1, n, two, A, n, v(:,i-1), n, zero, v(:,i), n )

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; write(*,'(a,i2)'), "i = ", i
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; write(*,*) "diff = ", maxval(abs( v(:,i)-vv(:,i) ))
&amp;nbsp;end do

&amp;nbsp;write(*,*) " "
&amp;nbsp;write(*,*) "diff for direct multiplication"
&amp;nbsp;do i = 2, order

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; vv(:,i) = matmul( A , vv(:,i-1) )

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; call dgemv( 'N', n, n, one, A, n, vv(:,i-1), 1, zero, v(:,i), 1 )
&amp;nbsp;!&amp;nbsp;&amp;nbsp;&amp;nbsp; call dzgemv( 'N', n, n, z_two, A, n, vv(:,i-1), 1, z_zero, v(:), 1 )
&amp;nbsp;!&amp;nbsp;&amp;nbsp;&amp;nbsp; call dzgemm( 'N', 'N', n, 1, n, z_two, A, n, vv(:,i-1), n, z_zero, v(:), n )
&amp;nbsp;!&amp;nbsp;&amp;nbsp;&amp;nbsp; call zgemm( 'N', 'N', n, 1, n, z_two, A, n, v(:,i-1), n, z_zero, v(:,i), n )
&amp;nbsp;!&amp;nbsp;&amp;nbsp;&amp;nbsp; call dgemm( 'N', 'N', n, 1, n, two, A, n, v(:,i-1), n, zero, v(:,i), n )

&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; write(*,'(a,i2)'), "i = ", i
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; write(*,*) "diff = ", maxval(abs( v(:,i)-vv(:,i) ))
&amp;nbsp;end do

&amp;nbsp;end program&lt;/PRE&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Below, there is a typical output for a XEON E5-2687W. What may be causing this type of problem?&lt;/P&gt;

&lt;P&gt;&amp;nbsp;diff increases exponentially for conjugate multiplication&lt;BR /&gt;
	i =&amp;nbsp; 2&lt;BR /&gt;
	&amp;nbsp;diff =&amp;nbsp; &amp;nbsp;2.955857780762017E-012&lt;BR /&gt;
	i =&amp;nbsp; 3&lt;BR /&gt;
	&amp;nbsp;diff =&amp;nbsp; &amp;nbsp;4.190951585769653E-009&lt;BR /&gt;
	i =&amp;nbsp; 4&lt;BR /&gt;
	&amp;nbsp;diff =&amp;nbsp; &amp;nbsp;5.722045898437500E-006&lt;BR /&gt;
	i =&amp;nbsp; 5&lt;BR /&gt;
	&amp;nbsp;diff =&amp;nbsp; &amp;nbsp;8.789062500000000E-003&lt;BR /&gt;
	i =&amp;nbsp; 6&lt;BR /&gt;
	&amp;nbsp;diff =&amp;nbsp; &amp;nbsp; 12.5000000000000&lt;BR /&gt;
	i =&amp;nbsp; 7&lt;BR /&gt;
	&amp;nbsp;diff =&amp;nbsp; &amp;nbsp; 18432.0000000000&lt;BR /&gt;
	i =&amp;nbsp; 8&lt;BR /&gt;
	&amp;nbsp;diff =&amp;nbsp; &amp;nbsp; 25165824.0000000&lt;BR /&gt;
	i =&amp;nbsp; 9&lt;BR /&gt;
	&amp;nbsp;diff =&amp;nbsp; &amp;nbsp; 42949672960.0000&lt;BR /&gt;
	i = 10&lt;BR /&gt;
	&amp;nbsp;diff =&amp;nbsp; &amp;nbsp; 63771674411008.0&lt;BR /&gt;
	&lt;BR /&gt;
	&amp;nbsp;diff for direct multiplication&lt;BR /&gt;
	i =&amp;nbsp; 2&lt;BR /&gt;
	&amp;nbsp;diff =&amp;nbsp; &amp;nbsp;0.000000000000000E+000&lt;BR /&gt;
	i =&amp;nbsp; 3&lt;BR /&gt;
	&amp;nbsp;diff =&amp;nbsp; &amp;nbsp;0.000000000000000E+000&lt;BR /&gt;
	i =&amp;nbsp; 4&lt;BR /&gt;
	&amp;nbsp;diff =&amp;nbsp; &amp;nbsp;0.000000000000000E+000&lt;BR /&gt;
	i =&amp;nbsp; 5&lt;BR /&gt;
	&amp;nbsp;diff =&amp;nbsp; &amp;nbsp;0.000000000000000E+000&lt;BR /&gt;
	i =&amp;nbsp; 6&lt;BR /&gt;
	&amp;nbsp;diff =&amp;nbsp; &amp;nbsp;0.000000000000000E+000&lt;BR /&gt;
	i =&amp;nbsp; 7&lt;BR /&gt;
	&amp;nbsp;diff =&amp;nbsp; &amp;nbsp;0.000000000000000E+000&lt;BR /&gt;
	i =&amp;nbsp; 8&lt;BR /&gt;
	&amp;nbsp;diff =&amp;nbsp; &amp;nbsp;0.000000000000000E+000&lt;BR /&gt;
	i =&amp;nbsp; 9&lt;BR /&gt;
	&amp;nbsp;diff =&amp;nbsp; &amp;nbsp;0.000000000000000E+000&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Jan 2015 21:11:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/inconsistency-between-mkl-and-matmul/m-p/1060419#M21670</guid>
      <dc:creator>luis_gc_rego</dc:creator>
      <dc:date>2015-01-27T21:11:31Z</dc:date>
    </item>
    <item>
      <title>This is because round-off</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/inconsistency-between-mkl-and-matmul/m-p/1060420#M21671</link>
      <description>&lt;P&gt;This is because round-off errors in floating point computation accumulate differently in different implementations. It's not uncommon to see different results of the same algorithm when using different math libraries. Even with the same library, running on different hardware, or the same hardware but different number of threads, can lead to different results because the algorithm may be parallelized in different ways, hence the completion order of the operation sequences may be different.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Jan 2015 22:43:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/inconsistency-between-mkl-and-matmul/m-p/1060420#M21671</guid>
      <dc:creator>Zhang_Z_Intel</dc:creator>
      <dc:date>2015-01-27T22:43:38Z</dc:date>
    </item>
    <item>
      <title>Estimating or evaluating the</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/inconsistency-between-mkl-and-matmul/m-p/1060421#M21672</link>
      <description>&lt;P&gt;Estimating or evaluating the dominant eigenvalue of your matrix A may provide some insight into why the difference between the vector iterates from alternative computations diverge so rapidly. If you decide to undertake this calculation, however, I should discourage you from working with a random matrix, as such a matrix would have no similarity to the matrix in your real application.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Jan 2015 23:57:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/inconsistency-between-mkl-and-matmul/m-p/1060421#M21672</guid>
      <dc:creator>mecej4</dc:creator>
      <dc:date>2015-01-27T23:57:16Z</dc:date>
    </item>
  </channel>
</rss>

