Community
cancel
Showing results for 
Search instead for 
Did you mean: 
志强_赵_
Beginner
57 Views

bug of sgemm, 2015 mkl gives wrong result, but 2013 gives correct

The compile option of 2015:

/work1/soft/intel2015/composer_xe_2015.0.090/bin/intel64/ifort -i8 -openmp main.f90 -o i64_2015 -L /work1/soft/intel2015/composer_xe_2015.0.090/mkl/lib/intel64 -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -Wl,-rpath /work1/soft/intel2015/mkl/lib/intel64

The option of 2013

/work1/soft/intel/composer_xe_2013.0.079/bin/intel64/ifort -i8 -openmp main.f90 -o i64_2013 -L /work1/soft/intel/composer_xe_2013.0.079/mkl/lib/intel64/ -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -Wl,-rpath /work1/soft/intel/composer_xe_2013.0.079/mkl/lib/intel64

 

The 2013 gives 2.0000, but 2015 gives 0.000

The code is shown below

program testi8
  implicit none
  integer,parameter :: REL = 4
  integer,  parameter :: num  = 2500000000, one = 1
  real(REL),dimension(:,:),allocatable :: a, b, c
  integer :: i, j

  print*, "allocate arrays..."
  allocate(a(1, num ), b(num, 1), c(1,1))
  a = 0.0_REL
  b = 0.0_REL
  c = 0.0_REL

  print*, "value arrays..."
  do i = 1, num
     if(i.eq.1) then
        a(1,1) = 1.0_REL
     else if(mod(i, 2).eq.0) then
        a(1, i) = 1.0_REL
     else
        a(1, i) = -1.0_REL
     end if
     b(i, 1) = 1.0_REL
  end do

  call sgemm('n', 'n', one, one, num, 1.0_REL, a, one, b, num, 0.0_REL, c, one)

  print*, "result"
  print*, c

end program testi8

 

 

0 Kudos
4 Replies
Gennady_F_Intel
Moderator
57 Views

thanks, we will check

Gennady_F_Intel
Moderator
57 Views

I see the same result too.

We will try to investigate the cause of the issue and will back with the updates.

thanks, Gennady

志强_赵_
Beginner
57 Views

In this case,If you set the a array to 1.0. Then call the sgemm. The desired result should be 2.5e9.

but both 2013 and 2015 give the wrong result. The 2013 say it's 1.6777e7. The 2015 say it's 1.6e9.

I think this is another situation. Please check it.

Gennady Fedorov (Intel) wrote:

I see the same result too.

We will try to investigate the cause of the issue and will back with the updates.

thanks, Gennady

Murat_G_Intel
Employee
57 Views

These differences seem to be due to the round-off errors in floating point operations. Vectorization changes the order of operations, and instead of accessing the elements sequentially, vectorization may process the elements in strides. For example, in your use case (essentially a dot product) instead of summing the element-wise vector-multiplication sequentially, a vector register is used to accumulate the sums in different chunks. The elements of the vector register is accumulated to get the final result. This causes round off errors even for the cases where one would expect the summation will cancel each other such as (1, 1, -1, -1, 1, -1, 1, ...).

To verify this claim, I created a C code that demonstrates this issue when vectorization is turned on (-O2 and above):

 

#include <stdlib.h>
float sum(float* x, long long len) {
    long long i;
    float res = 0.;
#pragma vector aligned
    for (i=0; i<len; ++i)
        res += x;
    return res;
}

void main(int argc, char* argv[])
{
    long long i;
    long long N = 100;
    if (argc>1) N = 1000LL*atoi(argv[1]);
    float* a =  _mm_malloc(sizeof(float)*N, 128);
    printf("N=%ld \n", N);
    if (N<2) return;

    a[0] = 1;
    a[1] = 1;
    for (i=2; i<N; ++i) {
        if (i%2) a = 1;
        else a = -1;
    }
    printf("Result = %f\n", sum(a, N));
    _mm_free(a);
}


 

icc main.c -O2 -xAVX -vec-report=6

$ ./a.out  20000
N=20000000
Result = 2.000000

$ ./a.out  200000
N=200000000
Result = 0.000000
 

If you compile the same program with -O1, we always get 2.0:

./a.out  2000000
N=2000000000
Result = 2.000000
 

 

Reply