- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The compile option of 2015:
/work1/soft/intel2015/composer_xe_2015.0.090/bin/intel64/ifort -i8 -openmp main.f90 -o i64_2015 -L /work1/soft/intel2015/composer_xe_2015.0.090/mkl/lib/intel64 -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -Wl,-rpath /work1/soft/intel2015/mkl/lib/intel64
The option of 2013
/work1/soft/intel/composer_xe_2013.0.079/bin/intel64/ifort -i8 -openmp main.f90 -o i64_2013 -L /work1/soft/intel/composer_xe_2013.0.079/mkl/lib/intel64/ -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -Wl,-rpath /work1/soft/intel/composer_xe_2013.0.079/mkl/lib/intel64
The 2013 gives 2.0000, but 2015 gives 0.000
The code is shown below
program testi8 implicit none integer,parameter :: REL = 4 integer, parameter :: num = 2500000000, one = 1 real(REL),dimension(:,:),allocatable :: a, b, c integer :: i, j print*, "allocate arrays..." allocate(a(1, num ), b(num, 1), c(1,1)) a = 0.0_REL b = 0.0_REL c = 0.0_REL print*, "value arrays..." do i = 1, num if(i.eq.1) then a(1,1) = 1.0_REL else if(mod(i, 2).eq.0) then a(1, i) = 1.0_REL else a(1, i) = -1.0_REL end if b(i, 1) = 1.0_REL end do call sgemm('n', 'n', one, one, num, 1.0_REL, a, one, b, num, 0.0_REL, c, one) print*, "result" print*, c end program testi8
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
thanks, we will check
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I see the same result too.
We will try to investigate the cause of the issue and will back with the updates.
thanks, Gennady
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
In this case,If you set the a array to 1.0. Then call the sgemm. The desired result should be 2.5e9.
but both 2013 and 2015 give the wrong result. The 2013 say it's 1.6777e7. The 2015 say it's 1.6e9.
I think this is another situation. Please check it.
Gennady Fedorov (Intel) wrote:
I see the same result too.
We will try to investigate the cause of the issue and will back with the updates.
thanks, Gennady
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
These differences seem to be due to the round-off errors in floating point operations. Vectorization changes the order of operations, and instead of accessing the elements sequentially, vectorization may process the elements in strides. For example, in your use case (essentially a dot product) instead of summing the element-wise vector-multiplication sequentially, a vector register is used to accumulate the sums in different chunks. The elements of the vector register is accumulated to get the final result. This causes round off errors even for the cases where one would expect the summation will cancel each other such as (1, 1, -1, -1, 1, -1, 1, ...).
To verify this claim, I created a C code that demonstrates this issue when vectorization is turned on (-O2 and above):
#include <stdlib.h> float sum(float* x, long long len) { long long i; float res = 0.; #pragma vector aligned for (i=0; i<len; ++i) res += x; return res; } void main(int argc, char* argv[]) { long long i; long long N = 100; if (argc>1) N = 1000LL*atoi(argv[1]); float* a = _mm_malloc(sizeof(float)*N, 128); printf("N=%ld \n", N); if (N<2) return; a[0] = 1; a[1] = 1; for (i=2; i<N; ++i) { if (i%2) a = 1; else a = -1; } printf("Result = %f\n", sum(a, N)); _mm_free(a); }
icc main.c -O2 -xAVX -vec-report=6
$ ./a.out 20000
N=20000000
Result = 2.000000
$ ./a.out 200000
N=200000000
Result = 0.000000
If you compile the same program with -O1, we always get 2.0:
./a.out 2000000
N=2000000000
Result = 2.000000

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page