- Marcar como nuevo
- Favorito
- Suscribir
- Silenciar
- Suscribirse a un feed RSS
- Resaltar
- Imprimir
- Informe de contenido inapropiado
The compile option of 2015:
/work1/soft/intel2015/composer_xe_2015.0.090/bin/intel64/ifort -i8 -openmp main.f90 -o i64_2015 -L /work1/soft/intel2015/composer_xe_2015.0.090/mkl/lib/intel64 -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -Wl,-rpath /work1/soft/intel2015/mkl/lib/intel64
The option of 2013
/work1/soft/intel/composer_xe_2013.0.079/bin/intel64/ifort -i8 -openmp main.f90 -o i64_2013 -L /work1/soft/intel/composer_xe_2013.0.079/mkl/lib/intel64/ -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -Wl,-rpath /work1/soft/intel/composer_xe_2013.0.079/mkl/lib/intel64
The 2013 gives 2.0000, but 2015 gives 0.000
The code is shown below
program testi8 implicit none integer,parameter :: REL = 4 integer, parameter :: num = 2500000000, one = 1 real(REL),dimension(:,:),allocatable :: a, b, c integer :: i, j print*, "allocate arrays..." allocate(a(1, num ), b(num, 1), c(1,1)) a = 0.0_REL b = 0.0_REL c = 0.0_REL print*, "value arrays..." do i = 1, num if(i.eq.1) then a(1,1) = 1.0_REL else if(mod(i, 2).eq.0) then a(1, i) = 1.0_REL else a(1, i) = -1.0_REL end if b(i, 1) = 1.0_REL end do call sgemm('n', 'n', one, one, num, 1.0_REL, a, one, b, num, 0.0_REL, c, one) print*, "result" print*, c end program testi8
Enlace copiado
- Marcar como nuevo
- Favorito
- Suscribir
- Silenciar
- Suscribirse a un feed RSS
- Resaltar
- Imprimir
- Informe de contenido inapropiado
thanks, we will check
- Marcar como nuevo
- Favorito
- Suscribir
- Silenciar
- Suscribirse a un feed RSS
- Resaltar
- Imprimir
- Informe de contenido inapropiado
I see the same result too.
We will try to investigate the cause of the issue and will back with the updates.
thanks, Gennady
- Marcar como nuevo
- Favorito
- Suscribir
- Silenciar
- Suscribirse a un feed RSS
- Resaltar
- Imprimir
- Informe de contenido inapropiado
In this case,If you set the a array to 1.0. Then call the sgemm. The desired result should be 2.5e9.
but both 2013 and 2015 give the wrong result. The 2013 say it's 1.6777e7. The 2015 say it's 1.6e9.
I think this is another situation. Please check it.
Gennady Fedorov (Intel) wrote:
I see the same result too.
We will try to investigate the cause of the issue and will back with the updates.
thanks, Gennady
- Marcar como nuevo
- Favorito
- Suscribir
- Silenciar
- Suscribirse a un feed RSS
- Resaltar
- Imprimir
- Informe de contenido inapropiado
These differences seem to be due to the round-off errors in floating point operations. Vectorization changes the order of operations, and instead of accessing the elements sequentially, vectorization may process the elements in strides. For example, in your use case (essentially a dot product) instead of summing the element-wise vector-multiplication sequentially, a vector register is used to accumulate the sums in different chunks. The elements of the vector register is accumulated to get the final result. This causes round off errors even for the cases where one would expect the summation will cancel each other such as (1, 1, -1, -1, 1, -1, 1, ...).
To verify this claim, I created a C code that demonstrates this issue when vectorization is turned on (-O2 and above):
#include <stdlib.h> float sum(float* x, long long len) { long long i; float res = 0.; #pragma vector aligned for (i=0; i<len; ++i) res += x; return res; } void main(int argc, char* argv[]) { long long i; long long N = 100; if (argc>1) N = 1000LL*atoi(argv[1]); float* a = _mm_malloc(sizeof(float)*N, 128); printf("N=%ld \n", N); if (N<2) return; a[0] = 1; a[1] = 1; for (i=2; i<N; ++i) { if (i%2) a = 1; else a = -1; } printf("Result = %f\n", sum(a, N)); _mm_free(a); }
icc main.c -O2 -xAVX -vec-report=6
$ ./a.out 20000
N=20000000
Result = 2.000000
$ ./a.out 200000
N=200000000
Result = 0.000000
If you compile the same program with -O1, we always get 2.0:
./a.out 2000000
N=2000000000
Result = 2.000000

- Suscribirse a un feed RSS
- Marcar tema como nuevo
- Marcar tema como leído
- Flotar este Tema para el usuario actual
- Favorito
- Suscribir
- Página de impresión sencilla