For a projection operation, I should implement
F = ( I - Mf*Vfr*Vfr^T ) * F
Mf is a sparse matrix of size sz_fXsz_f
Vfr is a vector of sz_fX1
F is a block of vectors of size sz_fXP, and stored as tfBlock2 in the code. tfBlock3 is also used to store intermediate results.
How I implemented this is to, in some steps, namely,
1.) compute Vfr^T*F by using dgemm and store the result in r_dummy2, direct code
transa = 't' transb = 'n' a = 1.0_dbl b = 0.0_dbl r_dummy2 = 0.0_dbl call dgemm( transa, transb, 1, P, & sz_f, a, vfr_r, & sz_f, tfBlock2, sz_f, b, r_dummy2, sz_f)
2.) from previous operations I stored the matrix vector multiplication, Mf*Vfr so I am re-using that here
transa = 'n' call dgemm( transa,transb, sz_f, P, 1,& a, Mfvfr_r,& sz_f, r_dummy2, P, b, tfBlock3, sz_f)
3.) do the block subtraction with a simple loop( I can also use blas 3 level routines with a identity matrix of PXP for this I guess.)
v_scalar = -1.0_dbl do k=1,P call daxpy(sz_f,v_scalar,tfBlock3(:,k),& incx,tfBlock2(:,k),incy) end do
However I ran into some problems, I checked the code and up to this point the results are more or less the same with MATLAB, for instance before performing this step, the norms of the columns of F(as given above) are almost the same, the relative error between MATLAB and MKL is on the order to 1e-13.
However, after this projection the norms start to differ. A sample output is given below for the norms of the columns of the resulting matrix block F,
MATLAB, for a block size of 3
Fortran code with MKL with the same block size
with the least error on the last vector and %1 error on the second vector norm. Can someone shed some light on the above points and provide me with pointers on what could be going wrong? Any help is appreciated greatly.
My compiler options are given as
FC = ifort -ipo -unroll -fp-model strict -fp-stack-check -check all -fpe0 \
-traceback -ftrapuv -O2 -prec-div -prec-sqrt -debug all -fp-speculation=off
Do you use the same computing sequence in MATLAB and MKL?
Can you attach the data that you feed to daxpy, namely the arrays tfBlock2 and tfBlock3, as well as the value of sz_f, P, incx, incy, etc.?