use of MKL subroutine

chubb87 · ‎05-27-2010

Hi, I tried using subroutine mkl_sdiasv(), and it seems I have access to the MKL library, because the error message is:
"MKL Error: Parameter 4 was incorrect on entry to mkl_sdiasv"

the source code is:

program Mat_Test
implicit none
!include 'C:\\Program Files\\Intel\\MKL\\10.2.5.035\\include\\mkl_spblas.fi'
CHARACTER*1 :: transa = 'n'
CHARACTER :: matdescra(6)
INTEGER :: m = 5, lval=5, ndiag = 2
INTEGER :: idiag(2)=(/-3,0/)
REAL :: alpha=1
REAL :: val(5,2), x(5), y(5)
matdescra(1) = 'G';
matdescra(2) = 'L';
matdescra(3) = 'N';
matdescra(4) = 'F';
x = (/1,3,3,4,-6/);
val= reshape( (/0,0,0,2,-1, 1,3,6,2,-5/), (/lval, ndiag/) )

call mkl_sdiasv(transa, m, alpha, matdescra, val, lval, idiag, ndiag, x, y)

print *, y, 'END'

end program Mat_Test

what is the problem with matdescra?

mecej4 · ‎05-27-2010

The values 'G','L','N','F' form an appropriate combination for 'matdescra' for multiplication routines. You are, on the other hand, calling a solver routine, for which 'T','L','U','F' is one acceptable combination.

Please consult the MKL documentation to ascertain proper argument values.
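
For reference, here is a minimal sketch of the test program above with a triangular descriptor. It assumes the program is linked against MKL and, like the original, relies on the implicit interface. The choice of 'T','L','N','F' (lower triangular, non-unit diagonal) is an assumption about what was intended, since the main diagonal stored in val is not all ones; with 'U' in position 3 the diagonal would be treated as unit instead.

```fortran
program dia_trisolve
  implicit none
  ! Sketch only: assumes linking against MKL (implicit interface, as in the original post).
  character(len=1) :: transa = 'n'
  character(len=1) :: matdescra(6)
  integer, parameter :: m = 5, lval = 5, ndiag = 2
  integer :: idiag(ndiag) = (/ -3, 0 /)      ! one sub-diagonal and the main diagonal
  real :: alpha = 1.0                        ! a REAL, not the integer literal 1
  real :: val(lval, ndiag), x(m), y(m)

  matdescra = ' '
  matdescra(1) = 'T'   ! triangular
  matdescra(2) = 'L'   ! lower
  matdescra(3) = 'N'   ! non-unit diagonal (assumption, see text above)
  matdescra(4) = 'F'   ! one-based (Fortran) indexing

  x = (/ 1.0, 3.0, 3.0, 4.0, -6.0 /)
  ! Column 1 holds diagonal -3 (padded with zeros at the top), column 2 the main diagonal.
  val = reshape((/ 0.0, 0.0, 0.0, 2.0, -1.0,  1.0, 3.0, 6.0, 2.0, -5.0 /), (/ lval, ndiag /))

  call mkl_sdiasv(transa, m, alpha, matdescra, val, lval, idiag, ndiag, x, y)

  print *, y
end program dia_trisolve
```

Solving this 5x5 lower triangular system by hand with forward substitution gives y = (1, 1, 0.5, 1, 1), which is a convenient check on whatever the program prints.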

Gennady_F_Intel · ‎05-27-2010

Also, please look at the example program for using MKL Sparse BLAS Level 2 and 3 for matrices represented in the diagonal storage scheme.

You can find it in ..\examples\spblas\source\sdia.f file.

--Gennady

chubb87 · ‎05-28-2010

okay, thank you.

I thought the subroutine mkl_?diasv would solve every matrix equation with diagonal storage, but it seems A also has to have only nonzero elements on the main diagonal.
The terms diagonal storage and diagonal matrix are not used clearly in my opinion.

The example is now working for me, but I still have to understand what it is doing and if I need every step of the example.

chubb87 · ‎05-28-2010

I want to solve the system Ax = B, where A is sparse and easy to store in the diagonal storage scheme (but it is not a diagonal or triangular matrix).

I am using the A = L + D + U procedure.
I will implement a Jacobi or Gauss-Seidel method, which uses this decomposition. But the only thing that is missing is how to create the inverse of (D+L) in diagonal storage format.
Do you know if there is a subroutine in MKL which can do that?

mecej4 · ‎05-28-2010

"Diagonal Matrix" was defined in mathematics many decades before people started implementing "Diagonal Storage Scheme" on computers. Few of us are allowed the privilege of creating perfectly logical and consistent definitions on the first day, seeing that it is good at nightfall and, on the second day, inventing an algorithm to solve ...

See the Golub and Van Loan book.

chubb87 · ‎05-28-2010

I have found out that L+D+U can be useful as well, see my edited post above, for iterative instead of direct methods.
For the Jacobi algorithm that works well with the subroutines available, but for the Gauss-Seidel method I need to compute Inv(D+L) ...

Victor_Gladkikh · ‎05-31-2010

You can compute the vector y = inv(D+L) * x, where matrix A is stored in diagonal format and A = (U + D + L).
The routine mkl_ddiasv with matdescra = "TLN" computes this operation (y = inv(D + L) * x). In many cases that is enough, because we can use the result of inv(triangular matrix) * x directly. If you need inv(D + L) as a matrix, I would recommend looking for an applicable LAPACK routine.
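
To illustrate why applying inv(D + L) to a vector is enough for Gauss-Seidel, here is a small sketch that does not call MKL at all. It performs in-place sweeps over a matrix held in diagonal storage, which is equivalent to x_new = inv(D + L) * (b - U * x_old). The 4x4 tridiagonal test matrix, the right-hand side and the fixed count of 50 sweeps are made up for the illustration.

```fortran
program gauss_seidel_dia
  implicit none
  ! Illustrative only: hand-written Gauss-Seidel on diagonal storage, no MKL calls.
  integer, parameter :: n = 4, ndiag = 3
  integer :: idiag(ndiag) = (/ -1, 0, 1 /)
  real :: val(n, ndiag), b(n), x(n), s, d
  integer :: i, j, k, sweep

  ! A(i, i+idiag(k)) is stored in val(i, k); entries outside the matrix are padding.
  val(:, 1) = (/ 0.0, -1.0, -1.0, -1.0 /)   ! sub-diagonal (L)
  val(:, 2) = 4.0                           ! main diagonal (D)
  val(:, 3) = (/ -1.0, -1.0, -1.0, 0.0 /)   ! super-diagonal (U)
  b = (/ 3.0, 2.0, 2.0, 3.0 /)              ! equals A times a vector of ones
  x = 0.0

  do sweep = 1, 50
    do i = 1, n
      s = b(i)
      d = 0.0
      do k = 1, ndiag
        j = i + idiag(k)
        if (j < 1 .or. j > n) cycle         ! skip padding entries
        if (idiag(k) == 0) then
          d = val(i, k)                     ! diagonal entry A(i,i)
        else
          s = s - val(i, k) * x(j)          ! new x(j) for j < i, old x(j) for j > i
        end if
      end do
      x(i) = s / d
    end do
  end do

  print *, x
end program gauss_seidel_dia
```

Because this test matrix is strictly diagonally dominant, the sweeps converge, and x approaches a vector of ones. In a real code the inner work would be replaced by one triangular solve per sweep with the routine mentioned above.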

chubb87 · ‎05-31-2010

Thx for the advice, but I think I have found a better solution. I have converted my matrices all in triangular form with help of permutations.

So I am using mkl_sdiatrsv with lower triangular matrices.

It works very well, but I get the message: "stack overflow" when using vectors x and y with length of about 4000 elements.

Is this the normal limitation? What does it depend on and is it possible to increase it (hardware?) ?

Gennady_F_Intel · ‎05-31-2010

There is no such restriction.
I think this is a software bug. How do you allocate your working arrays?

Can you give an example that reproduces the problem?

--Gennady

chubb87 · ‎05-31-2010

Okay, I am still debugging, but I will post part of my code so that you can envisage what I am doing and where the problem might be.

I don't show all declarations, e.g. for val, I don't think that is important right now

m loops from 1 to 2 (or higher for different parameters)
SWF=1, NWF=2, SEF=3, ... up to 8 are used for indexing

CHARACTER*1 :: transa = 'n', uplo = 'l' ,diag = 'n'
CHARACTER :: matdescra(4) = 'TLUF'
integer :: ndiag, lval
data ndiag /4/
idiag = (/-Imu*Jmu,-Jmu,-1,0/);
do m = 1,ind_Dw
QVec(:,SWF,m) = beta_m(m)*V*Sp_m(:,m)*d_Ohm(m) + Dw(m,x)*Axi/gx*IwW_ext(:,SWF) + Dw(m,y)*Ayi/gy*IwS_ext(:,SWF) + Dw(m,z)*Azi/gz*IwF_ext(:,SWF); ! for South and West boundary conditions, i.e. starting in the north east
QVec(:,NWF,m) = beta_m(m)*V*Sp_m(:,m)*d_Ohm(m) + Dw(m,x)*Axi/gx*IwW_ext(:,NWF) + Dw(m,y)*Ayi/gy*IwN_ext(:,NWF) + Dw(m,z)*Azi/gz*IwF_ext(:,NWF);
QVec(:,SEF,m) = beta_m(m)*V*Sp_m(:,m)*d_Ohm(m) + Dw(m,x)*Axi/gx*IwE_ext(:,SEF) + Dw(m,y)*Ayi/gy*IwS_ext(:,SEF) + Dw(m,z)*Azi/gz*IwF_ext(:,SEF);
QVec(:,NEF,m) = beta_m(m)*V*Sp_m(:,m)*d_Ohm(m) + Dw(m,x)*Axi/gx*IwE_ext(:,NEF) + Dw(m,y)*Ayi/gy*IwN_ext(:,NEF) + Dw(m,z)*Azi/gz*IwF_ext(:,NEF);
QVec(:,SWB,m) = beta_m(m)*V*Sp_m(:,m)*d_Ohm(m) + Dw(m,x)*Axi/gx*IwW_ext(:,SWB) + Dw(m,y)*Ayi/gy*IwS_ext(:,SWB) + Dw(m,z)*Azi/gz*IwB_ext(:,SWB);
QVec(:,NWB,m) = beta_m(m)*V*Sp_m(:,m)*d_Ohm(m) + Dw(m,x)*Axi/gx*IwE_ext(:,NWB) + Dw(m,y)*Ayi/gy*IwS_ext(:,NWB) + Dw(m,z)*Azi/gz*IwB_ext(:,NWB);
QVec(:,SEB,m) = beta_m(m)*V*Sp_m(:,m)*d_Ohm(m) + Dw(m,x)*Axi/gx*IwW_ext(:,SEB) + Dw(m,y)*Ayi/gy*IwN_ext(:,SEB) + Dw(m,z)*Azi/gz*IwB_ext(:,SEB);
QVec(:,NEB,m) = beta_m(m)*V*Sp_m(:,m)*d_Ohm(m) + Dw(m,x)*Axi/gx*IwE_ext(:,NEB) + Dw(m,y)*Ayi/gy*IwN_ext(:,NEB) + Dw(m,z)*Azi/gz*IwB_ext(:,NEB);
call mkl_sdiatrsv(uplo,transa,diag,lval,val(:,:,m),lval,idiag,ndiag,QVec(:,SWF,m),Ip_tr(:,SWF,m))
call mkl_sdiatrsv(uplo,transa,diag,lval,val(:,:,m),lval,idiag,ndiag,QVec(:,NWF,m),Ip_tr(:,NWF,m))
call mkl_sdiatrsv(uplo,transa,diag,lval,val(:,:,m),lval,idiag,ndiag,QVec(:,SEF,m),Ip_tr(:,SEF,m))
call mkl_sdiatrsv(uplo,transa,diag,lval,val(:,:,m),lval,idiag,ndiag,QVec(:,NEF,m),Ip_tr(:,NEF,m))
call mkl_sdiatrsv(uplo,transa,diag,lval,val(:,:,m),lval,idiag,ndiag,QVec(:,SWB,m),Ip_tr(:,SWB,m))
call mkl_sdiatrsv(uplo,transa,diag,lval,val(:,:,m),lval,idiag,ndiag,QVec(:,NWB,m),Ip_tr(:,NWB,m))
call mkl_sdiatrsv(uplo,transa,diag,lval,val(:,:,m),lval,idiag,ndiag,QVec(:,SEB,m),Ip_tr(:,SEB,m))
call mkl_sdiatrsv(uplo,transa,diag,lval,val(:,:,m),lval,idiag,ndiag,QVec(:,NEB,m),Ip_tr(:,NEB,m))
end do

mecej4 · ‎05-31-2010

The first thing to do is to locate the stack overflow, by compiling with -g -traceback and running. Once the location of stack overflow has been found, you can think about whether to allocate the variables differently or to raise the stack limits.

chubb87 · ‎06-01-2010

I have found out in which line it occurs:

do m = 1,ind_Dw
call compute_all(Iw(:,:,(/W,N,F/)),Sp_m(:,:,:,m),beta_m(m),0,0,0,Dw(m,:),d_Ohm(m),Ip(:,:,:,1,m),IWE(:,:,1,m,:));
end do

Sorry for my lack of knowledge:

How can a different allocation be beneficial?
and how can I raise the stack limits?

Victor_Gladkikh · ‎06-01-2010

I would recommend using mkl_sdiasm instead of several mkl_sdiatrsv calls. Generally speaking, mkl_sdiasm can give better performance than a set of mkl_sdiatrsv runs.

Also, it would be helpful if you provided the declarations of the arrays val, QVec, and Ip_tr.

chubb87 · ‎06-01-2010

I tried using sdiasm, but somehow the solution matrix is zero. No error is given.
I wonder what I am doing wrong:

[bash]allocate(val(lval,ndiag,ind_Dw))
allocate(QVec(lval,8,ind_Dw))
allocate(Ip_tr(lval,8,ind_Dw))
val(1:Imu*Jmu,1,m) = 0; val(Imu*Jmu+1:Imu*Jmu*Kmu,1,m) = A_zi(:,m);
val(1:Jmu,2,m) = 0; val(Jmu+1:Imu*Jmu*Kmu,2,m) = A_xi(:,m);
val(1,3,m) = 0; val(2:Imu*Jmu*Kmu,3,m) = A_yi(:,m);        
val(:,4,m) = Ap(:,m);

call mkl_sdiasm(transa,lval,8,1,'TLNF',val(:,:,m),lval,idiag,ndiag,QVec(:,:,m),lval,Ip_tr(:,:,m),lval)
[/bash]

TimP · ‎06-01-2010

Quoting chubb87

call mkl_sdiasm(transa,lval,8,1,'TLNF',val(:,:,m),lval,idiag,ndiag,QVec(:,:,m),lval,Ip_tr(:,:,m),lval)

Presumably, using an array section as an argument creates a temporary. If using ifort, you could set /check:arg_temp_created in order to find out if this is so. Then, even if the subroutine writes back to the temporary which was initialized to Ip_tr(:,:,m), your program won't have access to those results. Ip_tr(1,1,m) might be what you meant.

chubb87 · ‎06-01-2010

No, it does not work with Ip_tr(1,1,m), I have to pass a whole matrix and not only a single value. Ip_tr(:,:) is my solution matrix.
with sdiatrsv it worked using (......,Ip_tr(:,1,1))

Gennady_F_Intel · ‎06-01-2010

Martin,

I don't see anything criminal in this code. Can you give us an example of the code?

You can use a private thread to send this example.

--Gennady

chubb87 · ‎06-03-2010

I have found the problem:

when I change alpha from 1 to 1.0, then it works, because then it is a real instead of an integer.

Note that I do not use a variable called alpha, but pass the literal 1.0 directly:

[bash]call mkl_sdiasm(transa,lval,8,1.0,matdescra,val(:,:,m),lval,idiag,ndiag,QVec(:,:,m),lval,Ip_tr(:,:,m),lval)
[/bash]
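
A likely explanation for the all-zero result, sketched below: without an explicit interface the compiler passes the integer literal 1 by reference, and the routine reads those bytes as a REAL. The snippet only shows what that bit pattern looks like; it assumes default INTEGER and default REAL are both 32 bits wide and does not call MKL.

```fortran
program literal_kind
  implicit none
  ! Assumes 32-bit default INTEGER and 32-bit default REAL.
  ! Bits of the integer 1 reinterpreted as a real: a tiny denormal, effectively zero.
  print *, transfer(1, 1.0)
  ! Bits of the real 1.0 reinterpreted as an integer, for comparison.
  print *, transfer(1.0, 1)
end program literal_kind
```

With alpha effectively zero, the scaled solution comes out as zero and no error is reported. Including mkl_spblas.fi (the include that is commented out in the first post) should let the compiler flag this kind of mismatch, assuming that file declares an interface for mkl_sdiasm.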

I already avoided stack overflow by the optimization option "heap arrays = 0".

Now I will try to improve the efficiency (profiling).