Intel® oneAPI Math Kernel Library

use of MKL subroutine

chubb87
Beginner
1,063 Views
Hi, I tried using the subroutine mkl_sdiasv(). It seems I do have access to the MKL library, because I get this error message:
"MKL Error: Parameter 4 was incorrect on entry to mkl_sdiasv"

The source code is:

program Mat_Test
implicit none
!include 'C:\\Program Files\\Intel\\MKL\\10.2.5.035\\include\\mkl_spblas.fi'
CHARACTER*1 :: transa = 'n'
CHARACTER :: matdescra(6)
INTEGER :: m = 5, lval=5, ndiag = 2
INTEGER :: idiag(2)=(/-3,0/)
REAL :: alpha=1
REAL :: val(5,2), x(5), y(5)
matdescra(1) = 'G';
matdescra(2) = 'L';
matdescra(3) = 'N';
matdescra(4) = 'F';
x = (/1,3,3,4,-6/);
val= reshape( (/0,0,0,2,-1, 1,3,6,2,-5/), (/lval, ndiag/) )

call mkl_sdiasv(transa, m, alpha, matdescra, val, lval, idiag, ndiag, x, y)

print *, y, 'END'

end program Mat_Test

What is the problem with matdescra?

mecej4
Honored Contributor III
The values 'G','L','N','F' form an appropriate combination for 'matdescra' for multiplication routines. You are, on the other hand, calling a solver routine, for which 'T','L','U','F' is one acceptable combination.

Please consult the MKL documentation to ascertain proper argument values.
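To make the solver case concrete, here is a plain-Fortran sketch (not MKL code; the module and routine names are made up for illustration) of the forward substitution that a solve with matdescra = 'T','L','N','F' stands for. It assumes the diagonal-storage convention val(i,j) = A(i, i+idiag(j)), a non-unit main diagonal with no zero entries, and it ignores any diagonal with idiag(j) > 0.

```fortran
module dia_lower_solve_sketch
  implicit none
contains
  ! Hypothetical helper, not part of MKL: solves L*y = x, where L is the lower
  ! triangle (main diagonal included) of an m-by-m matrix held in diagonal
  ! storage, val(i,j) = A(i, i+idiag(j)). Diagonals with idiag(j) > 0 are
  ! ignored, as with matdescra = 'T','L','N','F'. Assumes a nonzero diagonal.
  subroutine dia_lower_solve(m, val, lval, idiag, ndiag, x, y)
    integer, intent(in) :: m, lval, ndiag
    integer, intent(in) :: idiag(ndiag)
    real, intent(in) :: val(lval, ndiag), x(m)
    real, intent(out) :: y(m)
    integer :: i, j, col
    real :: s, d

    do i = 1, m                        ! forward substitution, row by row
      s = x(i)
      d = 0.0
      do j = 1, ndiag
        col = i + idiag(j)
        if (idiag(j) == 0) then
          d = val(i, j)                ! main-diagonal entry A(i,i)
        else if (idiag(j) < 0 .and. col >= 1) then
          s = s - val(i, j) * y(col)   ! y(col) is already known because col < i
        end if
      end do
      y(i) = s / d
    end do
  end subroutine dia_lower_solve
end module dia_lower_solve_sketch
```

With the data from the first post (idiag = (/-3,0/) and the val and x given there), the sketch returns y = (1.0, 1.0, 0.5, 1.0, 1.0).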
Gennady_F_Intel
Moderator
Also, please look at the example program for using MKL Sparse BLAS Level 2 and 3 with matrices represented in the diagonal storage scheme.
You can find it in the ..\examples\spblas\source\sdia.f file.
--Gennady
chubb87
Beginner
Okay, thank you.

I thought the subroutine mkl_?diasv would solve any matrix equation with a matrix in diagonal storage, but it seems A also has to have nonzero elements only on the main diagonal.
In my opinion, the terms "diagonal storage" and "diagonal matrix" are not clearly distinguished.

The example is now working for me, but I still have to understand what it is doing and whether I need every step of it.
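The two terms can be told apart with a small example: a matrix in diagonal storage is any matrix whose nonzeros lie on a few diagonals, while a diagonal matrix has nonzeros only on the main diagonal. The sketch below (hypothetical routine name, plain Fortran, not an MKL routine) packs a dense m-by-m matrix into the val/idiag layout, assuming the convention val(i,j) = A(i, i+idiag(j)) with zero padding where a diagonal runs outside the matrix.

```fortran
module dense_to_dia_sketch
  implicit none
contains
  ! Hypothetical helper: packs the diagonals listed in idiag of the dense
  ! m-by-m matrix a into val, using val(i,j) = a(i, i+idiag(j)).
  ! Positions where column i+idiag(j) falls outside 1..m are set to zero.
  subroutine dense_to_dia(m, a, idiag, ndiag, val)
    integer, intent(in) :: m, ndiag
    real, intent(in) :: a(m, m)
    integer, intent(in) :: idiag(ndiag)
    real, intent(out) :: val(m, ndiag)
    integer :: i, j, col

    val = 0.0
    do j = 1, ndiag
      do i = 1, m
        col = i + idiag(j)
        if (col >= 1 .and. col <= m) val(i, j) = a(i, col)
      end do
    end do
  end subroutine dense_to_dia
end module dense_to_dia_sketch
```

Applied to the lower triangular 5-by-5 matrix from the first post (main diagonal 1, 3, 6, 2, -5, plus A(4,1) = 2 and A(5,2) = -1) with idiag = (/-3,0/), it reproduces the val array given there, even though that matrix is not diagonal.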
chubb87
Beginner
I want to solve the system Ax = B, where A is sparse and easy to store in the diagonal storage scheme (but it is not a diagonal or triangular matrix).

I am using the splitting A = L + D + U.
I will implement a Jacobi or Gauss-Seidel method, which uses this decomposition. The only thing missing is how to create the inverse of (D+L) in diagonal storage format.
Do you know if there is a subroutine in MKL which can do that?
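For reference, the standard Jacobi and Gauss-Seidel iterations built on the splitting A = L + D + U (L strictly lower, D diagonal, U strictly upper) for the system Ax = B read:

```latex
\begin{aligned}
\text{Jacobi:} \quad & x^{(k+1)} = D^{-1}\bigl(B - (L+U)\,x^{(k)}\bigr) \\
\text{Gauss-Seidel:} \quad & x^{(k+1)} = (D+L)^{-1}\bigl(B - U\,x^{(k)}\bigr)
\end{aligned}
```

Neither formula requires D^{-1} or (D+L)^{-1} as an explicit matrix: the Gauss-Seidel update only needs the product (D+L)^{-1} v for one vector v per step, which a forward substitution with the lower triangular matrix D+L provides.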
mecej4
Honored Contributor III
"Diagonal Matrix" was defined in mathematics many decades before people started implementing "Diagonal Storage Scheme" on computers. Few of us are allowed the privilege of creating perfectly logical and consistent definitions on the first day, seeing that it is good at nightfall and, on the second day, inventing an algorithm to solve ...

See the book by Golub and Van Loan (Matrix Computations).
chubb87
Beginner
I have found out that the splitting L+D+U is also useful for iterative rather than direct methods (see my edited post above).
For the Jacobi algorithm this works well with the available subroutines, but for the Gauss-Seidel method I need to compute inv(D+L) ...
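As an illustration of why Jacobi maps easily onto existing routines: one Jacobi step needs only the off-diagonal product (L+U)*x and a division by the main diagonal. The sketch below is plain Fortran with hypothetical names (not an MKL routine), assuming the diagonal-storage convention val(i,j) = A(i, i+idiag(j)) and a nonzero main diagonal.

```fortran
module dia_jacobi_sketch
  implicit none
contains
  ! Hypothetical helper: one Jacobi step x_new = inv(D) * (b - (L+U) * x_old)
  ! for an m-by-m matrix in diagonal storage, val(i,j) = A(i, i+idiag(j)).
  subroutine dia_jacobi_step(m, val, idiag, ndiag, b, x_old, x_new)
    integer, intent(in) :: m, ndiag
    integer, intent(in) :: idiag(ndiag)
    real, intent(in) :: val(m, ndiag), b(m), x_old(m)
    real, intent(out) :: x_new(m)
    integer :: i, j, col
    real :: s, d

    do i = 1, m
      s = b(i)
      d = 0.0
      do j = 1, ndiag
        col = i + idiag(j)
        if (idiag(j) == 0) then
          d = val(i, j)                      ! main-diagonal entry A(i,i)
        else if (col >= 1 .and. col <= m) then
          s = s - val(i, j) * x_old(col)     ! only old values: Jacobi, not Gauss-Seidel
        end if
      end do
      x_new(i) = s / d
    end do
  end subroutine dia_jacobi_step
end module dia_jacobi_sketch
```

For the 4-by-4 system A = tridiag(-1, 4, -1), B = (3, 2, 2, 3), whose solution is x = (1, 1, 1, 1), repeated steps starting from x = 0 converge to that solution, as expected for a strictly diagonally dominant matrix.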
Victor_Gladkikh
New Contributor I
You can compute the vector y = inv(D+L)*x, where the matrix A is stored in diagonal format and A = U + D + L.
The routine mkl_ddiasv with matdescra = "TLN" computes this operation (y = inv(D + L)*x). In many cases this is enough, because the result of inv(triangular matrix)*x can be used directly. If you need inv(D + L) as a matrix, I would recommend looking for an applicable LAPACK routine.
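A sketch of one Gauss-Seidel step assembled from the two operations described above: r = B - U*x, followed by the lower triangular solve y = inv(D+L)*r, with A = L + D + U held in diagonal storage (val(i,j) = A(i, i+idiag(j))) and a nonzero main diagonal assumed. Both parts are plain Fortran stand-ins with hypothetical names; in an MKL-based code the second part is the role that mkl_ddiasv with matdescra = "TLN" would play. Single precision is used to match the original poster's code.

```fortran
module dia_gauss_seidel_sketch
  implicit none
contains
  ! Hypothetical helper: one Gauss-Seidel step
  !   x_new = inv(D+L) * (b - U*x_old)
  ! for an m-by-m matrix A = L + D + U in diagonal storage,
  ! val(i,j) = A(i, i+idiag(j)). Assumes a nonzero main diagonal.
  subroutine dia_gs_step(m, val, idiag, ndiag, b, x_old, x_new)
    integer, intent(in) :: m, ndiag
    integer, intent(in) :: idiag(ndiag)
    real, intent(in) :: val(m, ndiag), b(m), x_old(m)
    real, intent(out) :: x_new(m)
    real :: r(m), d
    integer :: i, j, col

    ! Part 1: r = b - U*x_old, using only diagonals with idiag(j) > 0.
    r = b
    do j = 1, ndiag
      if (idiag(j) <= 0) cycle
      do i = 1, m - idiag(j)
        r(i) = r(i) - val(i, j) * x_old(i + idiag(j))
      end do
    end do

    ! Part 2: forward substitution for (D+L)*x_new = r,
    ! using only diagonals with idiag(j) <= 0.
    do i = 1, m
      d = 0.0
      do j = 1, ndiag
        col = i + idiag(j)
        if (idiag(j) == 0) then
          d = val(i, j)
        else if (idiag(j) < 0 .and. col >= 1) then
          r(i) = r(i) - val(i, j) * x_new(col)   ! x_new(col) already set, col < i
        end if
      end do
      x_new(i) = r(i) / d
    end do
  end subroutine dia_gs_step
end module dia_gauss_seidel_sketch
```

On the 4-by-4 system A = tridiag(-1, 4, -1), B = (3, 2, 2, 3), the first step from x = 0 gives (0.75, 0.6875, 0.671875, 0.91796875), and repeated steps converge to x = (1, 1, 1, 1).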
chubb87
Beginner
Thanks for the advice, but I think I have found a better solution: I have converted all my matrices to triangular form with the help of permutations.

So I am using mkl_sdiatrsv with lower triangular matrices.

It works very well, but I get the message "stack overflow" when using vectors x and y with about 4000 elements.

Is this a normal limitation? What does it depend on, and is it possible to increase it (hardware?)?
Gennady_F_Intel
Moderator
There is no such restriction.
I think this is a software bug. How do you allocate your working arrays?
Can you give the example for reproducing the problem?
--Gennady
chubb87
Beginner
Okay, I am still debugging, but I will post part of my code so that you can envisage what I am doing and where the problem might be.

I don't show all declarations (e.g. for val); I don't think they are important right now.

m loops from 1 to 2 (or higher for other parameters).
SWF=1, NWF=2, SEF=3, ... up to 8 are used for indexing.

! Not shown: declarations of val, idiag, QVec, Ip_tr, the Iw*_ext arrays, the index constants, and the value of lval
CHARACTER*1 :: transa = 'n', uplo = 'l', diag = 'n'
CHARACTER :: matdescra(4) = (/'T','L','U','F'/)   ! one character per element
integer :: ndiag, lval
data ndiag /4/
idiag = (/-Imu*Jmu, -Jmu, -1, 0/)
do m = 1, ind_Dw
  ! SWF: for South and West boundary conditions, i.e. starting in the north east
  QVec(:,SWF,m) = beta_m(m)*V*Sp_m(:,m)*d_Ohm(m) + Dw(m,x)*Axi/gx*IwW_ext(:,SWF) &
                + Dw(m,y)*Ayi/gy*IwS_ext(:,SWF) + Dw(m,z)*Azi/gz*IwF_ext(:,SWF)
  QVec(:,NWF,m) = beta_m(m)*V*Sp_m(:,m)*d_Ohm(m) + Dw(m,x)*Axi/gx*IwW_ext(:,NWF) &
                + Dw(m,y)*Ayi/gy*IwN_ext(:,NWF) + Dw(m,z)*Azi/gz*IwF_ext(:,NWF)
  QVec(:,SEF,m) = beta_m(m)*V*Sp_m(:,m)*d_Ohm(m) + Dw(m,x)*Axi/gx*IwE_ext(:,SEF) &
                + Dw(m,y)*Ayi/gy*IwS_ext(:,SEF) + Dw(m,z)*Azi/gz*IwF_ext(:,SEF)
  QVec(:,NEF,m) = beta_m(m)*V*Sp_m(:,m)*d_Ohm(m) + Dw(m,x)*Axi/gx*IwE_ext(:,NEF) &
                + Dw(m,y)*Ayi/gy*IwN_ext(:,NEF) + Dw(m,z)*Azi/gz*IwF_ext(:,NEF)
  QVec(:,SWB,m) = beta_m(m)*V*Sp_m(:,m)*d_Ohm(m) + Dw(m,x)*Axi/gx*IwW_ext(:,SWB) &
                + Dw(m,y)*Ayi/gy*IwS_ext(:,SWB) + Dw(m,z)*Azi/gz*IwB_ext(:,SWB)
  QVec(:,NWB,m) = beta_m(m)*V*Sp_m(:,m)*d_Ohm(m) + Dw(m,x)*Axi/gx*IwE_ext(:,NWB) &
                + Dw(m,y)*Ayi/gy*IwS_ext(:,NWB) + Dw(m,z)*Azi/gz*IwB_ext(:,NWB)
  QVec(:,SEB,m) = beta_m(m)*V*Sp_m(:,m)*d_Ohm(m) + Dw(m,x)*Axi/gx*IwW_ext(:,SEB) &
                + Dw(m,y)*Ayi/gy*IwN_ext(:,SEB) + Dw(m,z)*Azi/gz*IwB_ext(:,SEB)
  QVec(:,NEB,m) = beta_m(m)*V*Sp_m(:,m)*d_Ohm(m) + Dw(m,x)*Axi/gx*IwE_ext(:,NEB) &
                + Dw(m,y)*Ayi/gy*IwN_ext(:,NEB) + Dw(m,z)*Azi/gz*IwB_ext(:,NEB)
  ! one lower triangular solve per right-hand side: Ip_tr(:,k,m) = inv(L) * QVec(:,k,m)
  call mkl_sdiatrsv(uplo, transa, diag, lval, val(:,:,m), lval, idiag, ndiag, QVec(:,SWF,m), Ip_tr(:,SWF,m))
  call mkl_sdiatrsv(uplo, transa, diag, lval, val(:,:,m), lval, idiag, ndiag, QVec(:,NWF,m), Ip_tr(:,NWF,m))
  call mkl_sdiatrsv(uplo, transa, diag, lval, val(:,:,m), lval, idiag, ndiag, QVec(:,SEF,m), Ip_tr(:,SEF,m))
  call mkl_sdiatrsv(uplo, transa, diag, lval, val(:,:,m), lval, idiag, ndiag, QVec(:,NEF,m), Ip_tr(:,NEF,m))
  call mkl_sdiatrsv(uplo, transa, diag, lval, val(:,:,m), lval, idiag, ndiag, QVec(:,SWB,m), Ip_tr(:,SWB,m))
  call mkl_sdiatrsv(uplo, transa, diag, lval, val(:,:,m), lval, idiag, ndiag, QVec(:,NWB,m), Ip_tr(:,NWB,m))
  call mkl_sdiatrsv(uplo, transa, diag, lval, val(:,:,m), lval, idiag, ndiag, QVec(:,SEB,m), Ip_tr(:,SEB,m))
  call mkl_sdiatrsv(uplo, transa, diag, lval, val(:,:,m), lval, idiag, ndiag, QVec(:,NEB,m), Ip_tr(:,NEB,m))
end do
mecej4
Honored Contributor III
The first thing to do is to locate the stack overflow, by compiling with -g -traceback and running. Once the location of stack overflow has been found, you can think about whether to allocate the variables differently or to raise the stack limits.
chubb87
Beginner
I have found out in which line it occurs:

do m = 1,ind_Dw
call compute_all(Iw(:,:,(/W,N,F/)),Sp_m(:,:,:,m),beta_m(m),0,0,0,Dw(m,:),d_Ohm(m),Ip(:,:,:,1,m),IWE(:,:,1,m,:));
end do


Sorry for my lack of knowledge:

How can a different allocation be beneficial?
And how can I raise the stack limits?
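One way a different allocation can help, sketched under assumptions: an actual argument with a vector subscript, such as Iw(:,:,(/W,N,F/)), makes the compiler build a temporary copy, and ifort places such temporaries on the stack by default. Copying the selected planes into an ALLOCATABLE array first puts that copy on the heap. In the hypothetical module below, total_of stands in for compute_all, whose interface is not shown in the thread. The alternatives are raising the stack size (for example with the /STACK linker option on Windows) or using the ifort option /heap-arrays.

```fortran
module heap_copy_sketch
  implicit none
contains
  ! Hypothetical stand-in for compute_all: it only reads its array argument.
  real function total_of(a)
    real, intent(in) :: a(:,:,:)
    total_of = sum(a)
  end function total_of

  ! Gathers the selected planes of iw into an ALLOCATABLE (heap) array and
  ! passes that, instead of passing iw(:,:,planes) directly, which would make
  ! the compiler build a temporary copy (on the stack by default with ifort).
  real function total_of_planes(iw, planes)
    real, intent(in) :: iw(:,:,:)
    integer, intent(in) :: planes(:)
    real, allocatable :: work(:,:,:)

    allocate(work(size(iw, 1), size(iw, 2), size(planes)))
    work = iw(:, :, planes)
    total_of_planes = total_of(work)
    deallocate(work)
  end function total_of_planes
end module heap_copy_sketch
```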
Victor_Gladkikh
New Contributor I
I would recommend using mkl_sdiasm instead of several mkl_sdiatrsv calls. Generally speaking, one mkl_sdiasm call can give better performance than a set of mkl_sdiatrsv runs.

It would also be helpful if you could provide us with the declarations of the arrays val, QVec and Ip_tr.
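For comparison with eight separate mkl_sdiatrsv calls: a routine for several right-hand sides can make one forward-substitution pass over the rows and update all columns together. The plain-Fortran sketch below (hypothetical names, not MKL's implementation) computes C = alpha*inv(L)*B, where L is the lower triangle of a matrix in diagonal storage with val(i,j) = A(i, i+idiag(j)) and a nonzero main diagonal; this is the kind of operation mkl_sdiasm is used for here with matdescra 'TLNF'. Note that alpha is a REAL argument.

```fortran
module dia_lower_solve_multi_sketch
  implicit none
contains
  ! Hypothetical helper: C = alpha * inv(L) * B for n right-hand sides, where L
  ! is the lower triangle (main diagonal included) of an m-by-m matrix in
  ! diagonal storage, val(i,j) = A(i, i+idiag(j)). Upper diagonals are ignored.
  ! Assumes a nonzero main diagonal.
  subroutine dia_lower_solve_multi(m, n, alpha, val, idiag, ndiag, b, c)
    integer, intent(in) :: m, n, ndiag
    integer, intent(in) :: idiag(ndiag)
    real, intent(in) :: alpha, val(m, ndiag), b(m, n)
    real, intent(out) :: c(m, n)
    integer :: i, j, col
    real :: d

    do i = 1, m                               ! one pass over the rows ...
      c(i, :) = alpha * b(i, :)               ! ... handling all n columns together
      d = 0.0
      do j = 1, ndiag
        col = i + idiag(j)
        if (idiag(j) == 0) then
          d = val(i, j)
        else if (idiag(j) < 0 .and. col >= 1) then
          c(i, :) = c(i, :) - val(i, j) * c(col, :)   ! rows col < i are finished
        end if
      end do
      c(i, :) = c(i, :) / d
    end do
  end subroutine dia_lower_solve_multi
end module dia_lower_solve_multi_sketch
```

With the matrix from the first post, alpha = 2.0, B(:,1) = (1, 3, 3, 4, -6) and B(:,2) = 0, it returns C(:,1) = (2, 2, 1, 2, 2) and C(:,2) = 0.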


chubb87
Beginner

I tried using mkl_sdiasm, but somehow the solution matrix is zero, and no error is reported.
I wonder what I am doing wrong:

[bash]allocate(val(lval,ndiag,ind_Dw))
allocate(QVec(lval,8,ind_Dw))
allocate(Ip_tr(lval,8,ind_Dw))
val(1:Imu*Jmu,1,m) = 0; val(Imu*Jmu+1:Imu*Jmu*Kmu,1,m) = A_zi(:,m);
val(1:Jmu,2,m) = 0; val(Jmu+1:Imu*Jmu*Kmu,2,m) = A_xi(:,m);
val(1,3,m) = 0; val(2:Imu*Jmu*Kmu,3,m) = A_yi(:,m);        
val(:,4,m) = Ap(:,m);

call mkl_sdiasm(transa,lval,8,1,'TLNF',val(:,:,m),lval,idiag,ndiag,QVec(:,:,m),lval,Ip_tr(:,:,m),lval)
[/bash]

TimP
Honored Contributor III
Quoting chubb87
call mkl_sdiasm(transa,lval,8,1,'TLNF',val(:,:,m),lval,idiag,ndiag,QVec(:,:,m),lval,Ip_tr(:,:,m),lval)

Presumably, using an array section as an argument creates a temporary. If you are using ifort, you could set /check:arg_temp_created to find out whether this is so. Then, even if the subroutine writes back to the temporary which was initialized to Ip_tr(:,:,m), your program won't have access to those results. Ip_tr(1,1,m) might be what you meant.
chubb87
Beginner
No, it does not work with Ip_tr(1,1,m); I have to pass a whole matrix and not only a single value. Ip_tr(:,:) is my solution matrix.
With mkl_sdiatrsv it worked using (......,Ip_tr(:,1,1)).
Gennady_F_Intel
Moderator
Martin,
I don't see anything criminal in this code. Can you give us an example of the code?
You can use a private thread to send it.
--Gennady
chubb87
Beginner
I have found the problem:

When I change alpha from 1 to 1.0, it works, because then it is a REAL instead of an INTEGER.

Note that I do not use a variable called alpha, but pass the literal 1.0 directly:

[bash]call mkl_sdiasm(transa,lval,8,1.0,matdescra,val(:,:,m),lval,idiag,ndiag,QVec(:,:,m),lval,Ip_tr(:,:,m),lval)
[/bash]

I have already avoided the stack overflow with the optimization option "Heap Arrays = 0" (/heap-arrays:0).

Now I will try to improve the efficiency (profiling).
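A plausible explanation for the all-zero result, under assumptions: if no explicit interface for mkl_sdiasm is visible to the compiler, the literal 1 is passed by reference as a 32-bit INTEGER, and the routine reads those bits as a single-precision REAL alpha. That bit pattern is the smallest positive subnormal number, about 1.4E-45, so a result scaled by alpha is effectively zero. The hypothetical helper below reproduces the reinterpretation with TRANSFER (assuming 32-bit default INTEGER and IEEE single-precision REAL).

```fortran
module alpha_bits_sketch
  implicit none
contains
  ! Hypothetical helper: returns the REAL value whose bits equal those of the
  ! default INTEGER 1, i.e. what a routine expecting a REAL alpha would read
  ! if handed an integer 1 by reference. Assumes 32-bit INTEGER and REAL.
  real function integer_one_seen_as_real()
    integer :: one
    one = 1
    integer_one_seen_as_real = transfer(one, 1.0)
  end function integer_one_seen_as_real
end module alpha_bits_sketch
```

Printing the returned value typically shows about 1.4E-45 (or 0.0 if subnormals are flushed to zero), whereas passing the REAL literal 1.0 gives alpha = 1.0 as intended.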

