
- Intel Community
- Software
- Software Development SDKs and Libraries
- Intel® oneAPI Math Kernel Library
- Problems with Matrix Inversion dgetri


C_J_X_

Beginner


03-14-2013
06:08 PM


Problems with Matrix Inversion dgetri

I use two functions, dgetrf and dgetri, to invert large matrices: the first performs the LU decomposition and the second computes the inverse from that decomposition. When comparing the results with MATLAB, I found some problems:

1. For the same big matrix (I tried 10000 by 10000), the speedup of MKL is not very large. For example, MATLAB took about 236 seconds while MKL took about 220 seconds to finish. I want to know whether this speedup is OK, or whether MKL can get much better performance.

2. I found that two parameters of dgetri, work and lwork, are very important for performance. At first I set lwork = 8*N, where N is the order of the matrix, and MKL was slower than MATLAB; then I changed lwork to N*N, and MKL achieved the speedup mentioned above. Is there anything else I need to do to get better performance?

3. In my code I also want to multiply a dense matrix by a sparse matrix. Is there any specific function in MKL I can use for this, or can I just use cblas_dgemm to do the matrix multiplication?
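
For reference, the dgetrf/dgetri flow from question 2 can be sketched with SciPy's wrappers around the same LAPACK routines (SciPy here is just a stand-in for my C code; the dgetri_lwork workspace query replaces my guessed lwork values):

```python
import numpy as np
from scipy.linalg import lapack  # wraps the same LAPACK routines MKL provides

n = 200
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n))

# Step 1: LU factorization (dgetrf).
lu, piv, info = lapack.dgetrf(a)
assert info == 0

# Step 2: workspace query -- ask the library for its preferred lwork
# instead of guessing a multiple of n.
lwork, info = lapack.dgetri_lwork(n)
assert info == 0
lwork = int(lwork)

# Step 3: invert from the LU factors (dgetri) with the queried workspace.
inv, info = lapack.dgetri(lu, piv, lwork=lwork)
assert info == 0
```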

Thanks for the help.

C.J.


5 Replies

Zhang_Z_Intel

Employee


03-14-2013
09:13 PM


Observed performance depends on many factors. Do you use parallel MKL or sequential MKL? Which version of MKL are you using? What are your OS and CPU? How many threads did you use? Did you align your data, and how? Did you call the Fortran interface or the CBLAS interface? Please provide these details and we can help.

MATLAB on Intel architectures actually uses MKL internally for many linear algebra functions. This may explain why you didn't see a higher speedup.

As to multiplying a dense matrix with a sparse matrix, you can try mkl_dcsrmm, mkl_dcscmm, or mkl_dcoomm, depending on the storage format of your sparse matrix (CSR, CSC, or COO). Search the MKL reference manual (http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mklman/index.htm) using these function names as keywords to get detailed information.
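
As a sketch of the operation these routines perform (SciPy's sparse module stands in here for MKL's sparse BLAS; mkl_dcsrmm computes C := alpha*op(A)*B + beta*C with A stored in CSR format):

```python
import numpy as np
from scipy import sparse  # stand-in for MKL's sparse BLAS

rng = np.random.default_rng(1)
# A: sparse matrix in CSR format, B: dense matrix.
a = sparse.random(500, 300, density=0.01, format="csr", random_state=1)
b = rng.standard_normal((300, 4))

# The core of mkl_dcsrmm, without the alpha/beta scaling: sparse A times dense B.
c = a @ b
```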

C_J_X_

Beginner


03-14-2013
10:32 PM


Thanks for your reply; actually I am an MKL beginner. I am using sequential MKL, the 11.0 free trial version, with Visual Studio on Windows. I am not sure about the CPU information right now. I use one thread and the CBLAS interface. That is the information about my case; do you have any suggestions based on it?


Zhang_Z_Intel

Employee


03-15-2013
10:36 AM


I'd suggest you link with parallel MKL, which uses multiple threads by default. This gives you the full performance potential of MKL.
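
For example, a link line for threaded MKL 11.x on Windows with the Microsoft compiler (lp64 model) might look like the sketch below; the exact library names are an assumption on my part, so please verify them against the MKL Link Line Advisor for your version:

```shell
:: Link against the threaded (parallel) MKL layer -- library names are a
:: version-dependent assumption; check the MKL Link Line Advisor.
cl /MD myprog.c mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib

:: Optionally cap the number of threads at run time:
set MKL_NUM_THREADS=4
```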

For the CPU information, the easiest way is to look at Control Panel -> System and Security -> System and check the "Processor" entry.

Hope this helps.

C_J_X_

Beginner


03-28-2013
02:54 PM


Hi,

When I try to do the "dense matrix times sparse matrix" operation, I find that the three functions you mentioned are applicable to "sparse matrix times dense matrix". Is there any function in MKL that can do the "dense matrix times sparse matrix" operation?

I also have another question, about computational complexity. What is the computational complexity of matrix inversion and of matrix multiplication (cblas_dgemm) in MKL? When I use cblas_dgemm to do the matrix multiplication, it turns out that MKL is slower than MATLAB for the same operation.

Thanks.


Zhang_Z_Intel

Employee


03-29-2013
11:37 AM


Please consider that AB = (B^T A^T)^T. So you can use the same routines: transpose your input matrices, and then transpose the result matrix.
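
With SciPy's sparse types standing in for MKL's sparse BLAS, the identity looks like this (here D is the dense matrix and S the sparse one):

```python
import numpy as np
from scipy import sparse  # stand-in for MKL's sparse BLAS

rng = np.random.default_rng(2)
d = rng.standard_normal((4, 300))                              # dense D
s = sparse.random(300, 200, density=0.01, format="csr",
                  random_state=3)                              # sparse S

# D*S computed with a "sparse times dense" kernel via D*S = (S^T * D^T)^T:
c = (s.T @ d.T).T
```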

Matrix multiplication and matrix inversion in MKL both have an asymptotic cost of O(n^3).

Many factors affect MKL DGEMM performance, for example problem size and data alignment. Please read the performance tips chapter in the MKL User's Guide. Calling the Fortran interface instead of the CBLAS interface may also give you better performance.
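
As an illustration of the GEMM interface (using SciPy's BLAS wrapper as a stand-in for cblas_dgemm; note that column-major arrays match the Fortran interface mentioned above and avoid internal copies):

```python
import numpy as np
from scipy.linalg import blas  # wraps the same BLAS routines MKL provides

rng = np.random.default_rng(4)
# Fortran (column-major) order matches the Fortran DGEMM interface.
a = np.asfortranarray(rng.standard_normal((64, 32)))
b = np.asfortranarray(rng.standard_normal((32, 16)))

# dgemm computes alpha * op(a) * op(b); here op is the identity and alpha = 1.
c = blas.dgemm(alpha=1.0, a=a, b=b)
```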

