Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Why SYTRF/SYTRI is much slower than GETRF/GETRI to compute dense matrix inverse

Dan_Ghiocel
Beginner
274 Views

Dear MKL experts,

           My project needs to compute the inverse of a complex symmetric dense matrix. I tried three different pairs of subroutines, GETRF/GETRI, SYTRF/SYTRI, and SYTRF_ROOK/SYTRI_ROOK, in order to pick the best one. In theory, the subroutines for symmetric matrices should be faster than those for general matrices, and the operation counts in the Intel programmer reference manual agree:

GETRF(...)
For real flavors, if m = n, the approximate number of floating-point operations is (2/3)n^3.
The number of operations for complex flavors is four times greater, (8/3)n^3.
GETRI(...)
The total number of floating-point operations is approximately (4/3)n^3 for real flavors and (16/3)n^3 for complex flavors.

 

SYTRF(...)
The total number of floating-point operations is approximately (1/3)n^3 for real flavors or (4/3)n^3 for complex flavors.
SYTRI(...)
The total number of floating-point operations is approximately (2/3)n^3 for real flavors and (8/3)n^3 for complex flavors.

SYTRF_ROOK(...)
  [No floating-point operation count is given]
SYTRI_ROOK(...)
The total number of floating-point operations is approximately (2/3)n^3 for real flavors and (8/3)n^3 for complex flavors.
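
To make the expectation concrete, here is a small sketch that adds up the complex-flavor coefficients quoted above for factorization plus inversion. The helper name inverse_flops and the choice n = 10000 are only for illustration, and the counts are leading-order estimates, not measurements.

```python
from fractions import Fraction

# Leading n^3 coefficients for complex flavors, as quoted from the manual above.
COMPLEX_FLOP_COEFF = {
    "GETRF": Fraction(8, 3),
    "GETRI": Fraction(16, 3),
    "SYTRF": Fraction(4, 3),
    "SYTRI": Fraction(8, 3),
}


def inverse_flops(factor, invert, n):
    """Approximate flop count for factorizing and then inverting an n x n matrix."""
    return (COMPLEX_FLOP_COEFF[factor] + COMPLEX_FLOP_COEFF[invert]) * n**3


n = 10000  # illustrative size, matching the largest test case
general = inverse_flops("GETRF", "GETRI", n)
symmetric = inverse_flops("SYTRF", "SYTRI", n)
expected_speedup = general / symmetric

print(f"GETRF+GETRI: {float(general):.3e} flops")
print(f"SYTRF+SYTRI: {float(symmetric):.3e} flops")
print(f"flop-count ratio (general / symmetric): {float(expected_speedup):.1f}")
```

On flop counts alone, the symmetric pair does half the work of the general pair, so I expected it to be the faster path.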

 

   In practice, however, GETRF/GETRI is much faster for large dense matrices. My test results are summarized below.
   Matrix Size     GETRF          SYTRF          SYTRF_ROOK
   1000x1000       0.015          0.016           0.047
   2000x2000       0.142          0.141           0.281
   5000x5000       1.486          1.406           2.595
 10000x10000       9.907          9.283          16.282   
    
   Matrix Size     GETRI          SYTRI          SYTRI_ROOK
   1000x1000       0.064          0.563           0.595
   2000x2000       0.312          4.908           4.938
   5000x5000       4.437         74.600          74.490
 10000x10000      26.972        625.908         615.346

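   From these results, GETRF/GETRI is roughly 15 to 20 times faster than the symmetric routines on the larger matrices. The factorization times of GETRF and SYTRF are nearly identical, so the gap comes from the inversion step (GETRI versus SYTRI and SYTRI_ROOK).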
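
For reference, the slowdown can be computed directly from the tables above. This is only a sketch over the reported timings (in whatever unit the tables use); the variable names are mine.

```python
# Timings copied from the tables above (units as reported in the post).
SIZES = [1000, 2000, 5000, 10000]
GETRF = [0.015, 0.142, 1.486, 9.907]
SYTRF = [0.016, 0.141, 1.406, 9.283]
GETRI = [0.064, 0.312, 4.437, 26.972]
SYTRI = [0.563, 4.908, 74.600, 625.908]


def ratios(slow, fast):
    """Element-wise slowdown factor of `slow` relative to `fast`."""
    return [s / f for s, f in zip(slow, fast)]


# Inversion step alone, and factorization + inversion combined.
invert_ratio = ratios(SYTRI, GETRI)
total_sy = [f + i for f, i in zip(SYTRF, SYTRI)]
total_ge = [f + i for f, i in zip(GETRF, GETRI)]
total_ratio = ratios(total_sy, total_ge)

for n, ri, rt in zip(SIZES, invert_ratio, total_ratio):
    print(f"n={n:>5}: SYTRI/GETRI = {ri:5.1f}x, "
          f"(SYTRF+SYTRI)/(GETRF+GETRI) = {rt:5.1f}x")
```

The inversion-only ratio grows from roughly 9x at n = 1000 to roughly 23x at n = 10000, while the factorization times stay within a few percent of each other.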

   Why? Please give me some advice. My test code is attached below.

    Thanks.

1 Reply
Dan_Ghiocel

Additional information:

       OS:  Windows 8.0 Home

       Compiler: Intel Parallel Studio 2018.3 + MKL 2018.3   with VS 2015 community edition

 
