Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

GETRF without pivoting is not faster than the partial pivoting version.

n_aron
Beginner

I am writing code that uses "dgetrf" and "mkl_dgetrfnpi". With different numbers of threads, I see that the performance of the no-pivoting version is the same as that of the partial pivoting version.

 

mkl_dgetrfnpi(&m, &n, &m, A, &lda, &info);

 

dgetrf(&m, &n, A, &lda, ipiv, &info);

 

[Attached image: image.png]

You told me that the input matrix matters, but I am not seeing this. Could you please give an example of a matrix type for which the no-pivoting version is faster? I think pivoting checks all elements of the column, so the time to find the maximum is the same for any kind of matrix.

 

Best regards,

Aran

VidyalathaB_Intel
Moderator

Hi,


Thanks for reaching out to us.


We are working on your issue. We will get back to you soon.


Meanwhile, as mentioned in the previous post (https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-GETRF-Partial-Pivoting-and-GETRF...), the links you provided for the source code throw errors when we try to open them.


>>I am writing a code that is using "dgetrf" and "mkl_dgetrfnpi"


So could you please provide a sample reproducer that uses the above functions (along with your OS and MKL version) so that we can work on it from our end?


Regards,

Vidya.


n_aron
Beginner
Gennady_F_Intel
Moderator

LAPACKE_mkl_dgetrfnpi makes sense to call when the input matrix is diagonally dominant.

I ran some experiments to compare the execution times of the LAPACKE_dgetrf and LAPACKE_mkl_dgetrfnpi routines. The input matrices were artificially generated diagonally dominant matrices.
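
For readers wondering why diagonal dominance is the relevant property, here is a minimal sketch in plain C. It does not call MKL: `fill_col_diag_dominant` and `naive_lu_partial_pivot` are hypothetical helper names made up for this illustration, and the naive loop is not how dgetrf is implemented. It assumes strict diagonal dominance by columns, in which case partial pivoting should pick the diagonal entry at every step, so no rows are interchanged and skipping pivoting loses nothing numerically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Absolute value without depending on libm. */
static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Column-major index of element (i, j) in an n x n matrix. */
static size_t idx(int i, int j, int n)
{
    return (size_t)i + (size_t)j * (size_t)n;
}

/* Hypothetical helper: fill an n x n matrix so that each diagonal entry
   strictly exceeds the sum of |off-diagonal| entries in its column. */
static void fill_col_diag_dominant(int n, double *a, unsigned seed)
{
    assert(n > 0 && a != NULL);
    srand(seed);
    for (int j = 0; j < n; ++j) {
        double off_sum = 0.0;
        for (int i = 0; i < n; ++i) {
            double v = (double)rand() / RAND_MAX - 0.5;
            a[idx(i, j, n)] = v;
            if (i != j) off_sum += abs_d(v);
        }
        a[idx(j, j, n)] = off_sum + 1.0;  /* margin keeps dominance strict */
    }
}

/* Hypothetical helper: textbook in-place LU with partial pivoting.
   Returns the number of row interchanges, or -1 on an exactly zero pivot. */
static int naive_lu_partial_pivot(int n, double *a)
{
    int swaps = 0;
    for (int k = 0; k < n; ++k) {
        /* Pivot search: largest |entry| in column k, rows k..n-1. */
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (abs_d(a[idx(i, k, n)]) > abs_d(a[idx(p, k, n)])) p = i;
        if (a[idx(p, k, n)] == 0.0) return -1;

        /* Row interchange, only when the pivot is off the diagonal. */
        if (p != k) {
            ++swaps;
            for (int j = 0; j < n; ++j) {
                double t = a[idx(k, j, n)];
                a[idx(k, j, n)] = a[idx(p, j, n)];
                a[idx(p, j, n)] = t;
            }
        }

        /* Compute multipliers and update the trailing submatrix. */
        for (int i = k + 1; i < n; ++i) {
            a[idx(i, k, n)] /= a[idx(k, k, n)];
            for (int j = k + 1; j < n; ++j)
                a[idx(i, j, n)] -= a[idx(i, k, n)] * a[idx(k, j, n)];
        }
    }
    return swaps;
}
```

Filling a matrix with `fill_col_diag_dominant` and passing it to `naive_lu_partial_pivot` should report zero interchanges, while a general matrix such as the 2 x 2 permutation matrix needs one. Note that the pivot search still runs in both cases; what disappears on a diagonally dominant input is the need for row swaps.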


MKL_VERBOSE oneMKL 2021.0 Update 3 Product build 20210617 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.20GHz lp64 intel_thread

./a.out <input_matrix_size>


$ ./a.out 200

 ...GETRF ,    Texec  == 0.000489, sec

 ...GETRFNPI , Texec == 0.000284, sec

$ ./a.out 400

 ...GETRF ,    Texec  == 0.001419, sec

 ...GETRFNPI , Texec == 0.000842, sec

$ ./a.out 800

 ...GETRF ,    Texec  == 0.004546, sec

 ...GETRFNPI , Texec == 0.003088, sec

$ ./a.out 1600

 ...GETRF ,    Texec  == 0.019317, sec

 ...GETRFNPI , Texec == 0.014147, sec

$ ./a.out 3200

 ...GETRF ,    Texec  == 0.067929, sec

 ...GETRFNPI , Texec == 0.055947, sec

$ ./a.out 6400

 ...GETRF ,    Texec  == 0.361859, sec

 ...GETRFNPI , Texec == 0.325563, sec

$ ./a.out 12800

 ...GETRF ,    Texec  == 2.394896, sec

 ...GETRFNPI , Texec == 2.148023, sec
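
A rough operation count may help interpret these timings. It assumes the textbook algorithm rather than MKL's blocked implementation. At step k, partial pivoting scans the remaining n-k+1 entries of column k, so over the whole factorization:

```latex
\text{comparisons} = \sum_{k=1}^{n-1} (n-k) = \frac{n(n-1)}{2},
\qquad
\text{flops} \approx \frac{2}{3}\, n^{3},
\qquad
\frac{\text{comparisons}}{\text{flops}} \approx \frac{3}{4n}.
```

The pivot search is therefore a lower-order term, which is consistent with the measured GETRF/GETRFNPI time ratio shrinking from about 1.7 at n = 200 to about 1.1 at n = 12800. The remaining gap presumably comes from row interchanges and panel-level overheads rather than from the search itself, though that is not measured here.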

 

 

CPU/OS specifics:


 CPU:      2x Xeon Gold 5120 2.2GHz 14c (NP=56) (Skylake-SP)

 MEMORY:   192GB 2400MHz DDR4 Dual-rank

 OS:       CentOS Linux release 7.7.1908 (Core)

 KERNEL:   Linux 3.10.0-1062.4.1.el7.x86_64 x86_64



n_aron
Beginner

Hi,

Could you please share your code? I am seeing somewhat different results for small dimensions on a Xeon(R) Gold 6126.

 

Regards,
A.N

Gennady_F_Intel
Moderator

This query has been resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only. 


