GETRF Without Pivoting is not faster than the Partial Pivoting version.

n_aron · ‎09-21-2021

I am writing code that uses "dgetrf" and "mkl_dgetrfnpi". With different numbers of threads, I see that the performance of the no-pivoting version is the same as that of the partial pivoting version.

mkl_dgetrfnpi(&m, &n, &m, A, &lda, &info);

dgetrf(&m, &n, A, &lda, ipiv, &info);

You told me that the input matrix is important, but I am not seeing this. Could you please give an example of a matrix type for which no pivoting is faster? I think pivoting checks all elements of a column, so the time to find the maximum is the same for any kind of matrix.

Best regards,

Aran

VidyalathaB_Intel · ‎09-21-2021

Hi,

Thanks for reaching out to us.

We are working on your issue and will get back to you soon.

Meanwhile, as mentioned in the previous post (https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-GETRF-Partial-Pivoting-and-GETRF-without-Pivoting-have-a/m-p/1308138#M31901), the links you provided for the source code throw errors when we try to open them.

>>I am writing a code that is using "dgetrf" and "mkl_dgetrfnpi"

So could you please provide us with a sample reproducer that uses the above functions (along with your OS and MKL version) so that we can work on it from our end?

Regards,

Vidya.

n_aron · ‎09-22-2021

Hi,

Thanks. Here is a link to my code:

https://drive.google.com/drive/folders/10QtwKBlAPY18QPno6svGgWlYUKhxzmJA?usp=sharing

Best regards,
Aran

Gennady_F_Intel · ‎09-27-2021

Calling LAPACKE_mkl_dgetrfnpi makes sense when the input matrix is diagonally dominant.
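
One way to see why diagonal dominance matters is to count the row interchanges that partial pivoting actually performs. The sketch below is a textbook, unblocked LU in plain C, not MKL's blocked implementation, and the name `lu_count_row_swaps` is hypothetical. The pivot search scans every column regardless of the data, but the interchange only runs when the diagonal entry is not the largest in its column. For a strictly column diagonally dominant matrix no interchange is needed at any step, so the factorization without pivoting produces the same factors.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major index of element (i, j) in a matrix with leading dimension lda. */
#define IDX(i, j, lda) ((size_t)(j) * (size_t)(lda) + (size_t)(i))

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Textbook LU with partial pivoting on an n-by-n column-major matrix,
 * overwriting a with its L and U factors.
 * Hypothetical helper for illustration only; it does not reproduce dgetrf.
 * Returns the number of row interchanges performed, or -1 if an exactly
 * zero pivot is encountered. */
int lu_count_row_swaps(double *a, int n, int lda)
{
    assert(a != NULL && n > 0 && lda >= n);
    int swaps = 0;
    for (int k = 0; k < n; ++k) {
        /* Pivot search: always scans rows k..n-1 of column k. */
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (abs_d(a[IDX(i, k, lda)]) > abs_d(a[IDX(p, k, lda)]))
                p = i;
        if (a[IDX(p, k, lda)] == 0.0)
            return -1;
        /* Row interchange: only needed when the diagonal is not the largest. */
        if (p != k) {
            for (int j = 0; j < n; ++j) {
                double t = a[IDX(k, j, lda)];
                a[IDX(k, j, lda)] = a[IDX(p, j, lda)];
                a[IDX(p, j, lda)] = t;
            }
            ++swaps;
        }
        /* Scale the multipliers, then update the trailing submatrix. */
        for (int i = k + 1; i < n; ++i)
            a[IDX(i, k, lda)] /= a[IDX(k, k, lda)];
        for (int j = k + 1; j < n; ++j)
            for (int i = k + 1; i < n; ++i)
                a[IDX(i, j, lda)] -= a[IDX(i, k, lda)] * a[IDX(k, j, lda)];
    }
    return swaps;
}
```

For the 2-by-2 column-major array {1, 3, 2, 4} the function reports one interchange, because 3 is larger than 1 in the first column. For a strictly column diagonally dominant input it reports zero.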

I ran some experiments to compare the execution time of the LAPACKE_dgetrf and LAPACKE_mkl_dgetrfnpi routines. The input matrices were artificially generated diagonally dominant matrices.
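
The generator used for these runs is not shown in the thread. As a minimal sketch of one possible approach, the plain C below fills a column-major matrix with pseudo-random values and then raises each diagonal entry above the sum of the absolute off-diagonal values in its column. The helper names, the small linear congruential generator, and the choice of column (rather than row) dominance are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static double abs_val(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical helper: fill an n-by-n column-major matrix with values in
 * [-1, 1) and then overwrite each diagonal entry so that it strictly
 * dominates its column. */
void fill_column_dominant(double *a, int n, int lda, uint32_t seed)
{
    assert(a != NULL && n > 0 && lda >= n);
    uint32_t state = seed;
    for (int j = 0; j < n; ++j) {
        double *col = a + (size_t)j * (size_t)lda;
        double off_diag_sum = 0.0;
        for (int i = 0; i < n; ++i) {
            /* One step of a 32-bit linear congruential generator. */
            state = state * 1664525u + 1013904223u;
            /* Top 24 bits mapped to [0, 2), then shifted to [-1, 1). */
            col[i] = (double)(state >> 8) / 8388608.0 - 1.0;
            if (i != j)
                off_diag_sum += abs_val(col[i]);
        }
        /* Diagonal strictly larger than the rest of the column combined. */
        col[j] = off_diag_sum + 1.0;
    }
}

/* Returns 1 if every diagonal entry is strictly larger in magnitude than
 * the sum of the other magnitudes in its column, 0 otherwise. */
int is_strictly_column_dominant(const double *a, int n, int lda)
{
    assert(a != NULL && n > 0 && lda >= n);
    for (int j = 0; j < n; ++j) {
        const double *col = a + (size_t)j * (size_t)lda;
        double off_diag_sum = 0.0;
        for (int i = 0; i < n; ++i)
            if (i != j)
                off_diag_sum += abs_val(col[i]);
        if (abs_val(col[j]) <= off_diag_sum)
            return 0;
    }
    return 1;
}
```

A buffer filled this way can be copied twice and passed to the dgetrf and mkl_dgetrfnpi calls quoted earlier in the thread, so that both routines factor the same input.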

MKL_VERBOSE oneMKL 2021.0 Update 3 Product build 20210617 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.20GHz lp64 intel_thread

./a.out <input_matrix_size>

$ ./a.out 200

...GETRF , Texec == 0.000489, sec

...GETRFNPI , Texec == 0.000284, sec

$ ./a.out 400

...GETRF , Texec == 0.001419, sec

...GETRFNPI , Texec == 0.000842, sec

$ ./a.out 800

...GETRF , Texec == 0.004546, sec

...GETRFNPI , Texec == 0.003088, sec

$ ./a.out 1600

...GETRF , Texec == 0.019317, sec

...GETRFNPI , Texec == 0.014147, sec

$ ./a.out 3200

...GETRF , Texec == 0.067929, sec

...GETRFNPI , Texec == 0.055947, sec

$ ./a.out 6400

...GETRF , Texec == 0.361859, sec

...GETRFNPI , Texec == 0.325563, sec

$ ./a.out 12800

...GETRF , Texec == 2.394896, sec

...GETRFNPI , Texec == 2.148023, sec

CPU/OS specifics:

CPU: 2x Xeon Gold 5120 2.2GHz 14c (NP=56) (Skylake-SP)

MEMORY: 192GB 2400MHz DDR4 Dual-rank

OS: CentOS Linux release 7.7.1908 (Core)

KERNEL: Linux 3.10.0-1062.4.1.el7.x86_64 x86_64

n_aron · ‎12-18-2021

Hi,

Could you please share your code? I am seeing somewhat different results for small dimensions on a Xeon(R) Gold 6126.

Regards,
A.N

Gennady_F_Intel · ‎10-14-2021

This query has been resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.