I am writing code that uses "dgetrf" and "mkl_dgetrfnpi". With different numbers of threads, I see that the performance of the no-pivoting version is the same as that of the partial-pivoting version.
mkl_dgetrfnpi(&m, &n, &m, A, &lda, &info);
dgetrf(&m, &n, A, &lda, ipiv, &info);
You told me that the input matrix matters, but I do not see this. Could you please give an example of a matrix type for which no pivoting is faster? I think pivoting checks all elements of the column, so the time to find the maximum is the same for any kind of matrix.
Thanks for reaching out to us.
We are working on your issue. We will get back to you soon.
Meanwhile, as mentioned in the previous post (https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-GETRF-Partial-Pivoting-and-GETRF...), the links you provided for the source code throw errors when we try to open them.
>>I am writing a code that is using "dgetrf" and "mkl_dgetrfnpi"
So could you please provide a sample reproducer that uses the above functions (along with your OS and MKL version) so that we can work on it from our end?
Thanks. Here is a link to my code:
Calling LAPACKE_mkl_dgetrfnpi makes sense when the input matrix is diagonally dominant.
I ran some experiments to compare the execution times of the LAPACKE_dgetrf and LAPACKE_mkl_dgetrfnpi routines. The input matrices were artificially generated diagonally dominant matrices.
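To illustrate why diagonally dominant input is the natural case for the no-pivoting routine, here is a minimal sketch in plain C (no MKL calls). It builds a matrix that is strictly diagonally dominant by columns and counts how many row interchanges a textbook partial-pivoting LU would perform on it. The function names fill_col_dominant and lu_count_swaps are hypothetical helpers written for this post, and the unblocked loop is not how MKL implements dgetrf; it only shows the pivoting decisions.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill an n-by-n column-major matrix with random entries in [-0.5, 0.5],
   then set each diagonal entry above the sum of the absolute values in its
   column, making the matrix strictly diagonally dominant by columns.
   (Hypothetical helper, not part of MKL.) */
void fill_col_dominant(double *a, int n, unsigned seed)
{
    assert(a != NULL && n > 0);
    srand(seed);
    for (int j = 0; j < n; ++j) {
        double colsum = 0.0;
        for (int i = 0; i < n; ++i) {
            double v = (double)rand() / RAND_MAX - 0.5;
            a[i + (size_t)j * n] = v;
            colsum += fabs(v);
        }
        a[j + (size_t)j * n] = colsum + 1.0;
    }
}

/* Unblocked in-place LU with partial pivoting on a column-major matrix.
   Returns the number of row interchanges performed, or -1 if the best
   available pivot is exactly zero. (Textbook sketch, not MKL's algorithm.) */
int lu_count_swaps(double *a, int n)
{
    int swaps = 0;
    for (int k = 0; k < n; ++k) {
        double *colk = a + (size_t)k * n;

        /* Pivot search: largest magnitude on or below the diagonal. */
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (fabs(colk[i]) > fabs(colk[p]))
                p = i;
        if (colk[p] == 0.0)
            return -1;

        /* Row interchange, only when the diagonal is not the largest. */
        if (p != k) {
            ++swaps;
            for (int j = 0; j < n; ++j) {
                double *col = a + (size_t)j * n;
                double t = col[k];
                col[k] = col[p];
                col[p] = t;
            }
        }

        /* Compute multipliers, then update the trailing submatrix. */
        for (int i = k + 1; i < n; ++i)
            colk[i] /= colk[k];
        for (int j = k + 1; j < n; ++j) {
            double *col = a + (size_t)j * n;
            for (int i = k + 1; i < n; ++i)
                col[i] -= colk[i] * col[k];
        }
    }
    return swaps;
}
```

For a matrix produced by fill_col_dominant, lu_count_swaps should report zero interchanges, because the diagonal entry stays the largest in its column at every step. In that case the pivot search and the ipiv bookkeeping do no useful work, which is the situation where skipping them is both safe and potentially cheaper. For a general matrix the count is usually nonzero, and dropping pivoting can then lose accuracy or hit a zero pivot.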
MKL_VERBOSE oneMKL 2021.0 Update 3 Product build 20210617 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.20GHz lp64 intel_thread
$ ./a.out 200
...GETRF , Texec == 0.000489, sec
...GETRFNPI , Texec == 0.000284, sec
$ ./a.out 400
...GETRF , Texec == 0.001419, sec
...GETRFNPI , Texec == 0.000842, sec
$ ./a.out 800
...GETRF , Texec == 0.004546, sec
...GETRFNPI , Texec == 0.003088, sec
$ ./a.out 1600
...GETRF , Texec == 0.019317, sec
...GETRFNPI , Texec == 0.014147, sec
$ ./a.out 3200
...GETRF , Texec == 0.067929, sec
...GETRFNPI , Texec == 0.055947, sec
$ ./a.out 6400
...GETRF , Texec == 0.361859, sec
...GETRFNPI , Texec == 0.325563, sec
$ ./a.out 12800
...GETRF , Texec == 2.394896, sec
...GETRFNPI , Texec == 2.148023, sec
CPU: 2x Xeon Gold 5120 2.2 GHz 14c (NP=56) (Skylake-SP)
MEMORY: 192GB 2400 MHz DDR4 Dual-rank
OS: CentOS Linux release 7.7.1908 (Core)
KERNEL: Linux 3.10.0-1062.4.1.el7.x86_64 x86_64
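On the point that the pivot search costs the same for any matrix: that is correct, and the rough count below is a sketch of why the gap nevertheless narrows as the matrix grows. For an n-by-n matrix, an unblocked partial-pivoting LU scans n-k candidates at step k, while the elimination work grows cubically. These are textbook operation counts, not measurements of the MKL implementation.

```latex
\underbrace{\sum_{k=1}^{n-1} (n-k)}_{\text{pivot comparisons}} = \frac{n(n-1)}{2},
\qquad
\underbrace{\sum_{k=1}^{n-1} 2\,(n-k)^2}_{\text{elimination flops}} \approx \frac{2}{3}\,n^3,
\qquad
\frac{\text{comparisons}}{\text{flops}} \approx \frac{3}{4n}.
```

Assuming the argument passed to ./a.out is the matrix order, the GETRF/GETRFNPI time ratio in the runs above falls from about 1.7 at 200 to about 1.1 at 12800, which is in line with this trend. Row interchanges and reduced parallelism in the panel factorization also contribute, so the measured gap is larger than the comparison count alone would suggest.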
This query has been resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.