Intel® oneAPI Math Kernel Library

Scalability of dense symmetric indefinite factorization

Rehfeldt__Daniel — Sun, 25 Feb 2018 20:05:17 GMT

Hi,

I am trying to speed up the factorization of a dense symmetric indefinite matrix. The size of my matrices is usually between 10 k and 20 k.

I am using LAPACK (dsytrf) with MKL 2018, and I run it on a supercomputer node with two Intel Xeon E5-2680 v3 Haswell CPUs (2 x 12 cores, 2.5 GHz). I also tried a node with an Intel Xeon Phi 7250-F Knights Landing CPU (68 cores, 1.4 GHz). The problem is that the factorization does not seem to scale well with the number of threads: with up to 8 threads I see some improvement (the run time is halved), but beyond that there is even a slowdown.

Is this something that is to be expected from this MKL routine? And if so, do you know of any alternative that scales better?

Thanks

Daniel

Ying_H_Intel — Thu, 01 Mar 2018 01:26:18 GMT

Hi Daniel,

Do you mean that on both machines the scaling stops at 8 threads? That is not expected. For reference, we publish benchmarks for some factorizations, such as dgetrf, at https://software.intel.com/en-us/mkl/features/benchmarks for both Xeon and Xeon Phi. If needed, please submit the exact issue to https://supporttickets.intel.com/?lang=en-US together with a matrix that reproduces it.

Best regards,

Ying

Rehfeldt__Daniel — Thu, 01 Mar 2018 10:16:19 GMT

Hi Ying,

thanks for your help. On both machines the factorization does not scale beyond 8 threads. I will submit the matrix to support, as you suggested.

Best

Daniel

Denis_S_Intel — Fri, 02 Mar 2018 05:58:25 GMT

Hi Daniel,

We have been working on improving the performance and scalability of this functionality. The optimizations will be available in one of the next releases.

Rehfeldt__Daniel — Fri, 02 Mar 2018 09:07:52 GMT

Hi Denis,

thanks for the information. Do you have an idea of how long this will take (several months, a year, etc.)? I am not familiar with your release cycles. Would you recommend trying an LU factorization until then? According to your benchmarks, that seems to scale beyond 8 threads.

Denis_S_Intel — Fri, 02 Mar 2018 23:43:43 GMT

Hi Daniel,

The new release is expected this month. As for LU factorization: yes, I think trying LU instead of LDLT is a good approach until the new release is available.
May I ask what you are going to do with the results once you have them?

Rehfeldt__Daniel — Sat, 03 Mar 2018 07:56:03 GMT

Hi Denis,

thanks again. Will your enhancements be in the release notes? Otherwise, could you let me know once they have been released?

I will try the LU factorization then. Would you expect it to scale to 68 cores (let's say for a 10 k matrix)?

I use the factorization to solve two to four linear systems with different right-hand sides.

Denis_S_Intel — Tue, 06 Mar 2018 20:44:38 GMT

Hi Daniel,

Yes, the enhancements will be in the release notes and yes, the LU factorization shows good scalability.
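
For readers following the suggested workaround, here is a minimal sketch of the "factor once with LU, then solve for several right-hand sides" workflow mentioned in this thread. It is plain Python written for illustration only, not MKL code: `lu_factor` and `lu_solve` are hypothetical helper names that mimic what LAPACK's dgetrf and dgetrs do, and the 3 x 3 symmetric indefinite matrix and its two right-hand sides are made-up example data. It says nothing about performance or thread scaling; in practice you would call the library routines themselves.

```python
def lu_factor(a):
    """LU factorization with partial pivoting (the idea behind dgetrf).

    Returns (lu, piv): L (unit diagonal, stored below the diagonal) and
    U (stored on and above it) packed into one matrix, plus a permutation
    where piv[i] is the original index of the row now in position i.
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy, leave the input intact
    piv = list(range(n))
    for k in range(n):
        # Choose the largest entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            piv[k], piv[p] = piv[p], piv[k]
        # Eliminate below the pivot, storing the multipliers in place.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv


def lu_solve(lu, piv, b):
    """Solve A x = b reusing an existing factorization (like dgetrs)."""
    n = len(lu)
    # Permute b, then forward-substitute with the unit lower triangle L.
    y = [b[piv[i]] for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    # Back-substitute with the upper triangle U.
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


# Made-up symmetric indefinite example (zero diagonal, nonsingular).
A = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 3.0],
     [2.0, 3.0, 0.0]]
rhs = [[8.0, 10.0, 8.0], [-2.0, -2.0, 2.0]]

lu, piv = lu_factor(A)                            # expensive step, done once
solutions = [lu_solve(lu, piv, b) for b in rhs]   # cheap step, per right-hand side
for x in solutions:
    print(x)
```

The point of the split is that the O(n^3) factorization cost is paid once, while each additional right-hand side costs only O(n^2). Note that LU ignores the symmetry of the matrix, so it does roughly twice the arithmetic of an LDLT factorization; whether it is still faster overall depends on how well each routine scales on the machine in question.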