I am trying to speed up the factorization of a dense symmetric indefinite matrix. My matrices usually have a dimension between 10,000 and 20,000.
I am using LAPACK (dsytrf) from MKL 2018, running on a supercomputer node with two Intel Xeon E5-2680 v3 Haswell CPUs (2 x 12 cores, 2.5 GHz). I also tried a node with an Intel Xeon Phi 7250-F Knights Landing CPU (68 cores, 1.4 GHz). The problem is that the factorization does not seem to scale well with the number of threads: with up to 8 threads I see some improvement (the run time is halved), but beyond that it actually slows down.
Is this to be expected from this MKL routine? If so, do you know of an alternative that scales better?
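To put numbers on "does not scale", here is a minimal Python sketch that turns wall-clock timings into speedup and parallel efficiency. The function name `scaling_table` and the timings are hypothetical placeholders, chosen only to mirror the behaviour described above (run time halved at 8 threads, slower beyond that). They are not measurements.

```python
def scaling_table(timings):
    """Return (threads, speedup, efficiency) rows from a dict of
    thread count -> wall-clock seconds. The dict must contain a 1-thread entry."""
    base = timings[1]
    rows = []
    for threads in sorted(timings):
        speedup = base / timings[threads]   # how much faster than 1 thread
        efficiency = speedup / threads      # fraction of ideal linear scaling
        rows.append((threads, speedup, efficiency))
    return rows


# Hypothetical placeholder timings in seconds, NOT measured values.
example_timings = {1: 40.0, 2: 30.0, 4: 24.0, 8: 20.0, 16: 22.0, 24: 25.0}

for threads, speedup, efficiency in scaling_table(example_timings):
    print(f"{threads:3d} threads: speedup {speedup:4.2f}, efficiency {efficiency:6.1%}")
```

With timings like these, a halved run time at 8 threads corresponds to a speedup of 2 and a parallel efficiency of only 25%.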
Thanks for your help. On both machines the factorization does not scale beyond 8 threads. I will submit the matrix to support, as you suggested.
We have been working on improving the performance and scalability of this functionality; the optimizations will be available in one of the next releases.
Thanks for the information. Do you have an idea of how long this will take (several months, a year, etc.)? I am not familiar with your release cycles. Would you recommend trying an LU factorization until then? According to your benchmarks, that seems to scale beyond 8 threads.
The new release is expected this month. As for LU factorization: yes, I think trying LU instead of LDLT is a good approach until the new release is available.
May I ask what you are going to do with the results once you have them?
Thanks again. Will your enhancements be mentioned in the release notes? Otherwise, could you let me know once it has been released?
I will try the LU factorization then. Would you expect it to scale to 68 cores (let's say for a 10k matrix)?
I use the factorization to solve two to four linear systems with different right-hand sides.
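For readers following along, the "factor once, solve several right-hand sides" pattern can be sketched as below. This is a minimal pure-Python illustration of LU with partial pivoting, not the MKL implementation. The helper names `lu_factor` and `lu_solve` and the small test matrix are made up for the example. In LAPACK terms the same pattern is dgetrf followed by dgetrs (or dsytrf followed by dsytrs for the symmetric case). Note that LU ignores the symmetry and therefore does roughly twice the floating-point work of LDLT.

```python
def lu_factor(a):
    """LU factorization with partial pivoting of a square matrix (list of rows).
    Returns (lu, piv): L (unit diagonal, below the diagonal) and U packed in one
    matrix, plus piv where piv[i] is the original row now sitting in row i."""
    n = len(a)
    lu = [row[:] for row in a]          # work on a copy
    piv = list(range(n))
    for k in range(n):
        # Choose the row with the largest entry in column k as the pivot row.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            piv[k], piv[p] = piv[p], piv[k]
        # Eliminate the entries below the pivot.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv


def lu_solve(lu, piv, b):
    """Solve A x = b by reusing the factors from lu_factor."""
    n = len(lu)
    x = [b[piv[i]] for i in range(n)]   # apply the row permutation to b
    for i in range(n):                  # forward substitution with L
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in reversed(range(n)):        # back substitution with U
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


# Small symmetric indefinite example matrix (illustrative only).
A = [[2.0, 1.0, 0.0],
     [1.0, -3.0, 4.0],
     [0.0, 4.0, 1.0]]
right_hand_sides = [[1.0, 2.0, 3.0], [0.0, -1.0, 5.0]]

lu, piv = lu_factor(A)                  # the expensive O(n^3) step, done once
solutions = [lu_solve(lu, piv, b) for b in right_hand_sides]  # O(n^2) each
```

Since the factorization dominates the cost and each extra right-hand side is comparatively cheap, the scaling of the factorization itself is what matters for a workload with only two to four right-hand sides.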