In our case, the H matrix is a sparse matrix of size 300 x 300.
Our MKL call sequence is:
1. phase 11 (~30 ms)
2. phase 22 (~1.1 ms)
3. phase 33 (~1 ms)
I understand that phase 11 only needs to run once, but even if our solve runs for 10 iterations, the total time is 30 + (1.1 + 1) * 10 = 51 ms.
In this situation phase 11 takes close to 60% of the total time. Is there any way to avoid this overhead?
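The arithmetic in the question can be checked with a short sketch. The function name `phase11_share` and the extra iteration counts are illustrative assumptions; only the three timings (30 ms, 1.1 ms, 1 ms) and the 10-iteration case come from the post.

```python
def phase11_share(t_reorder_ms, t_factor_ms, t_solve_ms, iterations):
    """Return (total time in ms, fraction of that time spent in phase 11).

    Phase 11 is paid once; phases 22 and 33 are paid on every iteration.
    """
    total = t_reorder_ms + (t_factor_ms + t_solve_ms) * iterations
    return total, t_reorder_ms / total


# Timings quoted in the post; iteration counts other than 10 are hypothetical.
for n in (1, 10, 50, 100):
    total, share = phase11_share(30.0, 1.1, 1.0, n)
    print(f"{n:4d} iterations: total {total:6.1f} ms, phase 11 share {share:.1%}")
```

With these numbers the one-time cost only stops dominating once the iteration count is well above 10, which is why the question matters for short solves.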
Unfortunately, not really (we have a couple of features which can help improve reordering for large systems, but they won't help here). It is a known issue that the reordering phase has some overheads that are not easily reduced, and these can limit performance for small matrices.
For 300x300 sparse matrices, I'd recommend using dense linear algebra solvers (from LAPACK). The regularity of data access should more than compensate for the work spent on zero entries, unless some specific details of your application make sparse solvers look more attractive.
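To illustrate what the dense route looks like, here is a minimal pure-Python sketch of a dense Cholesky factorization followed by two triangular solves. It assumes H is symmetric positive definite, which the thread does not state; the 3 x 3 matrix and right-hand side are made-up test data. In a real application one would call the LAPACK routines instead (for example `dpotrf` and `dpotrs` for the symmetric positive definite case), but the structure is the same: there is no reordering or symbolic phase, only a numerical factorization and a solve.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L * L^T (a must be symmetric positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][j] = math.sqrt(s)
            else:
                l[i][j] = s / l[j][j]
    return l


def cholesky_solve(l, b):
    """Solve (L * L^T) x = b by forward and backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward substitution: L^T x = y
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


# Hypothetical small system standing in for the 300 x 300 H matrix.
H = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 0.0],
     [0.0, 0.0, 2.0]]
b = [1.0, 2.0, 4.0]

L = cholesky(H)            # plays the role of phase 22 (numerical factorization)
x = cholesky_solve(L, b)   # plays the role of phase 33 (solve)
print(x)
```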
Thanks for your reply, Kirill.
Actually, our case is SLAM bundle adjustment, with 50 poses and 3000 landmarks.
The matrix has 50*6 + 3000*3 = 9300 rows and columns.
If we use the Schur complement, phase 11 takes 100+ ms, but the solver phases only need about 20+ ms per iteration.
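For readers unfamiliar with the reduction, the standard Schur complement for bundle adjustment can be written as follows. The block names B, E, C and the ordering (poses first, landmarks second) are assumptions made for illustration; the post does not say how the matrix is partitioned or which block is eliminated.

```latex
% Assumed partition: poses (300 unknowns) first, landmarks (9000 unknowns) second.
H \,\delta = b, \qquad
H = \begin{pmatrix} B & E \\ E^{\top} & C \end{pmatrix}, \quad
\delta = \begin{pmatrix} \delta_p \\ \delta_l \end{pmatrix}, \quad
b = \begin{pmatrix} b_p \\ b_l \end{pmatrix},
\qquad B \in \mathbb{R}^{300 \times 300},\; C \in \mathbb{R}^{9000 \times 9000}.

% Eliminating the landmark block gives the reduced (Schur complement) system:
S = B - E\, C^{-1} E^{\top}, \qquad
S \,\delta_p = b_p - E\, C^{-1} b_l .

% Back-substitution recovers the landmark update:
\delta_l = C^{-1} \left( b_l - E^{\top} \delta_p \right).
```

Under this partition the reduced matrix S is 300 x 300, which would match the size mentioned in the first post, although the thread does not confirm that this is where that number comes from.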
timeMKL_phase11_symbolic= 161.812 timeMKL_phase22= 18.4635 timeMKL_phase331= 3.32485 timeMKL_schurSolver= 1.75007 timeMKL_phase333= 3.45389
timeMKL_phase11_symbolic= 0 timeMKL_phase22= 15.307 timeMKL_phase331= 3.02907 timeMKL_schurSolver= 0.69892 timeMKL_phase333= 3.40465
timeMKL_phase11_symbolic= 0 timeMKL_phase22= 15.3569 timeMKL_phase331= 2.9652 timeMKL_schurSolver= 0.699728 timeMKL_phase333= 3.44434
timeMKL_phase11_symbolic= 0 timeMKL_phase22= 16.5589 timeMKL_phase331= 3.07169 timeMKL_schurSolver= 0.684186 timeMKL_phase333= 3.42943
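The log above can be summarized with a small script. This is a sketch that assumes all values are in milliseconds (the log itself carries no units) and that each line corresponds to one solver iteration; the helper name `parse_line` is made up for the example.

```python
import re

# The four log lines from the post, copied verbatim.
LOG = """\
timeMKL_phase11_symbolic= 161.812 timeMKL_phase22= 18.4635 timeMKL_phase331= 3.32485 timeMKL_schurSolver= 1.75007 timeMKL_phase333= 3.45389
timeMKL_phase11_symbolic= 0 timeMKL_phase22= 15.307 timeMKL_phase331= 3.02907 timeMKL_schurSolver= 0.69892 timeMKL_phase333= 3.40465
timeMKL_phase11_symbolic= 0 timeMKL_phase22= 15.3569 timeMKL_phase331= 2.9652 timeMKL_schurSolver= 0.699728 timeMKL_phase333= 3.44434
timeMKL_phase11_symbolic= 0 timeMKL_phase22= 16.5589 timeMKL_phase331= 3.07169 timeMKL_schurSolver= 0.684186 timeMKL_phase333= 3.42943
"""


def parse_line(line):
    """Turn 'name= value name= value ...' into a dict of floats."""
    return {name: float(value) for name, value in re.findall(r"(\w+)=\s*([0-9.]+)", line)}


rows = [parse_line(line) for line in LOG.splitlines() if line.strip()]

# One-time symbolic cost versus the recurring per-iteration cost.
symbolic_total = sum(r["timeMKL_phase11_symbolic"] for r in rows)
numeric_per_iter = [
    sum(v for k, v in r.items() if k != "timeMKL_phase11_symbolic") for r in rows
]
numeric_total = sum(numeric_per_iter)
share = symbolic_total / (symbolic_total + numeric_total)

for i, t in enumerate(numeric_per_iter, start=1):
    print(f"iteration {i}: {t:.3f} (factorization + solve)")
print(f"phase 11 share over {len(rows)} iterations: {share:.1%}")
```

Over the four logged iterations, phase 11 accounts for roughly 63% of the summed time, so the one-time symbolic step costs about as much as seven of the later iterations.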