I am running on an Intel 56xx platform with two CPUs, each with 12 cores. I am using ifort 12.0 with MKL 10.3. In our application we must repeatedly invert large symmetric matrices of size 2000-10000.
1. If I use LAPACK's dsytrf followed by dsytri, I find that the timing for the dsytri step is independent of the number of threads I request using set OMP_NUM_THREADS n and set MKL_NUM_THREADS n.
2. If I use LAPACK's dgetrf followed by dgetri, I find that the timing for the dgetri step does decrease as the number of threads increases, but for Nthreads > 5 the decrease is very slight. Thus the problem does not seem to benefit from parallelization beyond Nthreads = 5.
Is this correct?
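For anyone who wants to reproduce the comparison, here is a minimal timing sketch of the dsytrf/dsytri sequence. The program name, the matrix size and the random, diagonally dominant test matrix are my own assumptions, not taken from the application described above. It uses system_clock for wall-clock time, since cpu_time can report time summed over all threads and would hide any speedup.

```fortran
program time_sytri
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: n = 2000          ! hypothetical test size
  real(dp), allocatable :: a(:,:), work(:)
  integer, allocatable :: ipiv(:)
  real(dp) :: wq(1)
  integer :: info, lwork, i
  integer(i8) :: t0, t1, t2, rate
  external :: dsytrf, dsytri

  allocate(a(n,n), ipiv(n))

  ! Random test matrix; with uplo = 'U' only the upper triangle is referenced.
  ! A large diagonal keeps the matrix well conditioned.
  call random_number(a)
  do i = 1, n
     a(i,i) = a(i,i) + real(n, dp)
  end do

  ! Workspace query for dsytrf (lwork = -1 returns the optimal size in wq(1)).
  call dsytrf('U', n, a, n, ipiv, wq, -1, info)
  lwork = max(int(wq(1)), n)              ! dsytri needs a work array of length n
  allocate(work(lwork))

  call system_clock(t0, rate)
  call dsytrf('U', n, a, n, ipiv, work, lwork, info)
  if (info /= 0) stop 'dsytrf failed'
  call system_clock(t1)
  call dsytri('U', n, a, n, ipiv, work, info)
  if (info /= 0) stop 'dsytri failed'
  call system_clock(t2)

  print '(a,f10.3,a)', 'dsytrf: ', real(t1 - t0, dp) / real(rate, dp), ' s'
  print '(a,f10.3,a)', 'dsytri: ', real(t2 - t1, dp) / real(rate, dp), ' s'
end program time_sytri
```

Running it with different values of MKL_NUM_THREADS and comparing the two printed times should show which step scales. The dgetrf/dgetri case follows the same pattern.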
Also, in the "what's new" notes for MKL 10.3, there is mention of new code for dsytri, specifically a module called dsytri2. However, when I look in the directory /opt/intel/composerxe-2011.0.085/mkl/lib and execute the command ar -t libmkl_lapack95.a, I don't find any module named dsytri2. Can someone clarify this?
As for the dsytri2 symbol, the problem may be that the Fortran 95 interfaces for LAPACK functions don't contain the initial letter specifying the data type, since that is determined implicitly from the arguments. Did you try a search for sytri2?
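To illustrate the naming point, here is a small sketch of the generic Fortran 95 calls, where sytrf and sytri resolve to the double-precision routines because the array is double precision. The module name lapack95 is an assumption on my part (please check the include directory of your MKL installation), and the 4x4 test matrix is made up for the example.

```fortran
program sytri_f95
  use lapack95, only: sytrf, sytri   ! assumed module name; verify for your MKL version
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4        ! hypothetical small test size
  real(dp) :: a(n,n)
  integer :: ipiv(n), info, i

  ! Random matrix with a large diagonal; only the upper triangle is used.
  call random_number(a)
  do i = 1, n
     a(i,i) = a(i,i) + real(n, dp)
  end do

  ! No leading "d": the generic name picks the routine from the type of a.
  call sytrf(a, uplo='U', ipiv=ipiv, info=info)
  if (info /= 0) stop 'sytrf failed'
  call sytri(a, ipiv, uplo='U', info=info)
  if (info /= 0) stop 'sytri failed'

  print *, 'upper triangle of the inverse is now stored in a'
end program sytri_f95
```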
Unfortunately, DSYTRI2 is not mentioned in our documentation. You can, however, find the function mentioned on the "what's new" page for NETLIB LAPACK 3.3, which was released recently. The latest available MKL releases currently correspond to NETLIB LAPACK 3.2.2 in functionality.
You are right, the current DSYTRI doesn't benefit from threading, and DSYTRI2 from NETLIB LAPACK 3.3 addresses exactly that issue. The LAPACK 3.3 release will be included in one of our future updates. For now, you could link the DSYTRI2 function from NETLIB LAPACK before MKL on your link line and still benefit from the highly optimized MKL BLAS.
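A rough sketch of how the NETLIB routine could be called is below. It assumes the LAPACK 3.3 argument list DSYTRI2(UPLO, N, A, LDA, IPIV, WORK, LWORK, INFO) and that a workspace query with LWORK = -1 is supported. The module and subroutine names are hypothetical. As far as I know, dsytri2 also depends on a few other new NETLIB routines (such as dsytri2x), so those sources would need to be compiled and placed before MKL on the link line as well.

```fortran
module sym_inverse
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical helper: overwrites the upper triangle of a with the
  ! upper triangle of inv(a). info /= 0 signals a LAPACK failure.
  subroutine invert_sym_upper(n, a, info)
    integer, intent(in) :: n
    real(dp), intent(inout) :: a(n,n)
    integer, intent(out) :: info
    real(dp), allocatable :: work(:)
    integer, allocatable :: ipiv(:)
    real(dp) :: wq(1)
    integer :: lwork
    external :: dsytrf, dsytri2

    allocate(ipiv(n))

    ! Factorization: query the workspace size, then factor.
    call dsytrf('U', n, a, n, ipiv, wq, -1, info)
    lwork = max(1, int(wq(1)))
    allocate(work(lwork))
    call dsytrf('U', n, a, n, ipiv, work, lwork, info)
    if (info /= 0) return
    deallocate(work)

    ! Inversion with the blocked NETLIB routine (assumed interface, see above).
    call dsytri2('U', n, a, n, ipiv, wq, -1, info)
    lwork = max(1, int(wq(1)))
    allocate(work(lwork))
    call dsytri2('U', n, a, n, ipiv, work, lwork, info)
  end subroutine invert_sym_upper
end module sym_inverse
```

A caller would use the module and call invert_sym_upper(n, a, info). Only the upper triangle of the result is meaningful, so mirror it into the lower triangle if the full inverse is needed.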
Thank you for reporting the issue; there is indeed a lack of scalability in the current algorithm. We plan to address it in a future MKL release.