Importance of the work array size (lwork) passed to ?getri
Just an FYI for MKL users, as this caught me out....
I was profiling and noticed that calls to zgetri were not scaling as the number of threads increased. Intel VTune Amplifier also showed that the OMP threads, apart from the main thread, were spinning.
The problem was a work array of size only "n" (that is, lwork = n). Increasing lwork to the recommended value tripled the performance with 4 cores and showed the threads fully occupied again. Of course, the MKL documentation covers this, including a "pre-flight" workspace query to get an optimal value for lwork.
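For anyone who wants to see the pre-flight pattern spelled out, here is a minimal sketch in Python using SciPy's low-level LAPACK wrappers (zgetrf, zgetri, zgetri_lwork). This is not the code from my application. The helper name invert_with_optimal_workspace is made up for illustration, and whether SciPy actually calls into MKL depends on how your SciPy was built, so treat it as a sketch of the idea rather than a benchmark.

```python
import numpy as np
from scipy.linalg.lapack import zgetrf, zgetri, zgetri_lwork


def invert_with_optimal_workspace(a):
    """Invert a complex square matrix with zgetrf + zgetri.

    Hypothetical helper for illustration: it sizes the zgetri work
    array from a workspace query instead of using the minimum (n).
    """
    n = a.shape[0]

    # Workspace query ("pre-flight"): the wrapper asks zgetri for its
    # preferred lwork without performing the inversion.
    work, info = zgetri_lwork(n)
    if info != 0:
        raise RuntimeError(f"zgetri workspace query failed, info={info}")
    # The size comes back as a floating-point (complex) value.
    lwork = max(n, int(np.real(work)))

    # LU factorization, which zgetri needs as input.
    lu, piv, info = zgetrf(a)
    if info != 0:
        raise RuntimeError(f"zgetrf failed, info={info}")

    # Inversion using the queried workspace size.
    inv_a, info = zgetri(lu, piv, lwork=lwork)
    if info != 0:
        raise RuntimeError(f"zgetri failed, info={info}")
    return inv_a, lwork


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    inv_a, lwork = invert_with_optimal_workspace(a)
    print("n =", n, "queried lwork =", lwork)
    print("max |A @ inv(A) - I| =", np.abs(a @ inv_a - np.eye(n)).max())
```

When calling zgetri directly from C or Fortran, the same query is done by passing lwork = -1: the routine does no inversion and returns the optimal size in the first element of work, which you then use to allocate the real work array before the second call.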