It is true that our focus is on much larger matrices. However, we have made a very significant effort in MKL 7.0 (which should be available shortly) to address the small matrix case as well. For small matrices we tend to use different and simpler algorithms. For the most part, these are hybrid algorithms with special cases for when things are too small. That said, we usually try matrix multiply strategies in DGEMM before we try them on ZGEMM. I suspect that DGEMM in 7.0 will respond better to 6x6 matrices. We would certainly like to find the best solution for all cases.
The best algorithms for large matrices tend to have enormous overheads for small matrices. On a 6x6 matrix, for example, the interface itself seems to consume half the time (that is, one could do significantly better than netlib BLAS simply by inlining 6x6x6 loops). Having a malloc and some of the other tricks we use certainly doesn't help.
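To make the inlining point concrete, here is a minimal sketch in C of what "inlining 6x6x6 loops" could look like. It assumes column-major storage (as in Fortran BLAS), no transposes, and a plain C = A*B with no alpha/beta scaling. The name dgemm6_inline is made up for illustration; it is not an MKL or netlib BLAS routine, and this is not how MKL handles the small case internally.

```c
/* Fixed dimension: with a compile-time constant the compiler is free
   to unroll the loops completely. */
enum { N6 = 6 };

/* Hypothetical helper (not part of MKL or netlib BLAS).
   Computes C = A * B for 6x6 column-major matrices.
   No argument checking, no workspace allocation, no call dispatch:
   the interface overhead of a general DGEMM is simply absent. */
static inline void dgemm6_inline(const double *a, const double *b, double *c)
{
    for (int j = 0; j < N6; ++j) {          /* column of C and B */
        for (int i = 0; i < N6; ++i) {      /* row of C and A */
            double sum = 0.0;
            for (int k = 0; k < N6; ++k)    /* inner (dot product) index */
                sum += a[i + k * N6] * b[k + j * N6];
            c[i + j * N6] = sum;
        }
    }
}
```

A caller would pass three arrays of 36 doubles each. Whether this actually beats a library call depends on the compiler and the machine, so it is worth timing both on your own system.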
I think your idea is a good one. I will bring it to the attention of the rest of the developers. Unfortunately, it is too late for 7.0, and possibly 7.0.1. I honestly do not know when we will address your specific concern, but I can assure you that we will continue to improve the small matrix cases. It is very much a topic of our attention, as you will see by comparing small DGEMMs between 6.1 and 7.0 when it is released.
Thank you again for your suggestion.
- Greg Henry