I have noticed that Automatic Offload (AO) will take ZGEMM to the Xeon Phi, but not ZGEMM3M. This is a bit annoying: AO can give a significant boost to ZGEMM performance, but without offload ZGEMM3M is normally faster than ZGEMM.
For example, on my 8-core machine, ZGEMM3M takes 46s and ZGEMM takes 62s for an 8192 x 8192 matrix.
With AO to the Phi, ZGEMM takes 12.5s, but ZGEMM3M is not accelerated by AO at all.
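To put those timings side by side, here is a small sketch in C that turns them into speedup ratios and nominal rates. The helper names (`speedup`, `nominal_gflops`, `print_summary`) are my own, and the rate assumes the conventional 8*n^3 flop count for a complex n x n matrix multiply, which overstates the work ZGEMM3M actually performs.

```c
#include <stdio.h>

/* Hypothetical helper: how many times faster the second timing is. */
static double speedup(double slow_s, double fast_s)
{
    return slow_s / fast_s;
}

/* Hypothetical helper: nominal GFLOP/s for a complex n x n GEMM,
 * assuming the conventional count of 8*n^3 floating-point operations. */
static double nominal_gflops(double n, double seconds)
{
    return 8.0 * n * n * n / seconds / 1e9;
}

/* Prints the comparison for the timings quoted in this post. */
static void print_summary(void)
{
    const double n = 8192.0;
    const double t_zgemm = 62.0;     /* host ZGEMM, seconds    */
    const double t_zgemm3m = 46.0;   /* host ZGEMM3M, seconds  */
    const double t_zgemm_ao = 12.5;  /* ZGEMM with AO, seconds */

    printf("ZGEMM3M vs ZGEMM (host): %.2fx\n", speedup(t_zgemm, t_zgemm3m));
    printf("ZGEMM AO vs ZGEMM3M:     %.2fx\n", speedup(t_zgemm3m, t_zgemm_ao));
    printf("ZGEMM AO vs ZGEMM:       %.2fx\n", speedup(t_zgemm, t_zgemm_ao));
    printf("ZGEMM host nominal rate: %.1f GFLOP/s\n", nominal_gflops(n, t_zgemm));
    printf("ZGEMM AO nominal rate:   %.1f GFLOP/s\n", nominal_gflops(n, t_zgemm_ao));
}
```

Calling `print_summary()` from `main` shows that offloaded ZGEMM beats host ZGEMM3M by a wide margin, which is why picking the routine according to the AO state matters here.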
Of course, I suppose I could switch from ZGEMM3M to ZGEMM if I could detect whether AO is enabled, but there does not seem to be an API for that. mkl_mic_enable() and mkl_mic_disable() exist, but there is no mkl_mic_isenabled()?
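Lacking such a query, one workaround is to track the AO state on the application side. The sketch below is only an illustration, and every name in it (`ao_enabled`, `ao_state_init_from_env`, `ao_note_enable`, `ao_note_disable`, `choose_gemm`) is hypothetical and not part of MKL. It assumes that AO is switched on either through the MKL_MIC_ENABLE=1 environment variable or through mkl_mic_enable(), and that the enable/disable calls return 0 on success. The MKL calls themselves are left out so the block compiles without MKL.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical application-side flag; MKL does not maintain this for us. */
static int ao_enabled = 0;

/* Seed the flag from MKL_MIC_ENABLE, assuming the value "1" turns AO on. */
static void ao_state_init_from_env(void)
{
    const char *v = getenv("MKL_MIC_ENABLE");
    ao_enabled = (v != NULL && strcmp(v, "1") == 0);
}

/* Pass in the return code of the real mkl_mic_enable() call.
 * Assumption: 0 means the call succeeded. */
static void ao_note_enable(int mkl_status)
{
    if (mkl_status == 0)
        ao_enabled = 1;
}

/* Pass in the return code of the real mkl_mic_disable() call. */
static void ao_note_disable(int mkl_status)
{
    if (mkl_status == 0)
        ao_enabled = 0;
}

typedef enum { USE_ZGEMM, USE_ZGEMM3M } gemm_choice;

/* Prefer ZGEMM when AO is believed to be on (it can be offloaded),
 * otherwise fall back to ZGEMM3M, which is faster on the host. */
static gemm_choice choose_gemm(void)
{
    return ao_enabled ? USE_ZGEMM : USE_ZGEMM3M;
}
```

In use, the application would call `ao_state_init_from_env()` once at startup, write `ao_note_enable(mkl_mic_enable())` wherever it currently calls mkl_mic_enable() directly, and branch on `choose_gemm()` before each multiply. This only reflects what the application itself has requested; it cannot tell whether MKL actually offloads a particular call.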