I'm interested in further optimizing my application using the packed GEMM API. However, I'm unclear how it behaves in the case of dynamic batch sizes. For example,
The GEMM function should compute X*WT where W can be packed as it remains constant.
How does a change in M affect the packed representation of W? Do cblas_gemm_*_compute functions silently repack W if any of M, N, K is different? Or should it be done manually?