I'm interested in further optimizing my application using the packed GEMM API. However, it is unclear to me how the API behaves with dynamic batch sizes. For example, given:
- X, the input of shape [M, K] where M is the batch size
- W, the weight of shape [N, K]
The GEMM function should compute X * W^T, where W can be packed because it remains constant across calls.
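To make the shapes concrete, here is a minimal reference sketch of the computation in plain C. It is not the packed API itself, only an unoptimized loop showing what the result should be. It assumes row-major storage and float data; the name gemm_xwt_ref is hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Reference (unoptimized) computation of out = X * W^T, row-major.
 *   x:   [m, k]  input; m is the batch size and may change per call
 *   w:   [n, k]  weight; constant across calls
 *   out: [m, n]  result
 * gemm_xwt_ref is a hypothetical helper name, not part of any library. */
static void gemm_xwt_ref(size_t m, size_t n, size_t k,
                         const float *x, const float *w, float *out)
{
    assert(x != NULL && w != NULL && out != NULL);
    for (size_t i = 0; i < m; ++i) {          /* one output row per batch item */
        for (size_t j = 0; j < n; ++j) {      /* one output column per row of W */
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                /* W is indexed as [j, p], which applies the transpose implicitly */
                acc += x[i * k + p] * w[j * k + p];
            }
            out[i * n + j] = acc;
        }
    }
}
```

In this reference loop, M only controls how many rows of X are processed and how many rows of the result are written; W is read the same way regardless of M. What I don't know is whether the packed format makes the same assumption.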
How does a change in M affect the packed representation of W? Do the cblas_gemm_*_compute functions silently repack W if any of M, N, or K differs from the values used when W was packed? Or do I have to repack it manually?