Hello,

I wrote a short program that calls cblas_sgemm_pack to speed up the multiplication, but the result is not consistent with cblas_sgemm.

For example,

Matrix A (2 x 2): [1.0, 2.0, 3.0, 4.0]

Matrix B (2 x 1): [1.0, 2.0]

With row-major layout, Matrix C (2 x 1) = A * B = [5, 11]. But with sgemm_pack + sgemm_compute, the result is [0.0, 0.0].
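
The expected value can be cross-checked without MKL using a naive row-major triple loop. The following is only a minimal sketch: ref_sgemm is a hypothetical helper written for this check, not an MKL function, and its parameter order is merely chosen to resemble cblas_sgemm.

```c
/* Naive row-major reference: C = alpha * A * B + beta * C.
   A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc).
   ref_sgemm is a hypothetical helper for checking results, not part of MKL. */
void ref_sgemm(int m, int n, int k, float alpha,
               const float *a, int lda,
               const float *b, int ldb,
               float beta, float *c, int ldc)
{
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            /* Dot product of row i of A with column j of B. */
            for (int p = 0; p < k; p++) {
                acc += a[i * lda + p] * b[p * ldb + j];
            }
            /* When beta is 0, do not read C, since it may be uninitialized. */
            if (beta == 0.0f) {
                c[i * ldc + j] = alpha * acc;
            } else {
                c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j];
            }
        }
    }
}
```

Calling ref_sgemm(2, 1, 2, 1.0f, a, 2, b, 1, 0.0f, c, 1) with a = {1, 2, 3, 4} and b = {1, 2} gives c[0] = 1*1 + 2*2 = 5 and c[1] = 3*1 + 4*2 = 11, which matches the cblas_sgemm result above.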

Could you please take a look? Any advice is welcome.

Thanks

---

Environment: I use Parallel Studio XE, version 2017.1.132.

Build command: icc gemm_pack.c -I${MKLROOT}/include -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm -ldl -std=c99

---

The sample code:

#include <stdio.h>
#include <stdlib.h>
#include <mkl.h>

/* Print every element of a float array, one per line. */
void print(float* a, int length, const char* name)
{
    int i = 0;
    for (i = 0; i < length; i++) {
        printf("%s[%d] = %f\n", name, i, a[i]);
    }
}

int main(void)
{
    int m = 2;
    int n = 1;
    int k = 2;
    float *a, *b, *c;

    a = (float*)malloc(sizeof(float) * m * k);
    b = (float*)malloc(sizeof(float) * k * n);
    c = (float*)malloc(sizeof(float) * m * n);

    /* Fill A with 1..m*k and B with 1..k*n. */
    int i = 0;
    for (i = 0; i < m * k; i++) {
        a[i] = i + 1;
    }
    for (i = 0; i < k * n; i++) {
        b[i] = i + 1;
    }

    float alpha = 1.0f;
    float beta = 0.0f;
    /* Row-major leading dimensions. */
    int lda = k;
    int ldb = n;
    int ldc = n;

    printf("========================SGEMM_PACK========================\n");
    print(a, m * k, "a");
    print(b, k * n, "b");
    float *packA = cblas_sgemm_alloc(CblasAMatrix, m, n, k);
    cblas_sgemm_pack(CblasRowMajor, CblasAMatrix, CblasNoTrans, m, n, k, alpha, a, lda, packA);
    cblas_sgemm_compute(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, packA, lda, b, ldb, beta, c, ldc);
    cblas_sgemm_free(packA);
    print(c, m * n, "c");

    printf("========================SGEMM========================\n");
    print(a, m * k, "a");
    print(b, k * n, "b");
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
    print(c, m * n, "c");

    return 0;
}

Hello,

Since A is already packed, please pass CblasPacked instead of CblasNoTrans as the transa argument of cblas_sgemm_compute:

cblas_sgemm_compute(CblasRowMajor, CblasPacked, CblasNoTrans, m, n, k, packA, lda, b, ldb, beta, c, ldc);

Thanks.

