Hello,
I wrote a short program that calls cblas_sgemm_pack and cblas_sgemm_compute to speed up a matrix multiplication, but the result is not consistent with cblas_sgemm.
For example,
Matrix A (2 x 2): [1.0, 2.0, 3.0, 4.0]
Matrix B (2 x 1): [1.0, 2.0]
In row-major order, Matrix C (2 x 1) = A * B = [5, 11]. But with cblas_sgemm_pack + cblas_sgemm_compute, the result is [0.0, 0.0].
Could you please take a look? Any advice is welcome.
Thanks
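As a sanity check on the expected value [5, 11], the product can be reproduced without MKL. The sketch below is a naive triple-loop multiply for row-major, non-transposed matrices; `ref_sgemm_row_major` is a hypothetical helper introduced here for illustration, not an MKL routine, and it only mirrors the argument order of cblas_sgemm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not part of MKL): computes
 * C = alpha * A * B + beta * C for row-major, non-transposed matrices.
 * A is m x k (row stride lda), B is k x n (ldb), C is m x n (ldc). */
void ref_sgemm_row_major(int m, int n, int k, float alpha,
                         const float *a, int lda,
                         const float *b, int ldb,
                         float beta, float *c, int ldc)
{
    /* In row-major storage the leading dimension is the row stride,
     * so it must be at least the number of columns. */
    assert(lda >= k && ldb >= n && ldc >= n);

    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++) {
                acc += a[(size_t)i * lda + p] * b[(size_t)p * ldb + j];
            }
            /* When beta is zero, C is overwritten without being read,
             * so an uninitialized C buffer is acceptable. */
            float *out = &c[(size_t)i * ldc + j];
            *out = (beta == 0.0f) ? alpha * acc
                                  : alpha * acc + beta * (*out);
        }
    }
}
```

Called with m = 2, n = 1, k = 2, lda = 2, ldb = 1, ldc = 1, alpha = 1, beta = 0 and the A and B above, it fills C with 5 and 11, which is the value cblas_sgemm is expected to return.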
---
Environment: Intel Parallel Studio XE, version 2017.1.132.
Build command: icc gemm_pack.c -I${MKLROOT}/include -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm -ldl -std=c99
---
Sample code:
#include <stdio.h>
#include <stdlib.h>
#include <mkl.h>

/* Print each element of a float array as name[i] = value. */
void print(float* a, int length, const char* name)
{
    int i = 0;
    for (i = 0; i < length; i++) {
        printf("%s[%d] = %f\n", name, i, a[i]);
    }
}

int main(void)
{
    int m = 2;
    int n = 1;
    int k = 2;
    float *a, *b, *c;
    a = (float*)malloc(sizeof(float) * m * k);
    b = (float*)malloc(sizeof(float) * k * n);
    c = (float*)malloc(sizeof(float) * m * n);

    /* Fill A with 1..m*k and B with 1..k*n. */
    int i = 0;
    for (i = 0; i < m * k; i++) {
        a[i] = i + 1;
    }
    for (i = 0; i < k * n; i++) {
        b[i] = i + 1;
    }

    float alpha = 1.0f;
    float beta = 0.0f;
    /* Row-major leading dimensions: the number of columns of each matrix. */
    int lda = k;
    int ldb = n;
    int ldc = n;

    printf("========================SGEMM_PACK========================\n");
    print(a, m * k, "a");
    print(b, k * n, "b");
    float *packA = cblas_sgemm_alloc(CblasAMatrix, m, n, k);
    cblas_sgemm_pack(CblasRowMajor, CblasAMatrix, CblasNoTrans, m, n, k, alpha, a, lda, packA);
    cblas_sgemm_compute(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, packA, lda, b, ldb, beta, c, ldc);
    cblas_sgemm_free(packA);
    print(c, m * n, "c");

    printf("========================SGEMM========================\n");
    print(a, m * k, "a");
    print(b, k * n, "b");
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
    print(c, m * n, "c");

    free(a);
    free(b);
    free(c);
    return 0;
}
Hello,
Since A is already packed, pass CblasPacked instead of CblasNoTrans as the transa argument of cblas_sgemm_compute:
cblas_sgemm_compute(CblasRowMajor, CblasPacked, CblasNoTrans, m, n, k, packA, lda, b, ldb, beta, c, ldc);
Thanks.