Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Overwritting input matrix in gemm

ddavobsc
Beginner
195 Views

Hello, I am trying to use gemm to make some complex computations, and sometimes, if the result matrix is one of the input matrices, it gives an incorrect result.

 

I haven't seen any mention of this in the documentation:

// Linked to Intel MKL
#include <cblas.h>

// Lets say we have 3 square matrices of size `size`
// This works
cblas_tgemm<T>(layout, CblasNoTrans, CblasNoTrans, size, size, size, 1.0, A, size, B, size, 0.0, C, size);

// This also seems to work
cblas_tgemm<T>(layout, CblasNoTrans, CblasNoTrans, size, size, size, 1.0, A, size, B, size, 0.0, A, size);

// But this gives incorrect results
cblas_tgemm<T>(layout, CblasNoTrans, CblasNoTrans, size, size, size, 1.0, A, size, B, size, 0.0, B, size);
0 Kudos
1 Solution
Ethan_F_Intel
Moderator
78 Views

To start off, I want to mention that this is not how the function is intended to be used so any incorrect results are to be expected. Using cblas_?gemm this way is not supported. The output matrix should not be the same as the input matrix. This is because they all can technically have different dimensions. A is an m by k matrix, B is a k by n matrix, and C is an m by n matrix. The only situation where using the input matrix to store the output might technically pass is when they all are the same size. Even still, in my investigations I could not find a situation where [A = A*B] worked but [B = A*B] didn't. Either both work, or neither do. Here's an example:

C <- 1.0 * ( A * B ) + 0.0 * C

A <- 1.0 * ( A * B ) + 0.0 * A

B <- 1.0 * ( A * B ) + 0.0 * B

Amxk
0.840188 0.394383 0.783099 0.79844
0.911647 0.197551 0.335223 0.76823
0.277775 0.55397 0.477397 0.628871
0.364784 0.513401 0.95223 0.916195

Bkxn
0.635712 0.717297 0.141603 0.606969
0.0163006 0.242887 0.137232 0.804177
0.156679 0.400944 0.12979 0.108809
0.998925 0.218257 0.512932 0.839112

Cmxn
1.46082 1.1867 0.684279 1.58231
1.40269 1.00398 0.59376 1.39331
0.888607 0.662464 0.499886 1.19373
1.30467 0.968114 0.715646 1.50668

Amxk
0.840188 0.394383 0.783099 0.79844
0.911647 0.197551 0.335223 0.76823
0.277775 0.55397 0.477397 0.628871
0.364784 0.513401 0.95223 0.916195

Bkxn
0.635712 0.717297 0.141603 0.606969
0.0163006 0.242887 0.137232 0.804177
0.156679 0.400944 0.12979 0.108809
0.998925 0.218257 0.512932 0.839112

A (result)
1.46082 1.1867 0.684279 1.58231
1.40269 1.00398 0.59376 1.39331
0.888607 0.662464 0.499886 1.19373
1.30467 0.968114 0.715646 1.50668

Amxk
0.840188 0.394383 0.783099 0.79844
0.911647 0.197551 0.335223 0.76823
0.277775 0.55397 0.477397 0.628871
0.364784 0.513401 0.95223 0.916195

Bkxn
0.635712 0.717297 0.141603 0.606969
0.0163006 0.242887 0.137232 0.804177
0.156679 0.400944 0.12979 0.108809
0.998925 0.218257 0.512932 0.839112

B (result)
1.46082 1.1867 0.684279 1.58231
1.40269 1.00398 0.59376 1.39331
0.888607 0.662464 0.499886 1.19373
1.30467 0.968114 0.715646 1.50668


It made no difference whether the result was stored in A or B. It worked either way with correct results. While this may have succeeded for this small example, Intel MKL does not claim to support this behavior. Once the matrices get larger in size (like 256x256 on my machine) the result becomes incorrect.

View solution in original post

0 Kudos
1 Reply
Ethan_F_Intel
Moderator
79 Views

To start off, I want to mention that this is not how the function is intended to be used so any incorrect results are to be expected. Using cblas_?gemm this way is not supported. The output matrix should not be the same as the input matrix. This is because they all can technically have different dimensions. A is an m by k matrix, B is a k by n matrix, and C is an m by n matrix. The only situation where using the input matrix to store the output might technically pass is when they all are the same size. Even still, in my investigations I could not find a situation where [A = A*B] worked but [B = A*B] didn't. Either both work, or neither do. Here's an example:

C <- 1.0 * ( A * B ) + 0.0 * C

A <- 1.0 * ( A * B ) + 0.0 * A

B <- 1.0 * ( A * B ) + 0.0 * B

Amxk
0.840188 0.394383 0.783099 0.79844
0.911647 0.197551 0.335223 0.76823
0.277775 0.55397 0.477397 0.628871
0.364784 0.513401 0.95223 0.916195

Bkxn
0.635712 0.717297 0.141603 0.606969
0.0163006 0.242887 0.137232 0.804177
0.156679 0.400944 0.12979 0.108809
0.998925 0.218257 0.512932 0.839112

Cmxn
1.46082 1.1867 0.684279 1.58231
1.40269 1.00398 0.59376 1.39331
0.888607 0.662464 0.499886 1.19373
1.30467 0.968114 0.715646 1.50668

Amxk
0.840188 0.394383 0.783099 0.79844
0.911647 0.197551 0.335223 0.76823
0.277775 0.55397 0.477397 0.628871
0.364784 0.513401 0.95223 0.916195

Bkxn
0.635712 0.717297 0.141603 0.606969
0.0163006 0.242887 0.137232 0.804177
0.156679 0.400944 0.12979 0.108809
0.998925 0.218257 0.512932 0.839112

A (result)
1.46082 1.1867 0.684279 1.58231
1.40269 1.00398 0.59376 1.39331
0.888607 0.662464 0.499886 1.19373
1.30467 0.968114 0.715646 1.50668

Amxk
0.840188 0.394383 0.783099 0.79844
0.911647 0.197551 0.335223 0.76823
0.277775 0.55397 0.477397 0.628871
0.364784 0.513401 0.95223 0.916195

Bkxn
0.635712 0.717297 0.141603 0.606969
0.0163006 0.242887 0.137232 0.804177
0.156679 0.400944 0.12979 0.108809
0.998925 0.218257 0.512932 0.839112

B (result)
1.46082 1.1867 0.684279 1.58231
1.40269 1.00398 0.59376 1.39331
0.888607 0.662464 0.499886 1.19373
1.30467 0.968114 0.715646 1.50668


It made no difference whether the result was stored in A or B. It worked either way with correct results. While this may have succeeded for this small example, Intel MKL does not claim to support this behavior. Once the matrices get larger in size (like 256x256 on my machine) the result becomes incorrect.

0 Kudos
Reply