Overwritting input matrix in gemm

ddavobsc — Fri, 04 Apr 2025 14:43:00 GMT

Hello, I am trying to use gemm to make some complex computations, and sometimes, if the result matrix is one of the input matrices, it gives an incorrect result.

I haven't seen any mention of this in the documentation:

// Linked to Intel MKL #include <cblas.h> // Lets say we have 3 square matrices of size `size` // This works cblas_tgemm<T>(layout, CblasNoTrans, CblasNoTrans, size, size, size, 1.0, A, size, B, size, 0.0, C, size); // This also seems to work cblas_tgemm<T>(layout, CblasNoTrans, CblasNoTrans, size, size, size, 1.0, A, size, B, size, 0.0, A, size); // But this gives incorrect results cblas_tgemm<T>(layout, CblasNoTrans, CblasNoTrans, size, size, size, 1.0, A, size, B, size, 0.0, B, size);

Re: Overwritting input matrix in gemm

Ethan_F_Intel — Mon, 14 Apr 2025 16:03:50 GMT

To start off, I want to mention that this is not how the function is intended to be used so any incorrect results are to be expected. Using cblas_?gemm this way is not supported. The output matrix should not be the same as the input matrix. This is because they all can technically have different dimensions. A is an m by k matrix, B is a k by n matrix, and C is an m by n matrix. The only situation where using the input matrix to store the output might technically pass is when they all are the same size. Even still, in my investigations I could not find a situation where [A = A*B] worked but [B = A*B] didn't. Either both work, or neither do. Here's an example:

C <- 1.0 * ( A * B ) + 0.0 * C

A <- 1.0 * ( A * B ) + 0.0 * A

B <- 1.0 * ( A * B ) + 0.0 * B

Amxk
0.840188 0.394383 0.783099 0.79844
0.911647 0.197551 0.335223 0.76823
0.277775 0.55397 0.477397 0.628871
0.364784 0.513401 0.95223 0.916195

Bkxn
0.635712 0.717297 0.141603 0.606969
0.0163006 0.242887 0.137232 0.804177
0.156679 0.400944 0.12979 0.108809
0.998925 0.218257 0.512932 0.839112

Cmxn
1.46082 1.1867 0.684279 1.58231
1.40269 1.00398 0.59376 1.39331
0.888607 0.662464 0.499886 1.19373
1.30467 0.968114 0.715646 1.50668

Amxk
0.840188 0.394383 0.783099 0.79844
0.911647 0.197551 0.335223 0.76823
0.277775 0.55397 0.477397 0.628871
0.364784 0.513401 0.95223 0.916195

Bkxn
0.635712 0.717297 0.141603 0.606969
0.0163006 0.242887 0.137232 0.804177
0.156679 0.400944 0.12979 0.108809
0.998925 0.218257 0.512932 0.839112

A (result)
1.46082 1.1867 0.684279 1.58231
1.40269 1.00398 0.59376 1.39331
0.888607 0.662464 0.499886 1.19373
1.30467 0.968114 0.715646 1.50668

Amxk
0.840188 0.394383 0.783099 0.79844
0.911647 0.197551 0.335223 0.76823
0.277775 0.55397 0.477397 0.628871
0.364784 0.513401 0.95223 0.916195

Bkxn
0.635712 0.717297 0.141603 0.606969
0.0163006 0.242887 0.137232 0.804177
0.156679 0.400944 0.12979 0.108809
0.998925 0.218257 0.512932 0.839112

B (result)
1.46082 1.1867 0.684279 1.58231
1.40269 1.00398 0.59376 1.39331
0.888607 0.662464 0.499886 1.19373
1.30467 0.968114 0.715646 1.50668

It made no difference whether the result was stored in A or B. It worked either way with correct results. While this may have succeeded for this small example, Intel MKL does not claim to support this behavior. Once the matrices get larger in size (like 256x256 on my machine) the result becomes incorrect.

topic Overwritting input matrix in gemm in Intel® oneAPI Math Kernel Library

Overwritting input matrix in gemm

Re: Overwritting input matrix in gemm