- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello, I am trying to use gemm to make some complex computations, and sometimes, if the result matrix is one of the input matrices, it gives an incorrect result.
I haven't seen any mention of this in the documentation:
// Linked to Intel MKL
#include <cblas.h>
// Lets say we have 3 square matrices of size `size`
// This works
cblas_tgemm<T>(layout, CblasNoTrans, CblasNoTrans, size, size, size, 1.0, A, size, B, size, 0.0, C, size);
// This also seems to work
cblas_tgemm<T>(layout, CblasNoTrans, CblasNoTrans, size, size, size, 1.0, A, size, B, size, 0.0, A, size);
// But this gives incorrect results
cblas_tgemm<T>(layout, CblasNoTrans, CblasNoTrans, size, size, size, 1.0, A, size, B, size, 0.0, B, size);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
To start off, I want to mention that this is not how the function is intended to be used so any incorrect results are to be expected. Using cblas_?gemm this way is not supported. The output matrix should not be the same as the input matrix. This is because they all can technically have different dimensions. A is an m by k matrix, B is a k by n matrix, and C is an m by n matrix. The only situation where using the input matrix to store the output might technically pass is when they all are the same size. Even still, in my investigations I could not find a situation where [A = A*B] worked but [B = A*B] didn't. Either both work, or neither do. Here's an example:
C <- 1.0 * ( A * B ) + 0.0 * C |
A <- 1.0 * ( A * B ) + 0.0 * A |
B <- 1.0 * ( A * B ) + 0.0 * B |
Amxk Bkxn Cmxn |
Amxk Bkxn A (result) |
Amxk Bkxn B (result) |
It made no difference whether the result was stored in A or B. It worked either way with correct results. While this may have succeeded for this small example, Intel MKL does not claim to support this behavior. Once the matrices get larger in size (like 256x256 on my machine) the result becomes incorrect.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
To start off, I want to mention that this is not how the function is intended to be used so any incorrect results are to be expected. Using cblas_?gemm this way is not supported. The output matrix should not be the same as the input matrix. This is because they all can technically have different dimensions. A is an m by k matrix, B is a k by n matrix, and C is an m by n matrix. The only situation where using the input matrix to store the output might technically pass is when they all are the same size. Even still, in my investigations I could not find a situation where [A = A*B] worked but [B = A*B] didn't. Either both work, or neither do. Here's an example:
C <- 1.0 * ( A * B ) + 0.0 * C |
A <- 1.0 * ( A * B ) + 0.0 * A |
B <- 1.0 * ( A * B ) + 0.0 * B |
Amxk Bkxn Cmxn |
Amxk Bkxn A (result) |
Amxk Bkxn B (result) |
It made no difference whether the result was stored in A or B. It worked either way with correct results. While this may have succeeded for this small example, Intel MKL does not claim to support this behavior. Once the matrices get larger in size (like 256x256 on my machine) the result becomes incorrect.

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page