Intel® oneAPI Math Kernel Library

Can I pass a subset of a matrix into another function in MKL?

Po
Beginner

I am trying to optimize a lot of matrix calculations in MKL that require me to allocate large blocks of memory using something like:

double* test_matrix = (double*)mkl_malloc(n * sizeof(double), 64);

Recently, I have been running into a lot of memory allocation errors that are hard to reproduce and even harder to debug. I am worried that MKL stores some internal header data on the heap that my current method does not account for.

Is there an "official" way of passing a subset of an MKL matrix into another function? Passing a copy would definitely increase my overhead too much. I am currently passing a pointer to the matrix subset like this:

double* a = (double*)mkl_malloc(4 * 4 * sizeof(double), 64);
double* b = (double*)mkl_malloc(4 * 4 * sizeof(double), 64);
double* c = (double*)mkl_malloc(2 * 2 * sizeof(double), 64);

... fill in values for a and b ...

cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 2, 2, 2, 1, &a[2], 4, &b[2], 4, 0, c, 2);
cout << "Result is: " << c[0] << " " << c[1] << " " << c[2] << " " << c[3] << endl;

5 Replies
Henrik_A_
Beginner

I've been using dgemm just fine when passing in subsections of matrices. Just make sure the stride (the leading dimension argument) matches the row length of the full matrix, and that the starting offset plus the submatrix height stays within the full matrix's total size.
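
A minimal sketch of that check, assuming row-major storage and a parent matrix whose row length equals the leading dimension. The names `view_fits` and `checked_view` are hypothetical helpers for illustration, not MKL functions:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of MKL): returns true when a rows x cols
// row-major view that starts at element `offset` and uses leading dimension
// `ld` stays inside an allocation of `total` elements.
bool view_fits(std::size_t offset, std::size_t rows, std::size_t cols,
               std::size_t ld, std::size_t total) {
    if (rows == 0 || cols == 0) return true;   // an empty view touches nothing
    if (ld == 0 || cols > ld) return false;    // consecutive rows would overlap
    // Assumption: the parent matrix has rows of length ld, so the view must
    // not run past the end of the row it starts in.
    if ((offset % ld) + cols > ld) return false;
    // Index of the last element the view touches.
    std::size_t last = offset + (rows - 1) * ld + (cols - 1);
    return last < total;
}

// Hypothetical wrapper: checks the view in debug builds, then returns the
// pointer you would hand to a routine such as cblas_dgemm.
double* checked_view(double* base, std::size_t offset, std::size_t rows,
                     std::size_t cols, std::size_t ld, std::size_t total) {
    assert(view_fits(offset, rows, cols, ld, total));
    return base + offset;
}
```

For the call in the original question, `view_fits(2, 2, 2, 4, 16)` describes the 2 x 2 block at `&a[2]` of a 4 x 4 matrix and returns true, while an offset of 14 with the same shape would run past the end of the 16-element buffer.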

Po
Beginner

Thanks for your quick response, Henrik. Are there any special operations I would need to perform when deallocating memory for a matrix that was passed to an internal function?

Also, are there any issues with passing an output matrix pointer to a function for it to write its result into? For example, this is my usual way of doing things (it produces some memory errors as complexity increases, and I am not sure of the precise source of the bugs):

double* a = (double*)mkl_malloc(2 * 2 * sizeof(double), 64);
double* b = (double*)mkl_malloc(2 * 2 * sizeof(double), 64); 
double* c = (double*)mkl_malloc(8 * 8 * sizeof(double), 64);

for(int i = 0; i < 4; i++) {
  a[i] = 1.0; // could be any value
  b[i] = 1.0;
}

someOperation(a, b, &c[5]);

mkl_free_buffers();
mkl_free(a);
mkl_free(b);
mkl_free(c); 

mecej4
Honored Contributor III

Po, there are no truly two-dimensional arrays in any of the code you have shown. Although you can certainly use a pointer to a sufficiently large block of memory as a matrix, it is your responsibility to make sure that the code properly maps the conceptual matrix to a one-dimensional array. Therefore, your questions have no answers yet. For example, what do you expect the "matrix" c to contain after the call to someOperation()? How was c allocated, and how do you intend to access it in subsequent code? What does someOperation() expect as arguments, how does it declare the formal arguments, and how are the arrays used inside the function?
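
To make that mapping concrete, here is a minimal sketch of how a row-major, no-transpose matrix product reads its arguments: element (i, j) of a matrix with leading dimension ld lives at index i * ld + j from the pointer passed in. The routine `ref_gemm_rowmajor` is a hypothetical plain-loop reference written for illustration only. It is not an MKL function and says nothing about how MKL implements cblas_dgemm internally.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reference routine (not MKL): C = alpha * A * B + beta * C,
// where A is m x k, B is k x n, C is m x n, all row-major, and each pointer
// may point into the middle of a larger matrix with row length lda/ldb/ldc.
void ref_gemm_rowmajor(std::size_t m, std::size_t n, std::size_t k,
                       double alpha, const double* a, std::size_t lda,
                       const double* b, std::size_t ldb,
                       double beta, double* c, std::size_t ldc) {
    // Each leading dimension must be at least the row width of its view.
    assert(lda >= k && ldb >= n && ldc >= n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t p = 0; p < k; ++p) {
                // A(i, p) and B(p, j) mapped onto the one-dimensional buffers.
                sum += a[i * lda + p] * b[p * ldb + j];
            }
            double& out = c[i * ldc + j];
            // With beta == 0 the old contents of C are never read.
            out = (beta == 0.0) ? alpha * sum : alpha * sum + beta * out;
        }
    }
}
```

Called as `ref_gemm_rowmajor(2, 2, 2, 1.0, &a[2], 4, &b[2], 4, 0.0, c, 2)`, it multiplies the top-right 2 x 2 blocks of two 4 x 4 matrices (elements 2, 3, 6 and 7 of each buffer) and writes a contiguous 2 x 2 result, which is the same argument layout as the cblas_dgemm call in the original question.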

Zhang_Z_Intel
Employee

To improve the performance of Intel MKL, its memory allocator uses per-thread memory pools in which buffers are kept for fast reuse. The mkl_free_buffers() function frees this unused memory. You should call mkl_free_buffers() after the last call to Intel MKL functions. In large applications, if you suspect that memory may run short, you may call this function earlier, but expect a possible drop in performance because buffers have to be reallocated for subsequent calls to Intel MKL functions.

If this does not solve your memory allocation problems, you can also try setting the MKL_DISABLE_FAST_MM environment variable to 1 or calling the mkl_disable_fast_mm() function. This stops MKL from using memory pools for fast buffer allocation and deallocation. Be aware that this change may negatively impact the performance of some Intel MKL functions, especially for small problem sizes.

SergeyKostrov
Valued Contributor II
>>...Recently, I have been finding a lot of memory allocation errors that are popping up - which are hard to replicate and even harder to debug...

Please provide a complete reproducer for these memory errors. I don't see any memory-related errors in my own code that uses the cblas_dgemm function.