memory leak in oneapi::mkl::blas::row_major::gemm

siquus · ‎01-08-2021

My System:

Ubuntu 20.04.1 LTS (Focal Fossa)

Intel(R) oneAPI DPC++ Compiler 2021.1.2 (2020.10.0.1214)

MKL: 2021.1.1

My program calls oneapi::mkl::blas::row_major::gemm in a loop: a function of the following form is called repeatedly with very large matrices, and the process leaks memory massively (about 16 GB after a few seconds, i.e. roughly 50 calls).

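#include <CL/sycl.hpp>
#include <oneapi/mkl.hpp>
#include <memory>
#include <vector>

// C = A * B, assumes row-major matrices with B being transposed already
// Inputs = {A, B'}, where A is M x K and B' is N x K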
int MatrixMul(
   sycl::queue * queue,
   std::shared_ptr<sycl::buffer<float, 1>> &output,
   const std::vector<std::shared_ptr<sycl::buffer<float, 1>>> &inputs,
   size_t M, size_t N, size_t K)
{
   output = std::make_shared<sycl::buffer<float, 1>>(M * N);
   oneapi::mkl::blas::row_major::gemm(
      *queue,
      oneapi::mkl::transpose::nontrans, oneapi::mkl::transpose::trans,
      M, N, K, 1.f,
      *inputs[0], K, *inputs[1], K,
      0.f, *output, N);

   return 0;
}
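
For reference, here is a plain C++ sketch of what I understand the call above to compute, which can be used to check results on small matrices. The helper name reference_gemm_nt is made up for illustration and is not part of oneMKL; it assumes A is stored as M x K with leading dimension K and B' as N x K with leading dimension K, matching the arguments passed to gemm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain-CPU reference for C = A * B with row-major storage.
// a  : A,  M x K, leading dimension K
// bt : B', N x K, leading dimension K (B already transposed)
// reference_gemm_nt is a hypothetical helper, not a oneMKL function.
std::vector<float> reference_gemm_nt(const std::vector<float>& a,
                                     const std::vector<float>& bt,
                                     std::size_t M, std::size_t N, std::size_t K)
{
    assert(a.size() == M * K);
    assert(bt.size() == N * K);

    std::vector<float> c(M * N, 0.f);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.f;
            // Row i of A dotted with row j of B' (= column j of B).
            for (std::size_t k = 0; k < K; ++k)
                acc += a[i * K + k] * bt[j * K + k];
            c[i * N + j] = acc;
        }
    }
    return c;
}
```

Comparing this against the gemm output for a small M, N, K is a cheap way to confirm that the transpose flags and leading dimensions are what was intended.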

I tried calling "mkl_free_buffers()" / "mkl_disable_fast_mm()", or setting MKL_DISABLE_FAST_MM=1, all to no avail.
I have replaced the call to gemm with my own naive gemm kernel submitted through a sycl::handler, and it works just fine without leaking any memory (albeit at half the speed -.-)

What am I missing?

Thanks,

Patrik

siquus · ‎01-08-2021

Device: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz

RahulV_intel · ‎01-12-2021

Hi,

Thanks for reporting this issue.

I am not sure whether the oneMKL gemm API is causing the memory leak. I will try to reproduce this issue in my environment and get back to you with an update.

Regards,

Rahul

RahulV_intel · ‎01-17-2021

Hi,

Could you please share your complete reproducer code (including your input matrices and function calls)?

Thanks,

Rahul

siquus · ‎01-18-2021

Hi,

As I am not allowed to share my code, I tried writing sample programs that would reproduce the problem, but couldn't. When I then ran my original program again, the problem was gone there as well, and I could no longer reproduce it. I am not aware of any library updates on my machine affecting the toolchain. I'll take a snapshot if the problem reappears.
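
In case it is useful for such a snapshot, here is a minimal sketch of how resident memory could be logged around each call on Linux. It only reads the VmRSS field of /proc/self/status; the helper name resident_kib is made up, and the approach is an assumption about what is convenient to measure rather than an official oneMKL diagnostic.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Returns the resident set size of the current process in KiB,
// or -1 if /proc/self/status cannot be read or has no VmRSS line.
// Linux-only; resident_kib is a hypothetical helper name.
long resident_kib()
{
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        // The line of interest looks like "VmRSS:     123456 kB".
        if (line.compare(0, 6, "VmRSS:") == 0) {
            std::istringstream fields(line.substr(6));
            long kib = -1;
            fields >> kib;
            return kib;
        }
    }
    return -1;
}
```

Printing the value before and after each MatrixMul call would show whether memory grows steadily per call or only once at startup.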

As for my original question: I did read that with MKL one needs to call mkl_free_buffers() every now and then in a long-running program as MKL would not free unused buffers by itself. Seeing that my program and the test programs I wrote do not have this issue, I assume this does not hold for the dpcpp-runtime oneMKL?

Thanks!

RahulV_intel · ‎01-25-2021

Hi,

The DPCPP runtime does seem to free the MKL buffers automatically. However, for the sake of confirmation, I have forwarded this question to the MKL experts. They will get back to you on this.

Thanks,

Rahul

MRajesh_intel · ‎06-15-2021

Hi Patrik,

When you use oneMKL with buffers, MKL buffers will automatically be freed when they are out of scope.
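
As a rough illustration of the scoping rule, the sketch below mirrors the allocation pattern of MatrixMul with a stand-in type. TrackedBuffer, g_live_elements and allocate_output are hypothetical names used in place of sycl::buffer<float, 1> so that the ownership logic can be checked without a device; the sketch shows only how std::shared_ptr releases the previous buffer, and makes no claim about oneMKL internals.

```cpp
#include <cstddef>
#include <memory>

// Number of elements currently held by live TrackedBuffer objects.
std::size_t g_live_elements = 0;

// Hypothetical stand-in for sycl::buffer<float, 1>; it only does bookkeeping.
struct TrackedBuffer {
    std::size_t count;
    explicit TrackedBuffer(std::size_t n) : count(n) { g_live_elements += n; }
    ~TrackedBuffer() { g_live_elements -= count; }
    TrackedBuffer(const TrackedBuffer&) = delete;
    TrackedBuffer& operator=(const TrackedBuffer&) = delete;
};

// Same pattern as MatrixMul: each call replaces the caller's output pointer.
// The previous buffer is destroyed here unless another shared_ptr still owns it.
void allocate_output(std::shared_ptr<TrackedBuffer>& output,
                     std::size_t M, std::size_t N)
{
    output = std::make_shared<TrackedBuffer>(M * N);
}
```

With this pattern, calling allocate_output repeatedly on the same pointer keeps only the most recent buffer alive. If copies of earlier output pointers are retained elsewhere (for example in a container of results), those buffers stay alive and memory grows with every call, which is one thing worth ruling out when a loop appears to leak.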

Please let us know whether this helps. If it does, may we go ahead and close the thread?

Regards

Rajesh.

MRajesh_intel · ‎06-21-2021

Hi,

Since we didn't hear back from you, we are closing this thread for now. If you require any further assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

Have a good day.

Regards

Rajesh.