Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Memory leak, mkl::dft on Intel GPU, only on Windows?

NickKMPQ
Beginner

Hi, I've been running into a memory leak when doing DFTs with the oneAPI SYCL version of MKL's DFT. I've pasted a simplified program below that reproduces it. If I run it with a GPU/default selector, e.g. on the Xe graphics of my laptop under Windows, it is occupying 4 GB by the time it finishes after a few seconds. Running on cpu_selector_v, this doesn't happen, and if I run the same code under Linux, there is no apparent leak on the GPU either.

The memory use builds up over repeated calls to oneapi::mkl::dft::compute_forward().

Does anyone else see this behavior? Is there something I'm not doing right here?

This is with oneAPI 2023.2.

 

#include <sycl/sycl.hpp>
#include <oneapi/mkl/dfti.hpp>
#include <complex>
#include <cstdlib> // for system()

int main() {
    const int Ntime = 1024;
    const int Nfreq = Ntime / 2 + 1;
    const int NrepetitionsFFTs = 1048576 / 2;

    // create queue (default selector in my case will be Xe graphics GPU)
    sycl::queue q{ sycl::default_selector_v, sycl::property::queue::in_order() };

    // allocate device memory and zero the input
    float* deviceA = reinterpret_cast<float*>(
        sycl::malloc_device(
            Ntime * sizeof(float),
            q.get_device(),
            q.get_context()));
    std::complex<float>* deviceB = reinterpret_cast<std::complex<float>*>(
        sycl::malloc_device(
            Nfreq * sizeof(std::complex<float>),
            q.get_device(),
            q.get_context()));
    q.memset(deviceA, 0, Ntime * sizeof(float)).wait();

    // create and commit a single-precision, real-domain, out-of-place DFT descriptor
    auto fftDescriptorForward =
        oneapi::mkl::dft::descriptor<oneapi::mkl::dft::precision::SINGLE,
                                     oneapi::mkl::dft::domain::REAL>(Ntime);
    fftDescriptorForward.set_value(
        oneapi::mkl::dft::config_param::PLACEMENT,
        DFTI_CONFIG_VALUE::DFTI_NOT_INPLACE);
    fftDescriptorForward.commit(q);

    // repeat the FFT NrepetitionsFFTs times
    for (int j = 0; j < NrepetitionsFFTs; j++) {
        oneapi::mkl::dft::compute_forward(fftDescriptorForward, deviceA, deviceB);
    }
    q.wait();

    sycl::free(deviceA, q);
    sycl::free(deviceB, q);

    // keep the console open so memory use can be inspected before exit
#ifdef __linux__
    system("read");
#else
    system("pause");
#endif
    return 0;
}
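One way to narrow this down (a sketch on my part, not a confirmed diagnosis; the thread does not establish the root cause) is to synchronize periodically inside the loop, so the number of in-flight submissions stays bounded:

```cpp
// Hypothetical diagnostic variant of the benchmark loop above: waiting on
// the queue every 1024 iterations bounds the number of outstanding
// commands. If memory use stays flat with this change, the build-up is
// likely per-submission bookkeeping held by the runtime/driver rather
// than a leak inside the DFT computation itself.
for (int j = 0; j < NrepetitionsFFTs; j++) {
    oneapi::mkl::dft::compute_forward(fftDescriptorForward, deviceA, deviceB);
    if ((j % 1024) == 1023)
        q.wait();
}
```

If the growth persists even with the periodic waits, that would point more strongly at an allocation that is never released, rather than at queued-up work.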

 

0 Kudos
8 Replies
VarshaS_Intel
Moderator

Hi,

 

Thanks for posting in Intel Communities.

 

Thanks for providing the details. When we tried the sample reproducer code you provided, we saw that the memory utilization was normal, but the runtime was longer on Windows than on Linux.

Please find below the details of the machine where we are running the code:

CPU: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz 3.0 [2023.15.3.0.20_160000]

GPU: Intel(R) OpenCL HD Graphics, Intel(R) UHD Graphics 620 3.0 [31.0.101.2111]

 

>>it will be occupying 4 GB once it finishes after a few seconds. 

Could you please elaborate on your issue and provide us with all the observations from your side on GPU to investigate more from our end? 

Besides the memory leak, could you please let us know if you observed any crashes while running the sample code on GPU?

 

Thanks & Regards,

Varsha

 

NickKMPQ
Beginner

Thanks for the response. I have tried the same compiled binary on two laptops now, and interestingly it only happens on the newer one:

 

Exhibits memory leak:

12th Gen Intel(R) Core(TM) i7-1270P

Intel(R) Iris(R) Xe Graphics

 

Does not exhibit leak:

Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz
Intel(R) Iris(R) Graphics 550

 

I attached screenshots of Task Manager while the test runs on the two machines. You can see the continuous build-up of memory use only on the Xe Graphics model, and if I boot that one into Linux and compile/run the code there, there is no leak. It seems to be specific to Windows and this GPU.

 

I tried rolling back the graphics driver, but this didn't make a difference. I'm currently using the newest driver, 31.0.101.4577.

 

If I increase the number of repetitions such that it fills the memory, the program crashes with the message "Abort was called at 268 line in file:"
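To put numbers on the growth beyond watching Task Manager, one option is to log the process memory counters around the loop. This is a Windows-only sketch using the standard PSAPI call GetProcessMemoryInfo; the sampling points and the helper name are illustrative, not part of the original reproducer, and if the driver holds the shared GPU memory outside this process, it may not appear in these counters at all:

```cpp
#include <windows.h>
#include <psapi.h>
#include <cstdio>

// Print this process's working set and pagefile (commit) usage in MB.
// Calling this before the FFT loop, every few thousand iterations, and
// after the final q.wait() would show whether the growth tracks the
// number of submissions.
void printProcessMemory(const char* label) {
    PROCESS_MEMORY_COUNTERS pmc{};
    if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc))) {
        std::printf("%s: working set %zu MB, pagefile %zu MB\n",
                    label,
                    static_cast<size_t>(pmc.WorkingSetSize) / (1024 * 1024),
                    static_cast<size_t>(pmc.PagefileUsage) / (1024 * 1024));
    }
}
```

Depending on the SDK configuration this may need linking against Psapi.lib.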

VarshaS_Intel
Moderator

Hi,

 

We are working on your issue. We will get back to you soon.

Also, could you please let us know how much shared GPU memory is in use while the utilization is 0% on the Intel(R) Iris(R) Xe Graphics?

 

Thanks & Regards,

Varsha

 

NickKMPQ
Beginner

Hello,

thanks for looking into it. On this system, the baseline with everything closed is 0.3/7.8 GB. Is that what you are looking for?

Best,

Nick

VarshaS_Intel
Moderator

Hi,


Thanks for your reply.


>>is that what you are looking for?

Yes, this is what I am looking for.


We are working on your issue, we will get back to you soon.


Thanks & Regards,

Varsha


VarshaS_Intel
Moderator

Hi,


Apologies for the delay in my response, and thanks for your patience.


When we tried on the Intel Iris Xe Graphics, we did not observe any high shared-memory utilization. At our end, we are using driver version 31.0.101.4826.


Could you please upgrade your drivers, rerun the code, and let us know if you are still seeing the high shared-memory utilization?


Thanks & Regards,

Varsha


VarshaS_Intel
Moderator

Hi,


We have not heard back from you. Could you please provide us with an update on the issue?


Thanks & Regards,

Varsha


VarshaS_Intel
Moderator

Hi,


We have not heard back from you. Could you please try updating your driver and let us know if the issue persists?


Thanks & Regards,

Varsha

