Intel® oneAPI Data Parallel C++
Support for Intel® oneAPI DPC++ Compiler, Intel® oneAPI DPC++ Library, Intel® DPC++ Compatibility Tool, and GDB*

Using multiple malloc_device() calls causes an error



N is 512 * 32768

double *data_device = malloc_device<double>(N * 7, q); q.wait();

With malloc_device(), allocating more than N * 7 elements causes the following error:

terminate called after throwing an instance of 'cl::sycl::runtime_error'
  what():  Native API failed. Native API returns: -5 (CL_OUT_OF_RESOURCES) -5 (CL_OUT_OF_RESOURCES)
Aborted (core dumped)
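To put the two sizes from the post in bytes, here is a small compile-time sketch. It assumes an 8-byte double and uses a hypothetical helper name, bytes_for_doubles; it only does the arithmetic and does not touch the device.

```cpp
#include <cstddef>

// Element count from the original post.
constexpr std::size_t N = 512 * 32768;

constexpr std::size_t MiB = 1024 * 1024;

// malloc_device<double>(count, q) takes an element count, so the request
// in bytes is count * sizeof(double). Hypothetical helper for illustration.
constexpr std::size_t bytes_for_doubles(std::size_t count) {
  return count * sizeof(double);
}

static_assert(sizeof(double) == 8, "this sketch assumes an 8-byte double");

// N * 7 doubles (reported as working) is 896 MiB.
static_assert(bytes_for_doubles(N * 7) == 896 * MiB, "N * 7 doubles = 896 MiB");

// N * 8 doubles (reported as failing) is exactly 1 GiB.
static_assert(bytes_for_doubles(N * 8) == 1024 * MiB, "N * 8 doubles = 1 GiB");
```

So the failure appears somewhere between an 896 MiB and a 1 GiB single allocation on this machine.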


our device is Intel ATS-P 2Tail GPU with oneAPI

dpcpp --version
Intel(R) oneAPI DPC++/C++ Compiler 2022.1.0 (2022.1.0.20220316)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2022.1.0/linux/bin-llvm


The simplified source code is below.

Without the parallel_for, the error does not occur.

#include <CL/sycl.hpp>
#include <cstdlib>
#include <iostream>
using namespace sycl;
using namespace std;

static const int N = 512 * 32768;

int main() {
  queue q;
  std::cout << "Device : " << q.get_device().get_info<info::device::name>() << "\n";

  //# initialize data on host
  double *data = static_cast<double *>(malloc(N * sizeof(double)));
  for (int i = 0; i < N; i++) data[i] = i;

  //# Explicit USM allocation using malloc_device
  double *data_device = malloc_device<double>(N * 8, q); q.wait();

  //# update device memory
  q.parallel_for(range<1>(N), [=](id<1> i) { data_device[i] = 2.0; }).wait();

  //# copy mem from device to host
  // q.memcpy(data, data_device, sizeof(double) * N).wait();

  //# print output
  // for (int i = 0; i < N; i++) std::cout << data[i] << "\n";
  free(data_device, q);  // release the device allocation
  std::free(data);       // release the host buffer
  return 0;
}


4 Replies


Thanks for reaching out to us.

We are working on your issue. We will get back to you soon.

Thanks & Regards,



I cannot reproduce your issue on my side. However, according to the SYCL 2020 specification, the maximum size of a single allocation is related to global_mem_size:


info::device::max_mem_alloc_size returns the maximum size of memory object allocation in bytes. The minimum value is max(1/4th of info::device::global_mem_size, 128*1024*1024) if this SYCL device is not of device type info::device_type::custom.
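As a rough illustration of that rule, the sketch below computes the lower bound the specification places on max_mem_alloc_size and checks a request against a limit. The function names and the memory sizes used in the checks are hypothetical. On a real device the two inputs would come from get_info<info::device::global_mem_size>() and get_info<info::device::max_mem_alloc_size>(), and the reported limit may be larger than this minimum.

```cpp
#include <cstdint>

constexpr std::uint64_t MiB = 1024ull * 1024ull;
constexpr std::uint64_t GiB = 1024ull * MiB;

// Smallest value a non-custom device may report for max_mem_alloc_size,
// per the quoted rule: max(global_mem_size / 4, 128 MiB).
// Hypothetical helper name, for illustration only.
constexpr std::uint64_t min_max_mem_alloc_size(std::uint64_t global_mem_size) {
  return (global_mem_size / 4 > 128 * MiB) ? global_mem_size / 4 : 128 * MiB;
}

// True when a single allocation request does not exceed the given limit.
constexpr bool fits_single_allocation(std::uint64_t request_bytes,
                                      std::uint64_t max_mem_alloc_size) {
  return request_bytes <= max_mem_alloc_size;
}

// Hypothetical device with 16 GiB of global memory: the bound is 4 GiB.
static_assert(min_max_mem_alloc_size(16 * GiB) == 4 * GiB, "quarter of 16 GiB");

// Hypothetical small device: the 128 MiB floor applies.
static_assert(min_max_mem_alloc_size(256 * MiB) == 128 * MiB, "128 MiB floor");

// The 1 GiB request from the post (N * 8 doubles) against two hypothetical limits.
static_assert(fits_single_allocation(1 * GiB, 4 * GiB), "fits under 4 GiB");
static_assert(!fits_single_allocation(1 * GiB, 512 * MiB), "exceeds 512 MiB");
```

Comparing the requested byte count with the value the device actually reports would show whether the failing request crosses the single-allocation limit.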


May I also know your machine information and GPU driver version? 





Sorry for the late reply. It is a competition platform and there is little information about it, but I can provide the `sycl-ls` output.

As far as I know, it's an xe_hp_sdv GPU.

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz 3.0 [2022.]
[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x020a] 3.0 [22.18.023111]
[opencl:gpu:3] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x020a] 3.0 [22.18.023111]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x020a] 1.3 [1.3.23111]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Graphics [0x020a] 1.3 [1.3.23111]
[host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]

The oneAPI (compiler) version is 2022.1.0, but I do not know how to check the driver version.

This seems to be caused by a memory leak, but there are no processes on the GPU and we have no tools to confirm GPU memory usage.

The driver also does not seem stable enough.


Anyway, it's a nice product, but it's a pity that I am not able to fully utilize it. Programming with SYCL is amazing. Could you please provide some documentation or other resources on writing good programs for Xe GPUs?




Since there has been no further update, I'm closing this issue from our side. Please feel free to discuss it with other people in the forum, or open a new topic if necessary.
