Intel® oneAPI Base Toolkit
Support for the core tools and libraries within the base toolkit that are used to build and deploy high-performance data-centric applications.

SYCL buffers

eug
Beginner
1,549 Views

Hello,

I have a problem with allocation.

I need to allocate a matrix on the GPU whose total size is greater than the maximum size that can be allocated in a single buffer.

I solved this by allocating the matrix with the malloc_shared() function, but I was wondering whether it is possible to do the same with SYCL buffers.

6 Replies
RahulV_intel
Moderator
1,549 Views

Hi,

Could you specify the GPU device, OS, and oneAPI toolkit version that you are currently working with?

--Rahul

eug
Beginner
1,549 Views

Hi,

I'm working on Intel DevCloud.

RahulV_intel
Moderator
1,549 Views

Hi,

I assume that your GPU device is an iGPU (integrated GPU).

Note that an iGPU shares most of its memory with the host.

Under the OpenCL and SYCL standards, a single memory allocation on a device cannot exceed the device's maximum allocation size. Because buffers and accessors are part of the SYCL standard, a single buffer is bound by that limit as well.
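To make the per-allocation limit concrete, here is a minimal sketch in plain C++ (no SYCL calls) of how a matrix could be planned as row blocks that each fit under such a limit, so that each block could in principle be placed in its own buffer. The names RowChunk and plan_row_chunks are hypothetical and not part of any SYCL API. The limit is passed in as a plain number; in a real program it would presumably be queried from the device, which SYCL exposes as info::device::max_mem_alloc_size. The sketch also assumes that one full row fits within the limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: one row-wise slice of the matrix.
struct RowChunk {
    std::size_t first_row;  // index of the first row in this chunk
    std::size_t num_rows;   // number of rows in this chunk
};

// Hypothetical helper: splits a rows x cols matrix of elem_size-byte
// elements into row blocks whose byte size does not exceed max_alloc_bytes.
// Assumption: a single row fits within max_alloc_bytes.
std::vector<RowChunk> plan_row_chunks(std::size_t rows, std::size_t cols,
                                      std::size_t elem_size,
                                      std::size_t max_alloc_bytes) {
    const std::size_t row_bytes = cols * elem_size;
    assert(row_bytes > 0 && row_bytes <= max_alloc_bytes);

    // Largest whole number of rows that fits in one allocation.
    const std::size_t rows_per_chunk = max_alloc_bytes / row_bytes;

    std::vector<RowChunk> chunks;
    for (std::size_t r = 0; r < rows; r += rows_per_chunk) {
        // The last chunk may hold fewer rows than the others.
        chunks.push_back({r, std::min(rows_per_chunk, rows - r)});
    }
    return chunks;
}
```

For example, a 10 x 4 matrix of 4-byte elements with a 48-byte limit is planned as four chunks of 3, 3, 3, and 1 rows. Whether splitting the data this way suits a given kernel depends on its access pattern, which this sketch does not address.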

Intel added its own extension on top of SYCL, known as unified shared memory (USM), which has since been adopted into the SYCL 2020 standard. With a USM shared allocation (malloc_shared()), the memory is allocated on the host and shared between the host and the device. This makes it possible to allocate more than the device's maximum allocation size, with the limit being the total available host memory.

Here's a definition from the DPC++ book for shared allocations:

Shared allocations are allocations that are accessible on both the host and the device. In this regard they are very similar to host allocations, but they differ in that data can now migrate between host memory and device-local memory. This means that accesses on a device, after the migration has occurred, happen from much faster device local memory instead of remotely accessing host memory. Typically, this is accomplished through mechanisms inside the DPC++ runtime and lower-level drivers that are mostly hidden from the programmer.
 

Hope this helps.

 

--Rahul

RahulV_intel
Moderator
1,542 Views

Hi,

Could you let me know whether the explanation provided helps?

--Rahul

RahulV_intel
Moderator
1,520 Views

Hi,


Could you let me know if I can close the thread since you have accepted the solution?


--Rahul


RahulV_intel
Moderator
1,497 Views

Hi,


I have not heard back from you for a while, so I will close this thread. Post a new question if you still have issues.


--Rahul

