Intel® oneAPI Base Toolkit
Support for the core tools and libraries within the base toolkit that are used to build and deploy high-performance data-centric applications.

SYCL buffers

eug
Beginner

Hello,

I have a problem with memory allocation.

I need to allocate a matrix on the GPU whose total size is greater than the maximum size that can be allocated in a single buffer.

I solved this by allocating the matrix with the malloc_shared() function, but I was wondering whether it is possible to do the same with SYCL buffers.

 

 

RahulV_intel
Moderator

Hi,

Could you specify the GPU device, OS, and oneAPI toolkit version that you are currently working with?

 

--Rahul

eug
Beginner

Hi,

I'm working on Intel DevCloud.

RahulV_intel
Moderator

Hi,

I assume that your GPU device is an iGPU (integrated GPU).

Note that an iGPU shares most of its memory with the host.

As per the OpenCL/SYCL standards, a single memory allocation on a device cannot exceed the device's maximum allocatable size. Since buffers and accessors are part of the SYCL standard, the memory allocated for a single buffer cannot exceed the device's maximum allocation size.
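Because the limit applies to a single data structure, one possible approach with buffers is to split the matrix into several row blocks, each small enough for its own buffer. Below is a minimal sketch of that planning step in plain C++. The names RowBlock and plan_row_blocks are hypothetical, and max_alloc_bytes is assumed to hold the limit reported by the device (in SYCL this is queried with device.get_info<sycl::info::device::max_mem_alloc_size>()). The sketch does not check whether all blocks together fit in device memory.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper type: one contiguous block of matrix rows.
struct RowBlock {
    std::size_t first_row;  // index of the first row in this block
    std::size_t num_rows;   // number of rows in this block
};

// Split a rows x cols matrix (elem_bytes per element) into row blocks
// whose size in bytes never exceeds max_alloc_bytes.
// Assumption: max_alloc_bytes is the per-allocation limit of the device.
std::vector<RowBlock> plan_row_blocks(std::size_t rows, std::size_t cols,
                                      std::size_t elem_bytes,
                                      std::size_t max_alloc_bytes) {
    const std::size_t row_bytes = cols * elem_bytes;
    if (row_bytes == 0 || row_bytes > max_alloc_bytes) {
        throw std::invalid_argument(
            "a single row does not fit in one allocation");
    }

    // Largest whole number of rows that fits under the limit.
    const std::size_t rows_per_block = max_alloc_bytes / row_bytes;

    std::vector<RowBlock> blocks;
    for (std::size_t r = 0; r < rows; r += rows_per_block) {
        const std::size_t remaining = rows - r;
        const std::size_t n =
            remaining < rows_per_block ? remaining : rows_per_block;
        blocks.push_back({r, n});
    }
    return blocks;
}
```

For example, plan_row_blocks(10, 4, 4, 64) yields three blocks of 4, 4, and 2 rows, since each 16-byte row allows four rows per 64-byte allocation. Each block could then back its own SYCL buffer, at the cost of kernels having to work block by block.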

Intel has added its own extensions on top of SYCL, one of which is "Unified Shared Memory" (USM). With USM (malloc_shared()), the actual memory allocation takes place on the host and is shared between the host and the device. Hence it is possible to allocate more memory than the device's maximum allocatable size, with the limit being the total available host memory.

Here's a definition from the DPC++ book for shared allocations:

Shared allocations are allocations that are accessible on both the host and the device. In this regard they are very similar to host allocations, but they differ in that data can now migrate between host memory and device-local memory. This means that accesses on a device, after the migration has occurred, happen from much faster device local memory instead of remotely accessing host memory. Typically, this is accomplished through mechanisms inside the DPC++ runtime and lower-level drivers that are mostly hidden from the programmer.
 

Hope this helps.

 

--Rahul

RahulV_intel
Moderator

Hi,

 

Could you let me know whether the explanation provided helps?

 

--Rahul

 

RahulV_intel
Moderator

Hi,


Could you let me know if I can close the thread since you have accepted the solution?


--Rahul


RahulV_intel
Moderator

Hi,


I have not heard back from you for a while, so I will close this thread. Post a new question if you still have issues.


--Rahul

