- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I have a problem with allocation.
I need to allocate on GPU a matrix whose total size is greater then the max allocable size in a single buffer.
I solved by allocating the matrix by means of malloc_shared() function but I was wondering if it is possible
to do the same with sycl buffers.
- Tags:
- General Support
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Could you specify your GPU device, OS, oneAPI toolkit version that you are currently working on?
--Rahul
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I'm working on Intel Devcloud.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I assume that your GPU device is an iGPU(Integrated GPU).
A point to note here is that iGPU shares most of its memory with the host device.
As per OpenCL/SYCL standard, memory allocation on a device cannot exceed its maximum allocatable memory. Since buffer/accessor memory allocation is a part of SYCL standard, memory allocation cannot exceed device's maximum allocatable memory for a single data structure.
Intel has added its own extensions on top of SYCL, known as "Unified shared memory(USM)". Using USM(malloc_shared()), the actual memory allocation takes place on the host and is shared between the host and the device. Hence it is possible to allocate memory that exceeds device's maximum allocatable memory(with limit being total available host memory).
Here's a definition from the DPC++ book for shared allocations:
Shared allocations are allocations that are accessible on both the host and the device. In this regard they are very similar to host allocations, but they differ in that data can now migrate between host memory and device-local memory. This means that accesses on a device, after the migration has occurred, happen from much faster device local memory instead of remotely accessing host memory. Typically, this is accomplished through mechanisms inside the DPC++ runtime and lower-level drivers that are mostly hidden from the programmer.
Hope this helps.
--Rahul
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Could you Let me know if the explanation provided helps?
--Rahul
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Could you let me know if I can close the thread since you have accepted the solution?
--Rahul
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I have not heard back from you for a while, so I will close this thread. Post a new question if you still have issues.
--Rahul

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page