- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am writing an application that makes use of multiple GPUs.
I currently have 4 units of Intel Arc a750 in a rig.
But the problem is reproducible on on my dev desktop having only 1 GPU.
I haven't found a convenient way to select the GPUs. This is how I do.
I call platform::get_platforms()
Select "Intel(R) oneAPI Unified Runtime over Level-Zero" platform
Then call this_platform.get_devices(info::device_type::gpu)
And use the devices when instatiating the queues
queue q(compute_device, handle_async_error);
constexpr unsigned int BLOCK_SIZE = 16 * 1024 * 1024 * 4 * 2;
char* gpu_data1 = malloc_shared(BLOCK_SIZE, q);
char* gpu_data2 = malloc_shared(BLOCK_SIZE, q);
char* gpu_data3 = malloc_shared(BLOCK_SIZE, q);
char* gpu_data4 = malloc_shared(BLOCK_SIZE, q);
event memEvent2 = q.prefetch(gpu_data1, BLOCK_SIZE); // only gpu_data1
q.submit([&](handler& h) {
h.depends_on(memEvent2);
h.parallel_for(range{ len }, [=](id<1> i) {
// GPU_DATA1 Available
}
}
GPU_DATA2, GPU_DATA3, GPU_DATA4, are still accessible from the host if using "gpu_selector_v" but if I chose a device via the platform.get_devices()
all of the shared mem (gpu_data 1-4) are unavailable from the host. It seems as the prefetch, prefetched too much. Memory pages are protected (in use by the GPU). I have assured that the device is of the same platform "Intel(R) oneAPI Unified Runtime over Level-Zero".
I don't understand why there is a difference between gpu_selector_v and get_devices if it chooses the same device. Either I dont understand prefetch or there might be a bug.
I worked around it using malloc_host and malloc_device, and manually copying the data.
event memEvent2 = q.memcpy(GpuData, HostData, BLOCK_SIZE);
h.depends_on(memEvent2);
Works as a charm.
I am using DPC++ 2025 on Windows
Link Copied
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page