Intel® oneAPI DPC++/C++ Compiler
Talk to fellow users of Intel® oneAPI DPC++/C++ Compiler and companion tools like Intel® oneAPI DPC++ Library, Intel® DPC++ Compatibility Tool, and Intel® Distribution for GDB*
713 Discussions

Possible Bug with shared data and queue.prefetch and Platform.getDevices()

mhogstrom
Beginner
182 Views

I am writing an application that makes use of multiple GPUs.
I currently have 4 units of Intel Arc a750 in a rig.
But the problem is reproducible on on my dev desktop having only 1 GPU.

I haven't found a convenient way to select the GPUs. This is how I do.
I call platform::get_platforms()
Select "Intel(R) oneAPI Unified Runtime over Level-Zero" platform
Then call this_platform.get_devices(info::device_type::gpu)
And use the devices when instatiating the queues

queue q(compute_device, handle_async_error); 

constexpr unsigned int BLOCK_SIZE = 16 * 1024 * 1024 * 4 * 2;
char* gpu_data1 = malloc_shared(BLOCK_SIZE, q);
char* gpu_data2 = malloc_shared(BLOCK_SIZE, q);
char* gpu_data3 = malloc_shared(BLOCK_SIZE, q);
char* gpu_data4 = malloc_shared(BLOCK_SIZE, q);

event memEvent2 = q.prefetch(gpu_data1, BLOCK_SIZE); // only gpu_data1

q.submit([&](handler& h) {
h.depends_on(memEvent2);
h.parallel_for(range{ len }, [=](id<1> i) {

// GPU_DATA1 Available

}
}

GPU_DATA2, GPU_DATA3, GPU_DATA4,  are still accessible from the host if using "gpu_selector_v" but if I chose a device via the platform.get_devices()
all of the shared mem (gpu_data 1-4) are unavailable from the host. It seems as the prefetch, prefetched too much. Memory pages are protected (in use by the GPU). I have assured that the device is of the same platform "Intel(R) oneAPI Unified Runtime over Level-Zero". 

I don't understand why there is a difference between gpu_selector_v  and get_devices if it chooses the same device. Either I dont understand prefetch or there might be a bug.

I worked around it using malloc_host and malloc_device, and manually copying the data.
event memEvent2 = q.memcpy(GpuData, HostData, BLOCK_SIZE);
h.depends_on(memEvent2);

Works as a charm.

I am using DPC++ 2025 on Windows

0 Kudos
0 Replies
Reply