Hello,
I have C++ code that I am trying to port to DPC++. There are many function calls inside the main function.
My first approach was to create the buffers inside each of those functions, but then the buffers were destroyed as soon as each function returned. The problem is that one particular function may be called about 4000 times, and I would prefer to keep its buffers alive for the whole run. So instead I created the buffers inside the main function and passed them as arguments to each function. The problem with this approach is that I need to synchronize the host and the device after each function call (I don't get the correct answer otherwise). host_accessor() sometimes helps, but most of the time it doesn't synchronize. I also know I can't use q.wait() in this design. Could you please point me to the best synchronization method in DPC++?
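Roughly, my current design looks like this (a simplified sketch, not my actual code; all names are illustrative):

#include <CL/sycl.hpp>
#include <vector>
using namespace sycl;

// One of many functions called from main; the buffer is created outside.
void step(queue &q, buffer<float, 1> &buf) {
    q.submit([&](handler &h) {
        accessor a{buf, h, read_write};
        h.parallel_for(buf.get_range(), [=](id<1> i) { a[i] += 1.0f; });
    });
}

int main() {
    queue q;
    std::vector<float> data(1024, 0.0f);
    buffer<float, 1> buf{data.data(), range<1>{data.size()}};
    for (int call = 0; call < 4000; ++call) {
        step(q, buf);                       // same buffer reused across all calls
        host_accessor host{buf, read_only}; // my attempt to sync after each call
        // ... host-side logic that reads host[...] ...
    }
    return 0;
}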
Thank you in advance.
Leila
Hi,
Thanks for reaching out to us.
Since you have mentioned that you have C++ code you are trying to port to DPC++, we suggest you try the USM (Unified Shared Memory) model instead of the buffer/accessor model.
USM provides a familiar pointer-based C++ interface, so much of your existing code can continue to work without modification.
Keep in mind that kernel launches are asynchronous, so you need to call the q.wait() method to synchronize with the host.
USM provides three different allocation types for your input data (see the sketch after this list):
- Device: data lives in device-attached memory and is not directly accessible from the host; explicit copy operations are required to move data between host and device.
- Host: data lives in host memory and can be accessed directly on the host; device accesses go over the bus.
- Shared: data is accessible from both the host and the device and may migrate between them.
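For illustration, here is a minimal sketch showing how each allocation type is created and how an explicit copy works (sizes and names are just examples):

#include <CL/sycl.hpp>
using namespace sycl;

int main() {
    queue q;
    constexpr size_t N = 1000;

    // Device allocation: only dereferenceable on the device; move data explicitly.
    int *dev = malloc_device<int>(N, q);

    // Host allocation: lives in host memory; the device reads it over the bus.
    int *hst = malloc_host<int>(N, q);

    // Shared allocation: accessible from both; may migrate automatically.
    int *shr = malloc_shared<int>(N, q);

    for (size_t i = 0; i < N; ++i) hst[i] = static_cast<int>(i);

    // Explicit copy host -> device, then wait for the copy to finish.
    q.memcpy(dev, hst, N * sizeof(int)).wait();

    free(dev, q);
    free(hst, q);
    free(shr, q);
    return 0;
}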
For more details, please refer to the Data Parallel C++ book by James Reinders (page 150).
Please find below a code snippet showing the shared allocation type:
#include <CL/sycl.hpp>
#include <iostream>
using namespace sycl;

int main()
{
    constexpr size_t N = 1000;
    queue q;

    // Shared allocations are accessible from both host and device.
    auto A = malloc_shared<int>(N, q);
    auto B = malloc_shared<int>(N, q);
    auto C = malloc_shared<int>(N, q);

    // Initialize the inputs directly on the host.
    for (size_t i = 0; i < N; i++) {
        A[i] = static_cast<int>(i);
        B[i] = static_cast<int>(2 * i);
    }

    q.submit([&](handler &h) {
        h.parallel_for(range<1>{N}, [=](id<1> ID) {
            auto i = ID[0];
            C[i] = A[i] + B[i];
        });
    });

    // Kernel execution is asynchronous: wait before reading C on the host.
    q.wait();
    std::cout << C[1] << std::endl;

    free(A, q);
    free(B, q);
    free(C, q);
    return 0;
}
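The snippet above can be compiled with the oneAPI DPC++ compiler, e.g. dpcpp test.cpp (the file name here is just an example).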
If you face any issues while trying the USM implementation in your code,
please provide us with a sample reproducer so that we can try it from our end.
Thanks & Regards,
Noorjahan.
Hello Noorjahan,
Thank you for your quick response.
I never considered USM for my code because I assumed it would not give good performance.
Since I have already spent a lot of time on the buffer model, I would like to be sure about the USM model before getting started.
What if I allocate memory on the host and copy it to the device when needed? In your experience, is that a better approach than using shared memory from the beginning?
Sorry, I can't provide a code snippet for this question since I am only thinking about the design at this point.
Thanks,
Leila
Hi again,
After looking at the chapter you pointed me to, I understand your point now. Please ignore my previous comment.
Thank you for your assistance!
Leila
Hi,
>>What if I allocate memory on the host and copy it to the device when need be? Based on your experience, is this a better approach or using shared memory from the beginning?
Either allocation type can be used; it depends on the use case. The host allocation method works, but it costs more time when there is a lot of data movement between host and device. In such cases it is better to use shared allocations, or to allocate on the device and copy explicitly, as sketched below.
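Since you mentioned a function that may be called 4000 times, here is a rough sketch of the explicit-copy approach for that scenario (an in-order queue keeps successive kernels ordered; all names and sizes are illustrative):

#include <CL/sycl.hpp>
#include <vector>
using namespace sycl;

// The hot function: operates entirely on device memory, no copies per call.
void compute(queue &q, int *dev, size_t n) {
    q.parallel_for(range<1>{n}, [=](id<1> i) { dev[i] += 1; });
}

int main() {
    // An in-order queue serializes submissions, so successive calls
    // to compute() on the same data do not race with each other.
    queue q{property::queue::in_order{}};
    constexpr size_t N = 1000;
    std::vector<int> host(N, 0);

    int *dev = malloc_device<int>(N, q);
    q.memcpy(dev, host.data(), N * sizeof(int)).wait(); // one-time upload

    for (int call = 0; call < 4000; ++call)
        compute(q, dev, N); // data stays resident on the device

    // Copy back only when the host actually needs the results.
    q.memcpy(host.data(), dev, N * sizeof(int)).wait();

    free(dev, q);
    return 0;
}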
>>After looking at the chapter you pointed me to, I understand your point now.
Glad to know that you have figured it out.
Thanks for accepting this as a solution.
As this issue has been resolved, we will no longer respond to this thread.
If you require any additional assistance from Intel, please start a new thread.
Any further interaction in this thread will be considered community only.
Thanks & Regards,
Noorjahan.
