- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Shown below is the code snippet of my program implemented using openmp parallel_for and using SYCL libraries(offload platform: CPU)
Case 1: Parallelization using openmp (standard cpp codes)
#pragma omp parallel for
for (int i = 0; i < loopCount; i++)
{
for (int m = 0; m < Len; m++)
{
OutputParallel[i] += testData[i + m] * Coeff[m];
}
}
case 2: Using DPC++ libraries; targets CPU
// buffer declaration
sycl::buffer testBuffx64(testData);
sycl::bufferBuffx64(Coeff);
sycl::buffer convBuffx64(Outputx64Oneapi);
t4 = std::chrono::high_resolution_clock::now();
try
{
cl::sycl::queue Q(sycl::cpu_selector{}); // device for computation offload
Q.submit([&](sycl::handler& h)
{
// accessor declaration
sycl::accessor testAccess(testBuffx64, h, sycl::read_only);
sycl::accessor Access(Buffx64, h, sycl::read_only);
sycl::accessor cAccess(convBuffx64, h, sycl::read_write);
h.parallel_for(sycl::range{ loopCount }, [=](sycl::id<1> idx) {
for (int m = 0; m < filterLen; m++)
{
cAccess[idx] += testAccess[idx + m] * Access[m];
}
});
});
Q.wait_and_throw();
}
catch (sycl::exception const& e)
{
std::cout << "Caught a SYCL host exception for x64:\n" << e.what() << "\n";
}
The throughput of case 2 is higher than case 1 although both are implemented on CPU.
Is there a reason why the DPC++ implementation is faster?
Thank you
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Thanks for posting in Intel Communities.
Could you please provide us with the Environment details and Intel OneAPI version you are using?
>>the DPC++ implementation is faster?
Could you please let us know how are you calculating the time? Are you using any specific tool to calculate the time?
Could you please provide us the results(difference in time) you are getting while running the two cases(openmp code and dpcpp code)?
And also, could you please provide us with the complete sample reproducer code and the compiler you are using to run the openmp code(case 1)?
Thanks & Regards,
Varsha
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Intel oneAPI version 2022
IDE: Microsoft Visual Studio 2019.
I have attached the complete project solution.
Thank you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Please let me know if you need any other information
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
We are working on your issue internally. We will get back to you soon.
Thanks & Regards,
Varsha
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
What's the command line option you used when building the C++/DPC++ code? could you show the "icx or dpcpp --version"?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
i do not see DPC++ is faster.
C:\osc05423107>dpcpp -g /EHsc -fiopenmp -fsycl-targets=spir64_x86_64 Filter.cpp
C:\osc05423107>Filter.exe
implementation is correct
Throughput for serial implementation:0.0172098
Throughput for parallel implementation:0.0333564
Throughput for oneAPI x64 implementation:1.9899
Throughput for oneAPI FPGA implementation:inf
success
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
'Throughput for oneAPI x64 implementation:' is the DPC++ implementation which refers to the device code offloaded to x64 platform.
Even in your case you can see that DPC++ implementation is faster than C++ implementation with openmp parallelization(Throughput for parallel implementation).

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page