Hi, I was wondering: does oneAPI support GPU and FPGA programming in the same code? The tutorial didn't give a very specific answer.
Hi,
Thanks for posting in Intel forums.
Intel oneAPI has samples that support CPU, GPU, and FPGA. You can find those samples through oneapi-cli on DevCloud. You can find the links to similar samples below. Try these samples to learn more.
Hope this helps!
Hi,
Sorry to bother you again. I don't think this is what I am looking for. I was wondering whether I can write a DPC++ file that makes a GPU and an FPGA work in parallel. Can this be achieved in one single file?
Thanks
Hi,
Yes, this is possible by launching two kernels with different device selectors: one kernel on a queue created with gpu_selector and the other on a queue created with intel::fpga_selector. Because kernel submission is asynchronous, both kernels can run in parallel.
Thanks.
Hi,
Could you give me some examples?
I didn't find any example that implements GPU and FPGA kernels in the same DPC++ file. Also, I guess the makefile for compiling it is different. How do we compile that?
Thanks
Chao
Hi Chao,
I am assuming that you want to run your code in parallel on the iGPU and the fpga_emulator.
Please find below a code sample that runs a simple vector add on both the iGPU and the fpga_emulator.
#include <cstdlib>   // malloc, free
#include <iostream>
#include <CL/sycl.hpp>
#include <CL/sycl/intel/fpga_extensions.hpp>

#define N 10

int main(int, char**) {
    // Host arrays: d1_* are used by the GPU queue, d2_* by the FPGA emulator queue.
    float *d1_a = (float *)malloc(N * sizeof(float));
    float *d1_b = (float *)malloc(N * sizeof(float));
    float *d1_c = (float *)malloc(N * sizeof(float));
    float *d2_a = (float *)malloc(N * sizeof(float));
    float *d2_b = (float *)malloc(N * sizeof(float));
    float *d2_c = (float *)malloc(N * sizeof(float));

    // Every output element should end up equal to N.
    for (long int i = 0; i < N; i++) {
        d1_a[i] = i;
        d1_b[i] = N - i;
        d2_a[i] = i;
        d2_b[i] = N - i;
    }

    // Handler for asynchronous errors reported by either queue.
    auto exception_handler = [] (cl::sycl::exception_list exceptions) {
        for (std::exception_ptr const& e : exceptions) {
            try {
                std::rethrow_exception(e);
            } catch (cl::sycl::exception const& e) {
                std::cout << "Caught asynchronous SYCL exception:\n" << e.what() << std::endl;
            }
        }
    };

    // One queue per device.
    cl::sycl::queue queue_d1(cl::sycl::gpu_selector{}, exception_handler);
    cl::sycl::queue queue_d2(cl::sycl::intel::fpga_emulator_selector{}, exception_handler);

    /*std::cout << "Running on "
              << queue_d2.get_device().get_info<cl::sycl::info::device::name>()
              << "\n";
    */

    {
        // The buffers live only inside this scope. When they are destroyed at the
        // closing brace, the results are copied back to the host arrays.
        cl::sycl::buffer<float, 1> d1_a_sycl{d1_a, cl::sycl::range<1>{N} };
        cl::sycl::buffer<float, 1> d1_b_sycl{d1_b, cl::sycl::range<1>{N} };
        cl::sycl::buffer<float, 1> d1_c_sycl{d1_c, cl::sycl::range<1>{N} };
        cl::sycl::buffer<float, 1> d2_a_sycl{d2_a, cl::sycl::range<1>{N} };
        cl::sycl::buffer<float, 1> d2_b_sycl{d2_b, cl::sycl::range<1>{N} };
        cl::sycl::buffer<float, 1> d2_c_sycl{d2_c, cl::sycl::range<1>{N} };

        // Kernel 1: vector add on the GPU. submit() returns without waiting.
        queue_d1.submit([&] (cl::sycl::handler& cgh) {
            auto a_acc = d1_a_sycl.get_access<cl::sycl::access::mode::read>(cgh);
            auto b_acc = d1_b_sycl.get_access<cl::sycl::access::mode::read>(cgh);
            auto c_acc = d1_c_sycl.get_access<cl::sycl::access::mode::discard_write>(cgh);
            cgh.parallel_for<class vector_addition_d1>(cl::sycl::range<1>{ N }, [=](cl::sycl::id<1> idx) {
                c_acc[idx] = a_acc[idx] + b_acc[idx];
            });
        });

        // Kernel 2: the same vector add on the FPGA emulator.
        queue_d2.submit([&] (cl::sycl::handler& cgh) {
            auto a_acc = d2_a_sycl.get_access<cl::sycl::access::mode::read>(cgh);
            auto b_acc = d2_b_sycl.get_access<cl::sycl::access::mode::read>(cgh);
            auto c_acc = d2_c_sycl.get_access<cl::sycl::access::mode::discard_write>(cgh);
            cgh.parallel_for<class vector_addition_d2>(cl::sycl::range<1>{ N }, [=](cl::sycl::id<1> idx) {
                c_acc[idx] = a_acc[idx] + b_acc[idx];
            });
        });
    }

    // Wait for both queues and pass any asynchronous errors to the handler.
    try {
        queue_d1.wait_and_throw();
        queue_d2.wait_and_throw();
    } catch (cl::sycl::exception const& e) {
        std::cout << "Caught synchronous SYCL exception:\n" << e.what() << std::endl;
    }

    // Print the GPU result, then the FPGA emulator result.
    for (int i = 0; i < N; i++)
        std::cout << d1_c[i] << " ";
    std::cout << std::endl;
    for (long int i = 0; i < N; i++)
        std::cout << d2_c[i] << " ";
    std::cout << std::endl;

    // Release the host arrays.
    free(d1_a); free(d1_b); free(d1_c);
    free(d2_a); free(d2_b); free(d2_c);
    return 0;
}
You can change the device selector according to your use-case.
Warm Regards,
Abhishek
Hi,
I already did this before. I am actually looking for a solution that uses a real GPU and FPGA instead of just the emulator.
The problem I am facing now is that I can't compile the file: the GPU and the FPGA need two different compile instructions in the makefile.
I didn't find any such workflow in the Intel examples. I suppose one file can be used to make the CPU, FPGA, and GPU work in parallel?
For example, I want to partition one matrix into three parts and do the three partial multiplications on the CPU, GPU, and FPGA within one node. Is this possible?
Thanks
Chao
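The partitioning asked about above can be sketched in plain host C++, independent of any device. This is only an illustration under stated assumptions, not a method from the thread: the names RowBlock, split_rows, multiply_block, and partitioned_matmul are hypothetical, the matrices are row-major, and each multiply_block call stands in for what would be one kernel submission to a CPU, GPU, or FPGA queue. The calls run one after another here; the point is that each block of rows is independent of the others, so the three pieces could be handed to three different queues.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open range of rows [begin, end) assigned to one worker (hypothetical type).
struct RowBlock {
    std::size_t begin;
    std::size_t end;
};

// Split `rows` rows into `parts` contiguous blocks whose sizes differ by at most one.
std::vector<RowBlock> split_rows(std::size_t rows, std::size_t parts) {
    assert(parts > 0);
    std::vector<RowBlock> blocks;
    const std::size_t base = rows / parts;
    const std::size_t extra = rows % parts;  // the first `extra` blocks get one more row
    std::size_t start = 0;
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t len = base + (p < extra ? 1 : 0);
        blocks.push_back({start, start + len});
        start += len;
    }
    return blocks;
}

// Compute rows [blk.begin, blk.end) of C = A * B.
// A is n x k, B is k x m, C is n x m, all stored row-major.
void multiply_block(const std::vector<float>& A, const std::vector<float>& B,
                    std::vector<float>& C, std::size_t k, std::size_t m,
                    RowBlock blk) {
    for (std::size_t i = blk.begin; i < blk.end; ++i) {
        for (std::size_t j = 0; j < m; ++j) {
            float sum = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                sum += A[i * k + p] * B[p * m + j];
            }
            C[i * m + j] = sum;
        }
    }
}

// Multiply in three row blocks. Each call below is a stand-in for one
// device-specific kernel launch; the blocks write to disjoint rows of C.
std::vector<float> partitioned_matmul(const std::vector<float>& A,
                                      const std::vector<float>& B,
                                      std::size_t n, std::size_t k, std::size_t m) {
    assert(A.size() == n * k);
    assert(B.size() == k * m);
    std::vector<float> C(n * m, 0.0f);
    for (RowBlock blk : split_rows(n, 3)) {
        multiply_block(A, B, C, k, m, blk);
    }
    return C;
}
```

For a 10-row matrix, split_rows(10, 3) yields blocks of 4, 3, and 3 rows. In a device version, each block would need its own buffers or sub-ranges of A and C, while B is read in full by all three workers.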
Hi Chao,
We are moving this thread to the FPGA forum (https://community.intel.com/t5/Intel-High-Level-Design/bd-p/high-level-design), where the FPGA experts will help you with your query.
Warm Regards,
Abhishek
Hi ,
Please find below the method discussed for targeting a GPU and an FPGA at the same time.
Thanks and Regards
Anil
