Intel® High Level Design
Support for Intel® High Level Synthesis Compiler, DSP Builder, OneAPI for Intel® FPGAs, Intel® FPGA SDK for OpenCL™

Support for GPU and FPGA programming

Gao__Chao
Beginner

Hi, does oneAPI support GPU and FPGA programming in the same code? The tutorial I found didn't give a very specific answer.

JananiC_Intel
Moderator

Hi,


Thanks for posting in Intel forums.


Intel oneAPI has samples that support CPU, GPU, and FPGA. You can find those samples through oneapi-cli on DevCloud, and links to similar samples are below. Try these samples to learn more.

https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming/DPC%2B%2B/DenseLinearAlge...

https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming/DPC%2B%2B/DenseLinearAlge...


Hope this helps!


Gao__Chao
Beginner

Hi, 

Sorry to bother you again, but I don't think this is what I am looking for. I want to write a DPC++ program in which the GPU and the FPGA work in parallel. Can this be achieved in one single source file?

Thanks

JananiC_Intel
Moderator

Hi,


Yes, this is possible by launching two kernels through two queues created with different device selectors: one queue with gpu_selector and the other with intel::fpga_selector. Because kernel submission is asynchronous, both kernels can run in parallel.
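The asynchronous-submission point can be illustrated without any GPU or FPGA toolchain. The sketch below is a host-only analogy, not SYCL code: `std::async` stands in for `queue::submit`, and `std::future::get` stands in for waiting on the queue. The function names `vector_add` and `run_two_async` are hypothetical. Both tasks are launched before either is waited on, which is what allows them to overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical helper: element-wise c = a + b (the "kernel" body).
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());  // inputs must have equal length
    std::vector<float> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) c[i] = a[i] + b[i];
    return c;
}

// Host-only analogy of two queues: launch both additions first, without
// waiting in between, so they may run concurrently; then block on both.
std::pair<std::vector<float>, std::vector<float>>
run_two_async(const std::vector<float>& a1, const std::vector<float>& b1,
              const std::vector<float>& a2, const std::vector<float>& b2) {
    auto f1 = std::async(std::launch::async, vector_add, std::cref(a1), std::cref(b1));
    auto f2 = std::async(std::launch::async, vector_add, std::cref(a2), std::cref(b2));
    // Both tasks are already running; waiting happens only here.
    return {f1.get(), f2.get()};
}
```

In a real DPC++ program the two launches would be `submit` calls on a GPU queue and an FPGA queue, and the final wait would be on those queues (or on buffer destruction).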


Thanks.


Gao__Chao
Beginner

Hi,

Could you give me some examples?

I didn't find any example that implements GPU and FPGA kernels in the same DPC++ file. Also, I guess the compile commands in the makefile are different for each device; how do we compile such a file?

Thanks

Chao

AbhishekD_Intel
Moderator

Hi Chao,

 

I am assuming that you want to run your code in parallel on the integrated GPU (iGPU) and the FPGA emulator.

Below is a code sample that runs a simple vector addition on both the iGPU and the FPGA emulator.

 

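#include <cstdlib>
#include <iostream>
#include <CL/sycl.hpp>
// The FPGA selector header and namespace depend on the oneAPI release; newer
// releases use <sycl/ext/intel/fpga_extensions.hpp> and sycl::ext::intel.
#include <CL/sycl/intel/fpga_extensions.hpp>
#define N 10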

int main(int, char**) {

        // Host arrays: d1_* are processed on the GPU, d2_* on the FPGA emulator.
        float *d1_a=(float *)malloc(N*sizeof(float));
        float *d1_b=(float *)malloc(N*sizeof(float));
        float *d1_c=(float *)malloc(N*sizeof(float));

        float *d2_a=(float *)malloc(N*sizeof(float));
        float *d2_b=(float *)malloc(N*sizeof(float));
        float *d2_c=(float *)malloc(N*sizeof(float));

        for(long int i=0;i<N;i++){
                d1_a[i]=i;
                d1_b[i]=N-i;
                d2_a[i]=i;
                d2_b[i]=N-i;
        }


        // Reports errors raised asynchronously by kernels on either queue.
        auto exception_handler = [] (cl::sycl::exception_list exceptions) {
                for (std::exception_ptr const& e : exceptions) {
                        try {
                                std::rethrow_exception(e);
                        } catch (cl::sycl::exception const& ex) {
                                std::cout << "Caught asynchronous SYCL exception:\n" << ex.what() << std::endl;
                        }
                }
        };

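        // One queue per device: queue_d1 targets the GPU, queue_d2 the FPGA emulator.
        cl::sycl::queue queue_d1(cl::sycl::gpu_selector{}, exception_handler);
        cl::sycl::queue queue_d2(cl::sycl::intel::fpga_emulator_selector{}, exception_handler);
        // Print the device each queue selected.
        std::cout << "Queue 1 running on "
                << queue_d1.get_device().get_info<cl::sycl::info::device::name>()
                << "\n";
        std::cout << "Queue 2 running on "
                << queue_d2.get_device().get_info<cl::sycl::info::device::name>()
                << "\n";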

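        // Buffers are scoped to this block: at the closing brace their destructors
        // wait for the kernels and copy the results back into d1_c and d2_c.
        {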
                cl::sycl::buffer<float, 1> d1_a_sycl{d1_a, cl::sycl::range<1>{N} };
                cl::sycl::buffer<float, 1> d1_b_sycl{d1_b, cl::sycl::range<1>{N} };
                cl::sycl::buffer<float, 1> d1_c_sycl{d1_c, cl::sycl::range<1>{N} };

                cl::sycl::buffer<float, 1> d2_a_sycl{d2_a, cl::sycl::range<1>{N} };
                cl::sycl::buffer<float, 1> d2_b_sycl{d2_b, cl::sycl::range<1>{N} };
                cl::sycl::buffer<float, 1> d2_c_sycl{d2_c, cl::sycl::range<1>{N} };

                // Enqueue the GPU kernel; submit() returns without waiting for it to finish.
                queue_d1.submit([&] (cl::sycl::handler& cgh) {
                                auto a_acc = d1_a_sycl.get_access<cl::sycl::access::mode::read>(cgh);
                                auto b_acc = d1_b_sycl.get_access<cl::sycl::access::mode::read>(cgh);
                                auto c_acc = d1_c_sycl.get_access<cl::sycl::access::mode::discard_write>(cgh);

                                cgh.parallel_for<class vector_addition_d1>(cl::sycl::range<1>{ N }, [=](cl::sycl::id<1> idx) {
                                                c_acc[idx] = a_acc[idx] + b_acc[idx];

                                });
                });

                // Enqueue the FPGA-emulator kernel; it can run while the GPU kernel is still executing.
                queue_d2.submit([&] (cl::sycl::handler& cgh) {
                                auto a_acc = d2_a_sycl.get_access<cl::sycl::access::mode::read>(cgh);
                                auto b_acc = d2_b_sycl.get_access<cl::sycl::access::mode::read>(cgh);
                                auto c_acc = d2_c_sycl.get_access<cl::sycl::access::mode::discard_write>(cgh);

                                cgh.parallel_for<class vector_addition_d2>(cl::sycl::range<1>{ N }, [=](cl::sycl::id<1> idx) {
                                                c_acc[idx] = a_acc[idx] + b_acc[idx];

                                });
                });

        }

        try {
                queue_d1.wait_and_throw();
                queue_d2.wait_and_throw();
        }catch (cl::sycl::exception const& e) {
                std::cout << "Caught synchronous SYCL exception:\n"<< e.what() << std::endl;
        }

        for(int i=0;i<N;i++)
                std::cout<<d1_c[i]<<" ";

        std::cout<<std::endl;

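        for(int i=0;i<N;i++)
                std::cout<<d2_c[i]<<" ";

        std::cout<<std::endl;

        // Release the host arrays allocated with malloc.
        free(d1_a); free(d1_b); free(d1_c);
        free(d2_a); free(d2_b); free(d2_c);

        return 0;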
}

 

You can change the device selectors to suit your use case; for example, intel::fpga_selector targets FPGA hardware instead of the emulator.

 

 

Warm Regards,

Abhishek

 

Gao__Chao
Beginner

Hi,

I already tried this. What I am actually looking for is a solution that uses a real GPU and a real FPGA, not just the emulator.

The problem I am facing now is that I can't compile the file: the GPU and the FPGA need two different compile commands in the makefile.

I didn't find any Intel example that shows this workflow. I suppose one file can make the CPU, FPGA, and GPU work in parallel?

For example, I want to partition one matrix into three parts and compute three partial multiplications on the CPU, GPU, and FPGA within one node. Is this possible?
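The partitioning step itself is independent of oneAPI and can be sketched in plain C++. The sketch below assumes a row-major `Matrix` type and hypothetical helpers `partial_matmul` and `partitioned_matmul`: the rows of A are split into three contiguous blocks, and each block of C = A × B is computed in its own `std::async` task. In a DPC++ program each task would presumably become a kernel submitted to a CPU, GPU, or FPGA queue; that mapping, and how to compile such a program, is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Hypothetical row-major dense matrix: rows x cols stored in one vector.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Compute rows [row_begin, row_end) of C = A * B. Each call writes a
// disjoint slice of C, so several calls can run concurrently.
void partial_matmul(const Matrix& A, const Matrix& B, Matrix& C,
                    std::size_t row_begin, std::size_t row_end) {
    for (std::size_t i = row_begin; i < row_end; ++i)
        for (std::size_t j = 0; j < B.cols; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < A.cols; ++k) sum += A.at(i, k) * B.at(k, j);
            C.at(i, j) = sum;
        }
}

// Split A's rows into `parts` contiguous blocks (sizes differ by at most one)
// and compute each block in its own host task; each task is a stand-in for
// a kernel on one device.
Matrix partitioned_matmul(const Matrix& A, const Matrix& B, std::size_t parts = 3) {
    assert(A.cols == B.rows);  // inner dimensions must agree
    assert(parts > 0);
    Matrix C(A.rows, B.cols);
    std::vector<std::future<void>> tasks;
    std::size_t begin = 0;
    for (std::size_t p = 0; p < parts; ++p) {
        // The first (A.rows % parts) blocks get one extra row.
        std::size_t len = A.rows / parts + (p < A.rows % parts ? 1 : 0);
        tasks.push_back(std::async(std::launch::async, partial_matmul,
                                   std::cref(A), std::cref(B), std::ref(C),
                                   begin, begin + len));
        begin += len;
    }
    for (auto& t : tasks) t.get();  // wait for all partial results
    return C;
}
```

Because each block writes a disjoint set of rows of C, the tasks need no synchronization beyond the final wait. An even split is the simplest choice; with devices of different speeds, uneven block sizes would likely balance the work better.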

Thanks

Chao

AbhishekD_Intel
Moderator

Hi Chao,


We are moving this thread to the FPGA forum (https://community.intel.com/t5/Intel-High-Level-Design/bd-p/high-level-design), where FPGA experts can help with your query.



Warm Regards,

Abhishek


AnilErinch_A_Intel

Hi,

The method for targeting the GPU and the FPGA at the same time is described here:


https://software.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/soft...


Thanks and Regards

Anil

