Mixing OpenCL Kernels with VHDL Netlists

Altera_Forum · ‎08-05-2014

Hi,

Is it possible for the output of an OpenCL kernel to reside in the same FPGA as a VHDL netlist? For example, could I implement an algorithm in OpenCL and a display driver in VHDL? The intention here is to display the output from a parallel algorithm, locally, using a VGA driver implemented on the FPGA fabric not used by the OpenCL kernel.

Altera_Forum · ‎08-06-2014

I don't think it is possible: the aoc compiler translates your OpenCL kernel to a VHDL/Verilog project and generates all the files that are needed to build the final bitstream. I doubt editing the generated VHDL/Verilog by hand is doable!

Altera_Forum · ‎08-07-2014

You could add it to the board support package (the hardware your kernel gets added to automatically). You would have to modify the board files so that the kernel compiler knows your extra hardware is present, but if your VGA block exposes an Avalon-ST sink port then a kernel can write data directly into it using channels. I would become very familiar with board support packages and channels before attempting this, though.

So the idea is not that you edit the system that the OpenCL compiler generates, but rather that you edit the system the compiler plugs the kernel into. If you send me a private message with your email address I can send you a 13.0 or 13.1 (I forget which version) Nallatech BSP I hacked up to add a counter that streams an incrementing count value; my kernel just reads it and writes it out to memory, which I then verify on the host side. You would just be moving data in the opposite direction.
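For readers following along, here is a rough plain C model of the data flow described above (counter IP streams values, the kernel drains them to memory, the host checks them). It is only a sketch: `counter_source_t`, `counter_read`, `drain_to_memory` and `is_incrementing` are made-up names for illustration, and a real kernel would read from a channel declared through the SDK's channels extension rather than call a C function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the custom counter IP in the BSP (hypothetical model):
 * each read yields the next count value. */
typedef struct {
    uint32_t next;
} counter_source_t;

static uint32_t counter_read(counter_source_t *src) {
    return src->next++;
}

/* Stand-in for the kernel: drain n words from the stream into memory. */
static void drain_to_memory(counter_source_t *src, uint32_t *mem, size_t n) {
    assert(src != NULL && (mem != NULL || n == 0));
    for (size_t i = 0; i < n; ++i) {
        mem[i] = counter_read(src);
    }
}

/* Host-side check: every word must be exactly one more than the previous
 * one (unsigned arithmetic, so a wrap from 0xFFFFFFFF to 0 also passes). */
static int is_incrementing(const uint32_t *mem, size_t n) {
    for (size_t i = 1; i < n; ++i) {
        if (mem[i] != (uint32_t)(mem[i - 1] + 1u)) {
            return 0;
        }
    }
    return 1;
}
```

The host check deliberately does not assume a starting value, since a free-running counter will be at an arbitrary count when the kernel starts reading.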

Altera_Forum · ‎08-07-2014

Hi BadOmen,

I have tried to send you a private message with my email address, but I have not posted the required minimum of 10 messages to be able to do so, I think. If you have not received the private message please let me know.

I guess what you are saying here is that a variant of the Board Support Package for a particular board needs to be created, with custom modifications, in order to use non-OpenCL code on the same FPGA fabric. A device driver should then be written, in say C when using Linux, to address the custom logic that is not related to the OpenCL kernel.

Altera_Forum · ‎08-07-2014

Looks like you just hit the minimum of 10 posts.

You are correct in your assessment. Also, my previous post assumed you wanted the OpenCL kernel to have access to the custom RTL that you add to the board support package. If they are completely independent then you can ignore what I said about connecting them via channels.

If your FPGA is connected over PCIe, for example, then the driver would need to be accessible not only to the OpenCL runtime but also to whatever else is running on the host. What you can sometimes do instead is use the OpenCL framework to move the data back and forth to the FPGA and just have a simple kernel that moves data between the streaming and memory-mapped domains. For example, I could see updating a video frame being as simple as having a dedicated frame buffer sitting in the FPGA, and having a kernel whose sole purpose is populating that frame buffer and telling your VGA controller where to find it. That way you don't have to modify the driver, because you are relying on the OpenCL framework to handle all communication across the link.
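To make the frame buffer idea a bit more concrete, here is a minimal plain C model of it, not actual kernel or BSP code. Everything in it is an assumption for illustration: the `vga_regs_t` register block, the tiny 4x3 resolution, the RGB565 pixel format, and the function names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FB_WIDTH  4u   /* tiny illustrative resolution, not a real VGA mode */
#define FB_HEIGHT 3u

/* Hypothetical register block of the VGA controller. */
typedef struct {
    const uint16_t *frame_base;  /* where the controller scans pixels from */
} vga_regs_t;

/* Stand-in for the kernel body: write one 16-bit pixel per location. */
static void fill_frame(uint16_t *fb, uint16_t colour) {
    assert(fb != NULL);
    for (size_t i = 0; i < (size_t)FB_WIDTH * FB_HEIGHT; ++i) {
        fb[i] = colour;
    }
}

/* Tell the controller where the frame lives. Called only after the frame
 * has been fully written, so the controller never scans a partial frame. */
static void present_frame(vga_regs_t *vga, const uint16_t *fb) {
    assert(vga != NULL);
    vga->frame_base = fb;
}
```

Keeping "fill" and "present" as separate steps is the point of the sketch: the controller only ever learns about a buffer once it is complete.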

Altera_Forum · ‎08-08-2014

Yes, I see the point of using channels and the simple kernel as you have described it, as it would limit the movement of data between the FPGA and the host. I will explore both avenues. Also, I have sent you a private message.

Cheers

Altera_Forum · ‎07-27-2015

Hello BadOmen,

I can't send a private message, but could you please send the customized Nallatech BSP for my reference? My email: elfarabi.razali2008@gmail.com

thanks in advance.

regards,

Farabi

Altera_Forum · ‎07-27-2015

My email bounced back so I'll just describe what I did. First I made a copy of the BSP for the board and pointed my environment to use it instead of the one that came with the board. I then opened the Qsys design and added my custom IP to it. Then I edited the board .xml file to add my custom IP to it by describing its interface (Avalon-ST source). In my kernel I referenced the channel using the same name I had given it in the .xml file.

This stuff should be documented fairly well in the OpenCL documentation so I would take a look at it. To test it I recommend generating to the intermediate file (.aoco) then in the generated files open the Qsys system to see if your kernel got stitched up to the custom hardware correctly.

Altera_Forum · ‎03-17-2016

Hi BadOmen, have you made any progress on this subject recently? I have a custom module that I'd like to try to integrate with an OpenCL kernel in order to let it manage the datapaths, rather than taking care of that myself. I'm using an Arrow SoCKit board with Quartus 15.0.

I'd especially like to know exactly which .xml file needs to be modified and where I can find documentation about BSPs. In the future, we're planning to design a custom board with a Cyclone/Arria SoC, and then we'll have to create our own BSP.

Thanks in advance.

Altera_Forum · ‎03-17-2016

Caught me right before my long vacation :) Actually I haven't worked on OpenCL in quite a while so I'm not sure what the official flow is anymore. I suspect 15.0 is fine but you might find that 15.1 has improved support. Going by memory there is an .xml file that goes with the BSP that describes the various MM and ST connections within the BSP and so it would be a matter of adding your streaming IP into the Qsys system and including it in the BSP. I suspect that this stuff should be documented by now so I would start on this page: https://www.altera.com/products/design-software/embedded-software-developers/opencl/overview.html

If you are targeting an SoC then I recommend using the SoC "platform" as the starting point. Luckily SoC BSPs are much easier to build since you typically do not have PCIe and DMAs and instead just provide the HPS SDRAM slave port to the BSP so that kernels can connect up to it. Chances are you can take the SoC platform and just pin it out for your board and it should be fairly seamless. Then I would add the streaming IP to the BSP once you are satisfied that the port was successful.

Altera_Forum · ‎03-17-2016

Wow, lucky me then! Enjoy your vacation ;)

Thanks, I'll dig into the references in the link you provided and see what comes of it. The SoCKit BSP I'm using is the one on RocketBoards; it targets version 14.0 but works fine with 15. I'll also try to find out whether there is a newer version, as it is stated that the 1GB memory on the FPGA side was not available in the BSP but would be added in future releases.

Altera_Forum · ‎09-22-2016

Hi,

I have a related question. Is it possible to have a channel communicate with OpenCL directly? What I am looking at is to have a channel defined in the BSP which I can write to from a C/C++ host application and have the OpenCL kernel read from the channel as an I/O channel. It seems related to what popoolab wanted to do, but the difference is that I want the host program to talk via a channel instead of a Verilog component.

Any pointers would be appreciated.

Thanks!

Altera_Forum · ‎09-22-2016

If you maintained the communication between the channel and the host outside of the OpenCL infrastructure, that should be possible. As long as the channel is exposed in the BSP as an Avalon-ST interface the kernel should be able to communicate with it, but that means it's up to you to take care of the other side of the channel yourself. This could get very messy, though, if that side-band information needs to stay synchronized with kernel invocations, so before going too far down that road I would try to determine if you really need to do this.

Altera_Forum · ‎09-29-2016

Thanks BadOmen. The problem that I am trying to solve is to have some high-bandwidth/low-latency communication between the CPU (C++) and the FPGA (OpenCL) on an SoC. What would you recommend as the best approach? Currently, I copy the data in a circular buffer, as well as the endpoints (start/end), to SDRAM, and have the CPU poll the endpoints to check for data produced by the FPGA. This, as expected, is quite slow. I have two approaches in mind to speed it up:

1) Relying on physical addresses mapped in the FPGA-2-HPS and HPS-2-FPGA bridges. I am not sure if there is a region of the physical address space that is mapped to a buffer that I can use for data exchange. As far as I have read, there are specific addresses for IO devices, but nothing for data exchange.

2) Relying on the coherency mechanism (ACP) to communicate the endpoints, and using the SDRAM for communicating the actual data.

As you might have guessed, I don't have much experience with FPGAs, so any help will be appreciated. I am looking for something that works well with relatively medium effort.
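For anyone reading later, the circular buffer with start/end endpoints described in this post can be modelled in plain C roughly as follows. This is a single-threaded sketch under assumptions of my own: the `ring_t` layout, the capacity, and the function names are hypothetical, and real CPU/FPGA sharing would additionally need the cache coherency and memory ordering issues dealt with, which this model ignores.

```c
#include <assert.h>
#include <stdint.h>

#define RING_CAPACITY 8u  /* hypothetical size; must be a power of two */

/* Layout both sides would agree on in shared SDRAM (illustrative only). */
typedef struct {
    volatile uint32_t start;  /* free-running read index, written only by the consumer (CPU) */
    volatile uint32_t end;    /* free-running write index, written only by the producer (FPGA) */
    uint32_t data[RING_CAPACITY];
} ring_t;

static void ring_init(ring_t *r) {
    assert((RING_CAPACITY & (RING_CAPACITY - 1u)) == 0u);
    r->start = 0u;
    r->end = 0u;
}

/* Producer side: returns 1 on success, 0 if the ring is full. */
static int ring_push(ring_t *r, uint32_t value) {
    uint32_t end = r->end;
    if ((uint32_t)(end - r->start) == RING_CAPACITY) {
        return 0;
    }
    r->data[end & (RING_CAPACITY - 1u)] = value;
    r->end = end + 1u;  /* publish the new endpoint only after the payload is written */
    return 1;
}

/* Consumer side: one polling step. Returns 1 and stores a word in *out
 * if data was available, 0 if the ring is empty. */
static int ring_pop(ring_t *r, uint32_t *out) {
    uint32_t start = r->start;
    if (start == r->end) {
        return 0;
    }
    *out = r->data[start & (RING_CAPACITY - 1u)];
    r->start = start + 1u;
    return 1;
}
```

Each index has exactly one writer, which is what lets the two sides run without a lock. The indices are free-running and only masked when addressing `data`, so full and empty can be told apart without wasting a slot.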

Thanks,

Altera_Forum · ‎09-30-2016

If I understood correctly, what you need to do is hand off data from the accelerator to the HPS while the accelerator is still operating? Before I end up sending you in the wrong direction I'll need to know more about the data flow of this system. For example, if you wanted to send small amounts of information to the HPS while the kernel is still operating, you could put a FIFO in your design that the kernel accesses as a channel and that the HPS could pop the information out of.

Typically circular buffers are how OpenCL developers manage this sort of thing, because they allow you to overlap operations if you use the appropriate runtime calls to keep everything synchronized. By the sounds of it you are looking for a way to funnel information back to the HPS outside of the runtime-managed portion, which can be tricky if you allow the runtime to queue up operations. For example, let's say you queue up the kernel to run twice in a row on different buffers but you have a backdoor way for the kernels to communicate with the HPS. Now you need some way to ensure the data sent through the alternative path carries some sort of context, so that you can tell which invocation it came from.
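One simple way to give side-band data that kind of context is to tag every word with an invocation number. The plain C sketch below shows the idea only. The 8-bit tag / 24-bit payload split and all of the `sideband_*` names are assumptions made up for this example, not anything defined by the OpenCL SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical side-band word format: upper 8 bits carry the invocation
 * tag, lower 24 bits carry the payload. */
#define TAG_SHIFT    24u
#define PAYLOAD_MASK 0x00FFFFFFu

/* Kernel side: attach the invocation tag to a payload. */
static uint32_t sideband_pack(uint8_t invocation, uint32_t payload) {
    assert(payload <= PAYLOAD_MASK);
    return ((uint32_t)invocation << TAG_SHIFT) | payload;
}

static uint8_t sideband_invocation(uint32_t word) {
    return (uint8_t)(word >> TAG_SHIFT);
}

static uint32_t sideband_payload(uint32_t word) {
    return word & PAYLOAD_MASK;
}

/* Host side: pick out the words belonging to one invocation from an
 * interleaved stream, here by summing their payloads. */
static uint32_t sideband_sum_for(const uint32_t *words, size_t n, uint8_t invocation) {
    uint32_t sum = 0u;
    for (size_t i = 0; i < n; ++i) {
        if (sideband_invocation(words[i]) == invocation) {
            sum += sideband_payload(words[i]);
        }
    }
    return sum;
}
```

The host would pass the tag to each kernel invocation as an ordinary argument, so that words from back-to-back runs can be separated even if they arrive interleaved.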