I am attempting to implement a streaming (or pseudo-streaming) interface between the host and device using the Intel Arria 10 GX board, targeting the a10gx_hostch board variant in the a10_ref BSP supplied by Intel. I am using version 17.1.1.
I can find very little info on the host channel implementation. I was able to create a simple loopback kernel that reads from the host_to_dev channel and writes back to the dev_to_host channel (the channels are described in board_spec.xml). I can compile the .aocx without error using the aoc compiler, correctly targeting the a10gx_hostch BSP.
<channels>
  <interface name="board" port="host_to_dev" type="streamsource" width="256" chan_id="host_to_dev"/>
  <interface name="board" port="dev_to_host" type="streamsink" width="256" chan_id="dev_to_host"/>
</channels>
On the host side, I can find no support, examples, or documentation on reading/writing to these host channels. If I've missed it, I apologize. They are mentioned in Intel AN831. They are also mentioned in "Intel FPGA SDK for OpenCL Intel Arria10 GX FPGA Development Kit Reference Platform Porting Guide" - but only that they exist and how they are connected in the qsys design to a DMA.
I've found these threads, which seem to use the terms host channel and host pipe interchangeably.
The Intel design example for host pipes found at the link below does not mention host channels, but does say it requires a10ref_hostch.
So a few questions:
1) Are host channels and host pipes the same thing?
2) If not, how do I read/write to host channels from the host? I hope it doesn't require low-level DMA routines.
Any guidance or redirections to documents/white papers/reference designs is appreciated.
The replies in those threads were made by me (in the old Altera forum). I generally use the terms "pipe" and "channel" interchangeably but I am not sure if Altera/Intel agrees with this. 😅
Either way, the example you posted seems to include everything required to use host channels (or pipes 😜). No low-level DMA access is required; you just need to define the host channel as an argument in the kernel, then write data to it in the host code and read it in the kernel code (or vice versa) using the specific notation used in that example. I believe there is also a fine-grained stalling mechanism in place (similar to on-chip channels) that will block the host if the channel is full, or block the kernel if it is empty. There is also an example in the Intel FPGA SDK for OpenCL Pro Edition Programming Guide, in the section "Example Use of the cl_intel_fpga_host_pipe Extension".
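As a rough illustration of that notation, here is a kernel-side sketch based on the cl_intel_fpga_host_pipe extension. This is only an assumption of how the loopback kernel above might look; the kernel and argument names are made up, and a 256-bit channel (per the board_spec.xml widths) corresponds to a ulong4 packet:

```c
// Sketch only: host-accessible pipes as kernel arguments
// (cl_intel_fpga_host_pipe extension, names are hypothetical).
#pragma OPENCL EXTENSION cl_intel_fpga_host_pipe : enable

__kernel void loopback(
    __attribute__((intel_host_accessible, blocking))
    read_only pipe ulong4 host_in,     // backed by host_to_dev
    __attribute__((intel_host_accessible, blocking))
    write_only pipe ulong4 host_out)   // backed by dev_to_host
{
  ulong4 data;
  read_pipe(host_in, &data);    // blocks until the host writes a packet
  write_pipe(host_out, &data);  // blocks until the host drains the pipe
}
```

The blocking attribute is what gives you the stall behavior described above, rather than having to poll for success in a loop.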
Thanks for the response @HRZ. Too bad someone from Intel won't weigh in on the host channel vs. host pipe convention!
I assumed they must be the same, and was able to compile and use the example targeting the host_ch BSP on the A10 GX eval card. I was also able to make some modifications to the example and use the one-at-a-time clWritePipeIntelFPGA and clReadPipeIntelFPGA calls. I had not used the clGetExtensionFunctionAddress call before, nor do I have much experience with function pointers, but it seems to work.
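For anyone following along, the host side looks roughly like the sketch below, based on the cl_intel_fpga_host_pipe extension. The variable names are made up, error handling is omitted, and the extension entry points have to be fetched by name since they are not part of the core API:

```c
/* Sketch: fetch the extension entry points by name (signatures per the
 * cl_intel_fpga_host_pipe extension; this is an assumption, not a
 * verified listing). */
typedef cl_int (*clReadPipeIntelFPGA_fn)(cl_mem pipe, void *ptr);
typedef cl_int (*clWritePipeIntelFPGA_fn)(cl_mem pipe, void *ptr);

clReadPipeIntelFPGA_fn read_pipe_fn = (clReadPipeIntelFPGA_fn)
    clGetExtensionFunctionAddressForPlatform(platform, "clReadPipeIntelFPGA");
clWritePipeIntelFPGA_fn write_pipe_fn = (clWritePipeIntelFPGA_fn)
    clGetExtensionFunctionAddressForPlatform(platform, "clWritePipeIntelFPGA");

/* Host-side pipe objects are created with clCreatePipe and bound to the
 * kernel's host-accessible pipe arguments like any other argument. */
cl_int err;
cl_mem in_pipe = clCreatePipe(context, CL_MEM_HOST_WRITE_ONLY,
                              sizeof(cl_ulong4), 0, NULL, &err);
clSetKernelArg(kernel, 0, sizeof(cl_mem), &in_pipe);

/* One 256-bit packet per call. */
cl_ulong4 pkt;
err = write_pipe_fn(in_pipe, &pkt);
```

clGetExtensionFunctionAddressForPlatform is the non-deprecated form; plain clGetExtensionFunctionAddress also works, as you found.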
Out of curiosity, since so few people are using these features: have you tested the throughput you can achieve using the host pipes? I'm considering creating a multi-threaded host application that writes/reads to a device kernel doing some simple data manipulation. I assume any bottleneck would be in the host application or the host pipe logic in the BSP, not in the kernel or PCIe interface. I'm hoping to move ~1 GB/s.
Well, the unfortunate reality is that most people are still stuck in getting basic stuff to work on their boards and won't get to such advanced features any time soon; apart from that, many people (including me) use third-party boards which rarely get BSP updates and hence, in many cases they cannot use new features like this due to lack of compatible BSP. I am personally very interested in trying host channels/pipes but my board manufacturer is yet to release a BSP with this feature included.
With respect to the performance of host pipes/channels, I can point you to Intel's own paper (which is unfortunately behind a paywall):
I am in the same situation with regard to slow BSP updates on a third-party board, but the host pipe/channel feature is compelling enough that I'm now working with the Intel A10 GX board, which has a BSP claiming host channel support. That board doesn't support built-in I/O channels to the network interfaces like third-party boards do, however, which leads me down the path of creating a custom BSP for the Intel board.
A paper by Intel engineers about a feature for Intel FPGAs is behind a paywall... interesting. I can't even find their emails to contact them directly and request a copy.
I see. I think Intel themselves probably have non-public BSPs for the reference boards with support for different components. Though, I am not sure how they could be requested now that they don't accept service requests from public anymore.
Regarding the paper, they published it at a scientific conference, hence it is behind the publisher's paywall. If you search for "Peter Yiannacouras" on Google, you can find his LinkedIn profile and his personal page on the University of Toronto's website with his old email address; if that doesn't work, you can try reaching out to him on LinkedIn.
P.S. Have you successfully used the network I/O channels provided by your third-party board? From what I know, the "free" network I/O channels you get with such boards only allow communication at physical layer and you will have to perform framing and handle all the control signals yourself (in OpenCL). Furthermore, there is no flow control. There is another network I/O channel that also provides the MAC layer protocol and significantly simplifies usage but that requires Intel's ultra-expensive low-latency MAC license. But still, there is no flow control.
I have successfully used network I/O channels on third-party boards that include the 10GigE MAC as part of the BSP, though you're correct that this requires the ultra-expensive low-latency MAC license. I have not tried to use flow control. I have been able to create kernels attached to the I/O channels that process data at the IP and transport layers at line rate.
You should be able to find all the information you need about host pipes at this link.
The document is for version 18.1, but the information should still apply to 17.1.
Current Pipes/Channels Restrictions
* Channels can have multiple read call sites, but only a single write call site
* Pipes can have only a single read and a single write call site
* Loops containing pipe or channel writes cannot be unrolled
* Kernels with channels can't be vectorized (num_simd_work_items)
* Kernel compute units with channels likely cannot be replicated (num_compute_units)
  – Unless max_global_work_dim(0) is applied
* Dynamic indexing into arrays of channel IDs is not supported
  – Only static indexing is supported
  – The AOC compiler needs to know how to connect the hardware at compile time
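To illustrate the static-indexing restriction with a small, made-up on-chip channel example: every channel access must name a channel the compiler can resolve at compile time, since each one becomes a fixed piece of hardware:

```c
// Sketch only: hypothetical channel array showing static vs. dynamic indexing.
#pragma OPENCL EXTENSION cl_intel_channels : enable

channel float ch[2];

__kernel void producer(__global const float * restrict src) {
  /* OK: indices are compile-time constants, so the compiler knows
   * exactly which channel each call site connects to. */
  write_channel_intel(ch[0], src[0]);
  write_channel_intel(ch[1], src[1]);

  /* Not OK: a runtime index such as ch[get_global_id(0)] cannot be
   * mapped to fixed hardware at compile time and will not compile. */
}
```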
Pipes vs. Channels
* In most cases they are the same
  – In usage and performance
* For host pipes:
  – Partially conformant to the OpenCL™ standard
  – Need modification from OpenCL 2.0 pipes
* With autorun kernels:
  – Use model more aligned with the FPGA implementation
* Pipe usage is more verbose, especially on the host side
Thanks DPrin! I've seen the section regarding host pipes, and the fairly new details about the kernel-to-kernel pipes/channels convention. My initial question was whether host pipes and host channels are the same thing. There is no documentation on "host channels" even though the Intel BSP calls them host channels, which led to my confusion.
I understand your confusion. The underlying implementation is the same. "Channels" was an Intel/Altera extension, and the name didn't get changed in the BSP. Pipes are now part of the OpenCL spec, so host communication always uses pipes, not channels. I'll see if I can get the BSP names changed to pipes.