Application Acceleration With FPGAs
Programmable Acceleration Cards (PACs), DCP, DLA, Software Stack, and Reference Designs

Host-to-Intel Arria streaming over PCIe

SBioo
Beginner
1,791 Views

Hi,

 

I've just learned that the new Intel BSPs support streaming data from the host to FPGAs, and vice versa, over PCIe. I'm wondering what use cases or specific applications can benefit from such a feature. Is there any specific scenario, in deep learning, reinforcement learning, big data, or stream processing, that can take advantage of this technology?

 

Thanks

0 Kudos
8 Replies
SengKok_L_Intel
Moderator
941 Views
Hi Saman, could you please provide more information? Which new BSP are you referring to? Is this related to the Programmable Acceleration Card? Regards -SK
SBioo
Beginner
941 Views

For the A10_ref, there is a BSP with the suffix "hostch". In this mode there is a direct channel from the host to the device, which bypasses the FPGA's main memory.
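Roughly, the programming model looks like the sketch below (a minimal illustration based on the host pipes / cl_intel_fpga_host_pipe extension described in the Intel FPGA SDK for OpenCL documentation; attribute and function names such as intel_host_accessible and clWritePipeIntelFPGA are taken from that documentation and may differ between SDK versions):

// --- device.cl (kernel side) ---------------------------------------------
// A host-accessible pipe streams data from the host over PCIe, bypassing the
// FPGA's external DDR memory.
#pragma OPENCL EXTENSION cl_intel_fpga_host_pipe : enable

__kernel void consumer(__attribute__((intel_host_accessible))
                       __attribute__((blocking))
                       __read_only pipe ulong4 host_in)
{
    ulong4 packet;
    read_pipe(host_in, &packet);   // packet arrives directly from the host
    // ... process packet ...
}

/* --- host.c (host side) ---------------------------------------------------
   clCreatePipe is standard OpenCL 2.x; clWritePipeIntelFPGA is part of the
   Intel-specific host-pipe API. Error checking omitted. */
#include <CL/cl.h>

static void send_one_packet(cl_context context, cl_kernel kernel)
{
    cl_int err;
    cl_mem host_in = clCreatePipe(context, CL_MEM_HOST_WRITE_ONLY,
                                  sizeof(cl_ulong4), /* packet size */
                                  128,               /* max packets */
                                  NULL, &err);
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &host_in);

    cl_ulong4 packet;
    /* ... fill packet from the application's data stream ... */
    clWritePipeIntelFPGA(host_in, &packet);  /* push it toward the kernel */
}

The point is that the host-side write feeds the kernel's read_pipe directly over PCIe, with no intermediate trip through the board's DDR.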

SengKok_L_Intel
Moderator
941 Views

The paper below may be helpful for further understanding the streaming interface:

 

http://delivery.acm.org/10.1145/3080000/3078182/a25-kang.pdf?ip=192.198.147.165&id=3078182&acc=ACTIV...

 

Regards -SK Lim

SBioo
Beginner
941 Views

Hi SK Lim,

 

Thanks very much for sharing the paper with me. Unfortunately, the link does not work. Could you please share the title of the paper with me?

 

Thanks,

Saman

SengKok_L_Intel
Moderator
941 Views

Hi Saman,

 

Here is the title: Host Pipes: Direct Streaming Interface Between OpenCL Host and Kernel

 

Regards -SK

HRZ
Valued Contributor II
941 Views

To answer the original question, this feature is very useful for "out-of-core processing", i.e. processing data that is too big to fit in the FPGA's external memory but can fit in host memory. There is a large body of work in HPC and Big Data using GPUs where overlapping/pipelining of compute and PCIe transfers is implemented using double buffering in GPU memory. For applications that can be "streamed", host channels on FPGAs can be used to implement out-of-core processing efficiently without the need for double buffering. However, for applications that cannot be streamed, this feature is not applicable and double buffering will have to be used, as is done on GPUs.
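A rough host-side sketch of that GPU-style double-buffering pattern, written with the standard OpenCL API (all names here, io_queue, compute_queue, stream_chunks and the kernel signature, are illustrative placeholders; event releases and error checking are omitted):

#include <CL/cl.h>
#include <stddef.h>

/* Stream num_chunks chunks of chunk_bytes each from host_data through kernel,
   overlapping PCIe transfers with compute by ping-ponging between two device
   buffers. io_queue and compute_queue are two independent in-order queues. */
static void stream_chunks(cl_context context,
                          cl_command_queue io_queue,
                          cl_command_queue compute_queue,
                          cl_kernel kernel,
                          const char *host_data,
                          size_t num_chunks,
                          size_t chunk_bytes,
                          size_t global_size)
{
    cl_mem   buf[2];
    cl_event xfer_done[2] = {NULL, NULL};  /* transfer into buffer finished  */
    cl_event kern_done[2] = {NULL, NULL};  /* kernel reading buffer finished */

    buf[0] = clCreateBuffer(context, CL_MEM_READ_ONLY, chunk_bytes, NULL, NULL);
    buf[1] = clCreateBuffer(context, CL_MEM_READ_ONLY, chunk_bytes, NULL, NULL);

    for (size_t i = 0; i < num_chunks; ++i) {
        int b = (int)(i & 1);  /* ping-pong between the two buffers */

        /* Copy chunk i over PCIe, waiting for the kernel that last read this
           buffer so data still being processed is never overwritten. */
        clEnqueueWriteBuffer(io_queue, buf[b], CL_FALSE, 0, chunk_bytes,
                             host_data + i * chunk_bytes,
                             kern_done[b] ? 1 : 0,
                             kern_done[b] ? &kern_done[b] : NULL,
                             &xfer_done[b]);

        /* Launch compute on chunk i once its transfer completes. Because the
           two queues run independently, the transfer of chunk i+1 (into the
           other buffer) overlaps with this kernel. */
        clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf[b]);
        clEnqueueNDRangeKernel(compute_queue, kernel, 1, NULL, &global_size,
                               NULL, 1, &xfer_done[b], &kern_done[b]);
    }
    clFinish(compute_queue);
    clReleaseMemObject(buf[0]);
    clReleaseMemObject(buf[1]);
}

With host channels, the second buffer and the event juggling go away for streamable kernels, since the kernel consumes the data directly as it arrives over PCIe.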

MAstr
Novice
941 Views

Could you provide a link to an example of this double buffering mechanism?

 

HRZ
Valued Contributor II
941 Views

I do not know of any such example that you can directly use right now. You might be able to find something if you search on Google, especially if you look for CUDA code used for out-of-core processing.
