
Multiple kernels and logic utilisation.

Altera_Forum

Hi all,

I am currently porting a C library to OpenCL, targeting the Altera OpenCL platform (using the Bittware S5HQ PCIe). Essentially, I would like to have a library of kernels that a user can load at runtime, but I am not sure how to fit this "library" approach with Altera's "all-kernels-in-one-.cl-file" requirement.

If I put all my kernels in one .cl file, it won't compile, because the combined functions exceed the available logic (and this is not the use case anyway). Ideally I would have each kernel in a separate *.cl file and let the user pick and choose, but it seems annoying for the user to have to combine the set of functions they want into one *.cl file for compilation. Also, not all of the functions are kernels; some are helper routines, which might make it even more awkward for a user.

Does anyone know if there are plans to have a multi-file compile option for aoc?
5 Replies
Altera_Forum

I'm double checking to see if this is still a restriction and will post what I find here.

Altera_Forum

The restriction is that one .aocx file is paired with one cl_program. So if you want multiple kernel files that get swapped in and out, just make sure you have multiple cl_program objects in your host code. You will still need to compile each .cl file individually using aoc.exe, since there is no way to pass in multiple kernel files (you often want to compile them with different flags anyway).

One word of caution: each time the OpenCL runtime needs to swap out the hardware, it copies any live buffers on the FPGA up to the host and restores them after the new kernel hardware (.aocx file) has been configured into the FPGA. So if your host leaves a bunch of unused buffers on the FPGA instead of freeing them, you'll be copying data back and forth whenever you switch between cl_program objects. There is also an overhead for configuring the hardware itself, so when you are determining which kernels go into each .cl file, think ahead about this overhead and how your host will run the kernels, and try to group kernels that run together into the same .cl file to minimize it wherever possible.
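As a rough illustration of the "one .aocx per cl_program" pairing, here is a minimal sketch of the host-side step that reads a precompiled binary into memory. The helper name `load_binary` and the file name `fft.aocx` are hypothetical; the OpenCL calls shown in the trailing comment (`clCreateProgramWithBinary`, `clBuildProgram`) are standard host API functions, but that part is left as a comment and assumes a `context` and `device` that your host code already created.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read a whole precompiled binary (e.g. a .aocx produced by aoc) into memory.
 * Returns a malloc'd buffer and stores its length in *size_out, or returns
 * NULL on failure. The caller frees the buffer. */
unsigned char *load_binary(const char *path, size_t *size_out)
{
    assert(path != NULL && size_out != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* Find the file length by seeking to the end. */
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long len = ftell(fp);
    if (len < 0) { fclose(fp); return NULL; }
    rewind(fp);

    /* Allocate at least one byte so an empty file is not mistaken for failure. */
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (buf == NULL) { fclose(fp); return NULL; }

    if (fread(buf, 1, (size_t)len, fp) != (size_t)len) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);

    *size_out = (size_t)len;
    return buf;
}

/*
 * Hypothetical usage with the OpenCL host API (not compiled here;
 * `context` and `device` are assumed to exist, error handling omitted):
 *
 *   size_t n;
 *   const unsigned char *bin = load_binary("fft.aocx", &n);
 *   cl_int bin_status, err;
 *   cl_program fft_prog = clCreateProgramWithBinary(
 *       context, 1, &device, &n, &bin, &bin_status, &err);
 *   clBuildProgram(fft_prog, 1, &device, "", NULL, NULL);
 *
 * Repeating this with a second .aocx gives a second, independent cl_program.
 */
```

Each cl_program created this way corresponds to one hardware image, so releasing buffers you no longer need before running kernels from a different cl_program should keep the save/restore traffic described above to a minimum.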
Altera_Forum

 

--- Quote Start ---

The restriction is that one .aocx file is paired with one cl_program. So if you want multiple kernel files that get swapped in and out, just make sure you have multiple cl_program objects in your host code. You will still need to compile each .cl file individually using aoc.exe, since there is no way to pass in multiple kernel files (you often want to compile them with different flags anyway).

--- Quote End ---

This is what I had thought was the case; I was just checking for any update. I was hoping to avoid multiple copies of the helper device routines. Say I have a kernel file foo.cl that uses a function in bar.cl (and bar.cl is common to many other kernels). I would like to be able to do something like:

aoc -c foo.cl bar.cl [options]

aoc foo.aoco bar.aoco

This way I could supply the compiled modules built with various compile options and let the user decide which cl_programs to create for their specific use case.

--- Quote Start ---

One word of caution: each time the OpenCL runtime needs to swap out the hardware, it copies any live buffers on the FPGA up to the host and restores them after the new kernel hardware (.aocx file) has been configured into the FPGA. So if your host leaves a bunch of unused buffers on the FPGA instead of freeing them, you'll be copying data back and forth whenever you switch between cl_program objects. There is also an overhead for configuring the hardware itself, so when you are determining which kernels go into each .cl file, think ahead about this overhead and how your host will run the kernels, and try to group kernels that run together into the same .cl file to minimize it wherever possible.

--- Quote End ---

Yes, this is duly noted, but I cannot design the "best" option here because it is specific to the user's application. Basically, I am writing the device code and leaving most of the host code to the user (not all of it; there are some frequent usages that I cover).

I should say more about the use case. I am part of the administration team for an HPC system at QUT (Australia). We naturally like the idea of low-power accelerators such as FPGAs (we also play with GPUs and Intel Xeon Phis, and of course lots of CPUs), and we've been looking for ways to make FPGAs accessible to researchers; obviously HDL is not going to cut it with the masses. We've played with Mitrion-C, Impulse C, SystemC, DIME-C, and more recently Xilinx HLS, but all of these are still far too "hardware"-centric for most researchers, who simply want to run simulations faster (with a lower power footprint). By providing a set of commonly used functions, I am hoping to help with the wide take-up of reconfigurable HPC at our university. So you can see why the best grouping of kernels into a cl_program is difficult to predict in general (though fine in some common cases).
Altera_Forum

To reuse helper functions, I recommend placing those functions in a separate file and including that file in the kernel files that require it. Only the functions used by a kernel get pulled into the hardware, so you don't have to worry about bloating the hardware with helper functions that are not used.
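A minimal sketch of that layout follows, assuming hypothetical file names helpers.cl and foo.cl and hypothetical function names. The two parts are shown inlined in one block so the sketch stands on its own; in practice the helper section would sit in its own file and each kernel file would begin with `#include "helpers.cl"`. The include guard is an addition of this sketch (not something the advice above requires) so the helper file can be included more than once safely. The `#ifndef __OPENCL_VERSION__` shim is only there so the same text also compiles with an ordinary C compiler for host-side checking.

```c
/* When this text is compiled by a plain C compiler instead of an OpenCL
 * compiler, map the OpenCL qualifiers to nothing so the logic can be
 * checked on the host. __OPENCL_VERSION__ is predefined by OpenCL C. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

/* ---- would live in a shared file, e.g. helpers.cl (hypothetical) ---- */
#ifndef HELPERS_CL
#define HELPERS_CL

/* Clamp x to the range [lo, hi]. */
float clamp_to(float x, float lo, float hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Not called by the kernel below. */
float cube(float x)
{
    return x * x * x;
}

#endif /* HELPERS_CL */

/* ---- would live in a kernel file, e.g. foo.cl (hypothetical),
 *      starting with:  #include "helpers.cl"                    ---- */

/* Single work-item kernel: clamp every element of data to [0, 1]. */
__kernel void clamp_unit(__global float *data, int n)
{
    for (int i = 0; i < n; ++i)
        data[i] = clamp_to(data[i], 0.0f, 1.0f);
}
```

Going by the reply above, `cube` would not take up any logic in the hardware built for foo.cl, since `clamp_unit` never calls it.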

Altera_Forum

 

--- Quote Start ---

To reuse helper functions, I recommend placing those functions in a separate file and including that file in the kernel files that require it. Only the functions used by a kernel get pulled into the hardware, so you don't have to worry about bloating the hardware with helper functions that are not used.

--- Quote End ---

Thanks for the recommendation; it's good to know that only the functions actually used get pulled in. So that is a reasonable solution (although I do have a somewhat philosophical issue with putting function implementations in a header file, or with including a .cl file, but I guess it works, so that will be fine).

 

Thanks,