Intel® Quartus® Prime Software

Multi kernel function local memory resource

Altera_Forum

I would like to implement a CNN using OpenCL and run it on an FPGA.

The CNN implementation will contain four kernel functions:

layer1_kernel, expected to allocate 1 MB of local memory

layer2_kernel, expected to allocate 1 MB of local memory

layer3_kernel, expected to allocate 1 MB of local memory

layer4_kernel, expected to allocate 1 MB of local memory

The host program will call clEnqueueNDRangeKernel() for each kernel in sequence; that is, the previous kernel must finish before the next kernel is enqueued.

What is the total RAM block usage in this case: 4 MB or 1 MB?

Thanks, 

Matt
Altera_Forum

If you put all the kernels in the same .cl file, all of them will be implemented in one FPGA image/bitstream and the local memory usage will be 4 MB, since all four kernels exist on the chip at the same time. However, if you put them into separate .cl files, compile them separately, and load them one by one on the FPGA, the memory usage will be 1 MB, but the FPGA will need to be reconfigured each time you call a new kernel.
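The arithmetic behind the two cases can be sketched in C as below. This is only a rough model: the helper names are hypothetical, and the 20 Kbit block size used in the usage note is an assumption about the target device (it matches M20K-style blocks). The actual compiler may use more blocks than this because of replication or banking of local memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: minimum number of on-chip RAM blocks needed to
 * hold `bytes` of local memory when one block stores `block_bits` bits.
 * Ignores any replication or banking the compiler may add. */
static size_t ram_blocks_needed(size_t bytes, size_t block_bits)
{
    assert(block_bits > 0);
    size_t bits = bytes * 8;
    return (bits + block_bits - 1) / block_bits; /* ceiling division */
}

/* Local memory that must exist on the chip at once for one FPGA image:
 * the sum over every kernel compiled into that image. */
static size_t image_local_bytes(const size_t *kernel_bytes, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; ++i)
        total += kernel_bytes[i];
    return total;
}
```

With four 1 MB kernels in one image, image_local_bytes() returns 4 MB; with one kernel per image it returns 1 MB. Passing those totals to ram_blocks_needed() with block_bits = 20480 gives a lower bound on the block count for each case.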

Altera_Forum

Thanks for your reply. 

"the FPGA will need to be reconfigured each time you call a new kernel."

Do you mean that I need to call the following functions again?

getBoardBinaryFile() 

clCreateCommandQueue() 

clCreateKernel() 

clEnqueueNDRangeKernel() 

 

Thanks, 

Matt
Altera_Forum

From my experiments it is the "clCreateProgramWithBinary()" function that reconfigures the FPGA. You will have to call that function and every other function that comes after that (clBuildProgram(), clCreateKernel(), clEnqueueNDRangeKernel(), etc.) every time you want to switch to another kernel that resides in another FPGA image. You do not need to create new queues, though; you can reuse the same queue. 

 

Please note that FPGA reconfiguration can take up to a few seconds. Unless your kernels have a long run time and you rarely switch between them, the reconfiguration overhead could actually be higher than the kernel run time. In that case you might as well put all your kernels in the same FPGA image so that you do not need to reconfigure the FPGA between kernel runs.
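To judge whether separate images are worth it, one rough measure is the share of wall-clock time spent on reconfiguration. The helper below is a hypothetical sketch, not part of any OpenCL API, and the timings passed to it are assumed to be measured on your own board.

```c
#include <assert.h>

/* Hypothetical helper: fraction of total time spent reconfiguring the
 * FPGA when each kernel run of `kernel_s` seconds is preceded by one
 * reconfiguration of `reconfig_s` seconds. */
static double reconfig_overhead_fraction(double reconfig_s, double kernel_s)
{
    assert(reconfig_s >= 0.0 && kernel_s >= 0.0);
    assert(reconfig_s + kernel_s > 0.0);
    return reconfig_s / (reconfig_s + kernel_s);
}
```

A result above 0.5 means reconfiguration takes longer than the kernel itself, which is the situation where a single image holding all kernels is likely the better choice.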
Altera_Forum

From my experience, you can build all the programs and create all the kernels up front as an initialization stage to get handles to all the kernels, calling clBuildProgram() and clCreateKernel() once for each binary, and then just enqueue the kernels as you need them.

 

When you do, the first binary that gets loaded will be the first one programmed onto the board during the clCreateProgramWithBinary() call. After that you can enqueue kernels from separate binaries as long as you have their kernel handles. If you enqueue kernel 1 from binary 1, it will simply run, since that binary is already loaded. If you then enqueue kernel 2 from binary 2, the runtime will automatically reprogram the board (provided the binary was built for the same FPGA) in order to run kernel 2. It only reprograms the board when necessary. Hopefully that saves some overhead by reducing the number of calls on the host. As noted before, switching from binary to binary still introduces a significant amount of overhead.
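The "reprogram only when necessary" behavior described above can be modeled with a small C function that counts reprogramming events for a given launch order. This is a toy model of the observed behavior, not runtime code: the function name and the convention of identifying binaries by integer index are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: launches[i] is the index of the binary that holds the
 * i-th enqueued kernel; `loaded` is the binary currently on the board
 * (-1 if none). Returns how many times the board would be reprogrammed
 * if it is reprogrammed only when the needed binary is not loaded. */
static size_t count_reprograms(const int *launches, size_t n, int loaded)
{
    size_t reprograms = 0;
    for (size_t i = 0; i < n; ++i) {
        assert(launches[i] >= 0);
        if (launches[i] != loaded) {
            loaded = launches[i]; /* board now holds this binary */
            ++reprograms;
        }
    }
    return reprograms;
}
```

For the four-layer CNN with one binary per layer and binary 0 preloaded, the launch order {0, 1, 2, 3} costs three reprograms per inference, and running a second input through the same order adds four more. Grouping launches by binary, where the algorithm allows it, reduces the count.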