
How to infer BRAM fan-out

Altera_Forum
Honored Contributor II

I am working on an OpenCL project that multiplies VecA (M × 1) by VecB (1 × N), an outer product that produces the matrix MatC (M × N). I want to use a fan-out design that can support a 2-D array of processing engines. Can I infer the fan-out like this?

https://alteraforum.com/forum/attachment.php?attachmentid=14222&stc=1  

 

    __kernel void matMult() {
        ......
        #pragma unroll
        for (int x = 0; x < M; x++) {
            #pragma unroll
            for (int y = 0; y < N; y++) {
                MatC[x][y] += VecA[x] * VecB[y];
            }
        }
        ......
    }

Any advice would be much appreciated!!
Altera_Forum
Honored Contributor II

You can achieve this kind of systolic array design using the autorun kernel type and the num_compute_units attribute (Sections 2.3 and 2.4 of the Intel FPGA SDK for OpenCL Programming Guide). However, I would expect the same result to be achievable in a single kernel using loop unrolling, where the compiler automatically replicates the local memory buffers.

Altera_Forum
Honored Contributor II

Hi, thanks for your reply! Do you know of any OpenCL systolic array design examples (with code)?

Altera_Forum
Honored Contributor II

There are some small code snippets in Altera's documents in the sections I mentioned above, but other than that, I do not know of any other public code showing the systolic array design.

Altera_Forum
Honored Contributor II

Hi, I tried the systolic array, and it uses a massive amount of BRAM and registers (mostly for control overhead), which leaves my design severely limited by on-chip memory. But when I try the fan-out design, my way of unrolling the loops does not work: it produces wrong output when run on hardware. Do you have any idea how the loops should be unrolled?

Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

There are some small code snippets in Altera's documents in the sections I mentioned above, but other than that, I do not know of any other public code showing the systolic array design. 

--- Quote End ---  

 

 

Intel's FPGA systolic array example is controlled material (publicly available code may not achieve the best performance because it is not optimized for FPGAs). Users who wish to obtain a copy need to contact their Altera representative separately.

 

Regards, 

CloseCL 

(This message was posted on behalf of Intel Corporation)
Altera_Forum
Honored Contributor II

Hi Sir/Madam,

 

May I ask whom I should contact to request a copy?

 

Regards, 

Lancer Chiang
Altera_Forum
Honored Contributor II

Hi Lancer, 

 

You can contact our sales team or an FAE (field application engineer); an NDA is required.

 

Thanks, 

 

Regards, 

CloseCL 

(This message was posted on behalf of Intel Corporation)
Altera_Forum
Honored Contributor II

Hi Sir/Madam, 

 

Many thanks! Is the copy an OpenCL implementation? 

 

Regards, 

Lancer Chiang
