Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)
Announcements
FPGA community forums and blogs on community.intel.com are migrating to the new Altera Community and are read-only. For urgent support needs during this transition, please visit the FPGA Design Resources page or contact an Altera Authorized Distributor.
17267 Discussions

How to infer BRAM fan-out

Altera_Forum
Honored Contributor II
2,520 Views

I am doing an OpenCL project of vector multiplication of VecA (M * 1) and VecB (1 * N) which produces a matrix MatC (M * N). I want to use a fan-out design which can support a 2-D processing engine array. Can I go like this to infer fan-our? : 

https://alteraforum.com/forum/attachment.php?attachmentid=14222&stc=1  

 

__kernel 

void matMult() { 

 

...... 

# pragma unroll 

for(int x = 0; x < M; x++) {# pragma unroll 

for(int y = 0; y < N; y++) { 

MatC[x][y] += VecA[x] * VecB[y]; 

 

...... 

 

 

 

Any advice would be much appreciated!!
0 Kudos
9 Replies
Altera_Forum
Honored Contributor II
1,542 Views

You can achieve this type of systolic array design using the autorun kernel type and num_compute_units (Section 2.3 and 2.4 of Intel FPGA SDK for OpenCL Programming Guide). However, I would expect the same thing to be also achievable in a single kernel using loop unrolling, where the local memory buffers are automatically replicated by the compiler.

0 Kudos
Altera_Forum
Honored Contributor II
1,542 Views

Hi, thanks for your reply! Do you know any OpenCL systolic array design examples? (with code)

0 Kudos
Altera_Forum
Honored Contributor II
1,542 Views

There are some small code snippets in Altera's documents in the sections I mentioned above, but other than that, I do not know of any other public code showing the systolic array design.

0 Kudos
Altera_Forum
Honored Contributor II
1,542 Views

Hi, I tried the systolic array and it takes massive amount of BRAM and registesr (mostly for control overhead) which causes my design to be severely memory-bounded. But if I do the fan-out design, the way I unroll the loop cannot work out, it produces wrong output in hardware run. Do you have any idea how the loops should be unrolled?

0 Kudos
Altera_Forum
Honored Contributor II
1,542 Views

 

--- Quote Start ---  

There are some small code snippets in Altera's documents in the sections I mentioned above, but other than that, I do not know of any other public code showing the systolic array design. 

--- Quote End ---  

 

 

Intel's FPGA systolic array example is a controlled material(using public code may not able to get best performance as not optimized for FPGA), and in the event user wish to have a copy that need to contact Altera representative separately. 

 

Regards, 

CloseCL 

(This message was posted on behalf of Intel Corporation)
0 Kudos
Altera_Forum
Honored Contributor II
1,542 Views

Hi, Sir/madam, 

 

may I ask who should I contact if I would like to request for a copy? 

 

Regards, 

Lancer Chiang
0 Kudos
Altera_Forum
Honored Contributor II
1,542 Views

Hi Lancer, 

 

You can contact our sales/FAE as NDA is required. 

 

Thanks, 

 

Regards, 

CloseCL 

(This message was posted on behalf of Intel Corporation)
0 Kudos
Altera_Forum
Honored Contributor II
1,542 Views

Hi Sir/Madam, 

 

Many thanks! Is the copy an OpenCL implementation? 

 

Regards, 

Lancer Chiang
0 Kudos
Altera_Forum
Honored Contributor II
1,542 Views

Do you know any OpenCL systolic array design examples? (with code)

0 Kudos
Reply