Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)
16597 Discussions

direct DDR communication via VHDL/Verilog

Altera_Forum
Honored Contributor II
1,567 Views

Hello all, 

 

I would like to setup a kernel with an RTL library that directly access the DDR. This kind of work is illustrated here, in example 2: https://www.altera.com/support/support-resources/design-examples/design-software/opencl/library-design-example.html 

Unfortunately, this example does not compile with the 17.1 suite (see this other thread (https://www.alteraforum.com/forum/showthread.php?t=57777)), but this is not the point here. I want to reproduce the engineering work done here but I am currently unable to do so. Here is my problem: 

I can see this interface in the XML code (for the CopyElement function here): 

<MEM_INPUT port="m_input_dst" access="readwrite"/> <MEM_INPUT port="m_input_src" access="readonly"/> <INPUT port="m_input_global_id_0" width="32"/> <INPUT port="m_input_local_id_3" width="32"/> <INPUT port="m_input_global_size_0" width="32"/> <INPUT port="m_workgroup_size" width="32"/> <OUTPUT port="m_output_0" width="32"/> <AVALON_MEM port="avm_local_bb1_ld_" width="512" burstwidth="5" optype="read" buffer_location="" /> <AVALON_MEM port="avm_local_bb1_st_" width="512" burstwidth="5" optype="write" buffer_location="" />  

I understand that MEM_INPUT is for giving the pointer of an address to the RTL kernel. Got it. I also understand that AVALON_MEM is for defining an Avalon_mm interface that will be communicating with the BSP, the latter being responsible for DDR communication. The Avalon_mm interface to put in my RTL component should look like, also for CopyElement and for the avm_loval_bb1_ld interface: 

input avm_local_bb1_ld__readdata, input avm_local_bb1_ld__readdatavalid, input avm_local_bb1_ld__waitrequest, output avm_local_bb1_ld__address, output avm_local_bb1_ld__read, output avm_local_bb1_ld__write, input avm_local_bb1_ld__writeack, output avm_local_bb1_ld__writedata, output avm_local_bb1_ld__byteenable, output avm_local_bb1_ld__burstcount,  

Until here, I'm ok. But now I would like to write some simple code that drives this interface, or at least use a simplifying black box. This is where I'm stuck. Because in the example, I have the idea that Intel designers only took what was produced by aocl compiler (written in openCL), and just copy/pasted it in this example. The modules are unreadable, have curious names, are organized as basic blocks... For instance, there is a module type called lsu_top which reminds me of the LSU generated with OpenCL. This block is directly wired to the previous interface signals. The complexity of the block on the underlying hierarchy prevents me from extracting information on how to drive the interface. Plus the files containing the description of the modules are on the internals of the generated project and so nowhere to be found on the example directory, but this is not the core of my problem. The point is: I am not able to retrieve information from the design example. I made a simple sketch describing my current understanding of the ... thing :) It is attached. Correct me if I'm wrong ! 

 

So my question is: where can I found some information on how to design a RTL module that can interface (INSIDE my OpenCL kernels of course) with the Avalon-MM without writing everything from scratch? 

 

Thanks in advance for any help! 

Alban
0 Kudos
5 Replies
Altera_Forum
Honored Contributor II
732 Views

Actually I have the same problem. So far, I can find all necessary modules, for example lsu_top in $QUARTUS_ROOT/hld/ip, and run module level simulation. But I don't understand how can the module communicate with BSP env, like how does the BSP knows when the function is stopped.

0 Kudos
Altera_Forum
Honored Contributor II
732 Views

Few points that may help you both: 

1. If you want to integrate you own RTL into OpenCL system you may want to simulate merged design. Looks like @haveidea wants to do this anyway. BTW I would not involve BSP into simulations yet. There are few CRA registers that you can set up to run kernels that do bus acecssees, etc. Global DDR memory is just an Avalon MM bus. 

2. Officially Intel/Altera do not support simulations of the OpenCL system. However, there is a way to get an Avalon master module and simulate the OpenCL system by running its kernel and bus accesses w/o BSP: 

3. Here is a command for you to try: qsys-generate kernel_system.qsys --simulation=VERILOG --testbench --testbench-simulation. Watch how all necessary the directories and files it will create. Use directory compare before and after command ;). 

It will also generate the simulation model with BFMs and Avalon verification IP of Avalon bus. Thare is more information here https://www.altera.com/support/support-resources/design-examples/design-software/simulation/exm-avalon-verification-ip.html. The master is there. Good plave to start as well as Avalon manual. 

4. This simulation model will use BFMs instead of DDR which is probably enough for basic RTL simulation. @b_alban guess what? You are not the first one to seek these answers, so search a bit more the next time. You can find a way of handling BFM one write and one read at a time at this thread: https://www.alteraforum.com/forum/showthread.php?t=32952&highlight=david+hawkins+%26lt%3bdwh%40ovro.caltech.edu%26gt%3b. If you will need burst transcations to BFMs please contact me off-line. Our company has a solution and we have successfully simulated OpenCL systems. 

And @ b_alban, if you want to share the bus with the vendor BSP bus to DDRx you likely will need an arbiter that interfaces to "complicated stuff". You may have to write it or hire someone. Good luck!
0 Kudos
Altera_Forum
Honored Contributor II
732 Views

 

--- Quote Start ---  

Few points that may help you both: 

1. If you want to integrate you own RTL into OpenCL system you may want to simulate merged design. Looks like @haveidea wants to do this anyway. BTW I would not involve BSP into simulations yet. There are few CRA registers that you can set up to run kernels that do bus acecssees, etc. Global DDR memory is just an Avalon MM bus. 

--- Quote End ---  

 

Yep, that was of course in my mind. BSP simulation ? Looks awful :) 

Actually, it seems that the interface my BSP gives is an Avalon-MM bridge, but if I understood well, this is transparent for the master part. 

 

 

--- Quote Start ---  

2. Officially Intel/Altera do not support simulations of the OpenCL system. However, there is a way to get an Avalon master module and simulate the OpenCL system by running its kernel and bus accesses w/o BSP: 

--- Quote End ---  

 

I dit not get if your ":" refers to point 3. or if you missed a cut/paste?! Because simulating what is produced by the OpenCL compiler with my RTL inside would be interesting. 

 

 

--- Quote Start ---  

3. Here is a command for you to try: qsys-generate kernel_system.qsys --simulation=VERILOG --testbench --testbench-simulation. Watch how all necessary the directories and files it will create. Use directory compare before and after command ;). 

It will also generate the simulation model with BFMs and Avalon verification IP of Avalon bus. Thare is more information here https://www.altera.com/support/support-resources/design-examples/design-software/simulation/exm-avalon-verification-ip.html. The master is there. Good place to start as well as Avalon manual. 

--- Quote End ---  

 

Currently, I do not us qsys to generate my kernel because I simply did not started the project. I'm still wondering if I can avoid using the Qsys GUI. 

I am currently digging into the verification IP. I try to understand how the entire designs are connected. It requires some skills to get through all the procedures. The API to be found in ug_avalon_verification_ip.pdf to drive the BFMs is not complete or higher level instruction are created along the way which obfuscate a little bit my understanding. But I'm working on it. 

 

 

--- Quote Start ---  

4. This simulation model will use BFMs instead of DDR which is probably enough for basic RTL simulation. @b_alban guess what? You are not the first one to seek these answers, so search a bit more the next time. You can find a way of handling BFM one write and one read at a time at this thread: https://www.alteraforum.com/forum/showthread.php?t=32952&highlight=david+hawkins+%26lt%3bdwh%40ovro.caltech.edu%26gt%3b. If you will need burst transcations to BFMs please contact me off-line. Our company has a solution and we have successfully simulated OpenCL systems. 

--- Quote End ---  

 

That I already guessed but finding up-to-date information is tough because there is a high dependancy wrt tool version. And Quartus/Qsys/Altera FPGA architecture changes quite a lot recently! 

I agree, BFM simulation is largely enough. 

 

 

--- Quote Start ---  

And @ b_alban, if you want to share the bus with the vendor BSP bus to DDRx you likely will need an arbiter that interfaces to "complicated stuff". 

--- Quote End ---  

 

The arbitration network is already shipped into the BSP. It resolve all the load/store (cf. intel FPGA SDK for OpenCL Programming guide page 131). I'm wondering now if the bridge connects to the arbitration network and how can we share the bridge between kernel Avalon-MM interfaces ... 

 

Anyway, thanks for the help and tips !
0 Kudos
Altera_Forum
Honored Contributor II
732 Views

Hello everyone, 

 

Coming back to the subject! 

I managed to advance quite well in my path to RTL library to DDR communication. I now have a RTL component that will be encapsulated into an OpenCL library. This component has one pointer and one data on its input part. For the ouput part, it exposes an Avalon MM master interface that will communicate with the BSP bus (theoretically). 

 

But I ran into some curious bug when using the OpenCL compiler: 

 

aoc device/kernel.test.cl -o bin/ddr_w_lib.test.aocx -I lib_src -L lib_src -l ddr_w_lib.aoclib -v -report -fpc -fp-relaxed Resolving library filenames to full paths lib_path = lib_src lib_file = ddr_w_lib.aoclib Resolved ddr_w_lib.aoclib to /.../ddr_w_lib.aoclib lib_path = . aoc: Environment checks are completed successfully. aoc: If necessary for the compile, your BAK files will be cached here: /var/tmp/aocl/ You are now compiling the full flow!! aoc: Selected default target board xpressgxa10_lp1150_v1_ultra_ddr4 aoc: Running OpenCL parser.... error: Unexpected use of HDL library function(s) (possibly due to taking the address of the function)! error: Unexpected use of HDL library function(s) (possibly due to taking the address of the function)! 2 errors generated. Error: OpenCL parser FAILED. Refer to ddr_w_lib.test/kernel_test.log for details.  

There is nothing more to show in kernel_test.log. 

 

Here are my OpenCL kernel and the declaration of my library: 

 

__attribute__((max_global_work_dim(0))) kernel void test_lib (__global ulong* restrict pointer, ulong data) { ddr_w_rtl(pointer, data); }  

You can see that my kernel will simply do one thing. It will write "data" to the DDR at the address "pointer". 

Here is the declaration of my RTL library: 

void ddr_w_rtl (__global ulong* pin, ulong din);  

 

What is wrong here ? What does this strange error mean btw ? 

 

I tried some hack found here (https://www.alteraforum.com/forum/showthread.php?t=57724). So my library definition changed for this: 

void ddr_w_rtl (__global unsigned long long* pin, ulong din);  

 

Here is the output of the compiler this time: 

aoc device/kernel.test.cl -o bin/ddr_w_lib.test.aocx -I lib_src -L lib_src -l ddr_w_lib.aoclib -v -report -fpc -fp-relaxed Resolving library filenames to full paths lib_path = lib_src lib_file = ddr_w_lib.aoclib Resolved ddr_w_lib.aoclib to /home/bourgea/repo/fanna/device/rtl_lib/ddr_w_lib/lib_src/ddr_w_lib.aoclib lib_path = . aoc: Environment checks are completed successfully. aoc: If necessary for the compile, your BAK files will be cached here: /var/tmp/aocl/ You are now compiling the full flow!! aoc: Selected default target board xpressgxa10_lp1150_v1_ultra_ddr4 aoc: Running OpenCL parser.... /home/bourgea/repo/fanna/device/rtl_lib/ddr_w_lib/device/kernel.test.cl:6:12: warning: incompatible pointer types passing '__attribute__((address_space(16776960))) ulong *restrict' (aka '__attribute__((address_space(16776960))) unsigned long *restrict') to parameter of type '__attribute__((address_space(16776960))) unsigned long long *' ddr_w_rtl(pin, din); ^~~ lib_src/ddr_w_lib.h:1:46: note: passing argument to parameter 'pin' here void ddr_w_rtl (__global unsigned long long* pin, ulong din); ^ /home/bourgea/repo/fanna/device/rtl_lib/ddr_w_lib/device/kernel.test.cl:6: Compiler Error: Wrong input type for HDL library function call <ddr_w_rtl> Operand# 1: Expected: pointer Actual: scalar Error: OpenCL parser FAILED. Refer to ddr_w_lib.test/kernel_test.log for details.  

This time, I understand the first warning :) 

But I don't get the error. 

Is this trick the way to go ? 

 

I attached my XML property file just in case the bugs is hidden in it.
0 Kudos
Altera_Forum
Honored Contributor II
732 Views

The ticket opened proved very useful! Intel PSG support was quick to answer. 

 

So my first problem is indeed caused by a compiler bug! I quote the answer from Intel's engineer: 

 

--- Quote Start ---  

 

The compiler fails when an EFI function has the same signature as a normal OpenCL function (in this case the kernel from which it is called). 

 

And you have identified the most appropriate workaround, which is to give the EFI function an "unusual" signature that doesn't clash with any OpenCL functions. 

 

This issue will be fixed in 18.1; we are now re-working EFI to address all of these longstanding issues. 

 

--- Quote End ---  

 

 

So giving unsigned long long to the header was the way to go in order to have a different signature from the OpenCL kernel (test_lib) and the rtl function library (ddr_w_rtl). Even if it raise a warning. 

 

The second part was entirely my fault. I forgot that the order of the argument to an rtl library is defined by the order given in the XML file. I wrote the data part (din) before the pointer (pin) in my XML file, hence the error  

--- Quote Start ---  

Error: Wrong input type for HDL library function call <ddr_w_rtl> Operand# 1: Expected: pointer Actual: scalar 

--- Quote End ---  

 

 

Hope this can help !
Reply