Intel® Quartus® Prime Software

Intel HLS streaming problem

EGaut

In order to test Intel HLS we developed a producer/consumer project. The producer is a software module executing on the Intel HPS running Linux and the consumer is a hardware module connected to h2f_axi and f2h_axi (DDR) buses.

 

The consumer module needs to be connected to two fifos (in and out) to communicate with the producer and the memory subsystem. 

In order to do that, we used three interfaces: One ihc::stream_in, one ihc::stream_out and a memory mapped master interface ihc::mm_master. 

 

The component is compiled with i++ (Intel HLS) using the hls_stall_free_return and hls_always_run_component macros. Here is the component signature:

hls_stall_free_return hls_always_run_component
component int consumerRAM0(
    ihc::mm_master<word4_t, ihc::aspace<8>, ihc::awidth<32>, ihc::dwidth<32>,
                   ihc::latency<0>, ihc::waitrequest<true> >& device_6,
    ihc::stream_in<word4_t>& module1_read,
    ihc::stream_out<word4_t>& module1_write)

You may find the C++ sources in the build archive attached to this post.

 

Originally the component was declared with a void return value. We added the int return type to see if it would change the component's behavior. It didn't change anything.

 

Here is the command used to compile the component

i++ -O2 -march=5CSEMA5F31C6 --simulator none --clock 10ns --component consumerRAM0 consumerRAM0_ihls.cpp -o consumerRAM0

We target a Cyclone V (DE1-SoC) and the component was tested under both Intel HLS 17.1 and 18.0 with the same results.

 

The consumer executes its operations in the following order:

  1. The component retrieves a value from a FIFO through the stream_in interface. In this case, the value is a memory address.
  2. The component then uses the address to make a 32-bit read operation through the memory-mapped interface.
  3. Finally, the value returned by the memory-mapped read is sent through the stream_out interface to another FIFO (different from the FIFO in step 1).
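
The three steps above can be sketched as a plain C++ software model. This is not the Intel HLS source from the attached archive: `consumer_step` and `word_t` are hypothetical names, `word_t` stands in for `word4_t` (assumed to be 32 bits wide), the FIFOs are modeled with `std::queue`, and treating the incoming value as a byte address is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical stand-in for the thread's word4_t (assumed 32 bits wide).
using word_t = std::uint32_t;

// Software model of one consumer iteration; this is not Intel HLS code.
// Returns false when the input FIFO is empty and nothing was done.
bool consumer_step(std::queue<word_t>& fifo_in,
                   const std::vector<word_t>& memory,
                   std::queue<word_t>& fifo_out) {
    if (fifo_in.empty()) {
        return false;
    }

    // Step 1: take a memory address from the input FIFO.
    const word_t byte_address = fifo_in.front();
    fifo_in.pop();

    // Step 2: one 32-bit read (byte addressing is an assumption here).
    const std::size_t index = byte_address / sizeof(word_t);
    assert(index < memory.size());
    const word_t value = memory[index];

    // Step 3: send the value to the output FIFO.
    fifo_out.push(value);
    return true;
}
```

In this model each call consumes exactly one address and produces exactly one value, which is the one-in/one-out behavior the hardware component is expected to show.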

 

When we implement the component on the FPGA, we get the following behavior:

  • The initial input data is correctly read.
  • The memory access done through the Avalon MM interface is also valid.
  • The stream_out operation, on the other hand, starts at the correct time, but its signals stay asserted for too long.

 

After further investigation, it seems that the HW module loops and closes the stream_out operation only when one of the stream_in (module1_read) inputs changes. Otherwise, all the stream_out (module1_write) signals stay active. This has the side effect of writing the same value multiple times into the FIFO connected to the stream_out interface.

 

To better demonstrate the problem, we ran the generated HDL files through a custom testbench developed with ModelSim Intel FPGA Starter Edition.

 

For the first test, we force a 10-cycle delay between new input values, so that the component has time to process each request. In the following waveform we clearly see that the write signal (module1_write_valid) issued by the component stays asserted for more than one cycle.

stuck_signal.png

If we change the input of the stream_in interface on every clock cycle, we see that the valid signal stays asserted for only one cycle, as specified.

ok_signal.png

Also, if there is no more data in the input FIFO connected to the stream_in interface, the component again stalls at the write operation, writing the same value multiple times. The write signal eventually drops after 180 ns, or 18 clock cycles.

 

So this points to the read and write operations being linked together and not treated as atomic operations. According to the Intel Avalon Interface Specifications (mnl_avalon_spec.pdf, version 18.0):

 

The valid signal qualifies valid data on any cycle where data is being transferred from the source to the sink. On each valid cycle the data signal and other source to sink signals are sampled by the sink.  

 

In other words, the component should assert the valid signal for one cycle for every new value. Here we see that the valid signal stays up more than one cycle.
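
To make the quoted rule concrete, here is a minimal sketch that counts transfers on a ready/valid link. It assumes a ready latency of 0, so that a transfer occurs on every clock cycle in which both valid and ready are sampled high. The function name `count_transfers` is hypothetical, and each vector element represents one clock cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts transfers on a ready/valid link, assuming ready latency 0:
// one transfer per clock cycle in which both signals are sampled high.
// Element i of each vector is the signal level during clock cycle i.
std::size_t count_transfers(const std::vector<bool>& valid,
                            const std::vector<bool>& ready) {
    assert(valid.size() == ready.size());
    std::size_t transfers = 0;
    for (std::size_t cycle = 0; cycle < valid.size(); ++cycle) {
        if (valid[cycle] && ready[cycle]) {
            ++transfers;
        }
    }
    return transfers;
}
```

Under this reading, a valid signal held high for three cycles while the sink is ready counts as three transfers: `count_transfers({true, true, true, false}, {true, true, true, true})` returns 3. The sink has no way to tell a "stuck" valid from three back-to-back values.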

 

At this point, we are wondering whether it is possible to ensure the atomicity of each operation.

7 Replies
HGuer2

Anyone?

MuhammadAr_U_Intel

@EGaut​ 

 

Hi,

 

I am looking at this case and have scanned through the description and the waveforms you shared.

I understand that you compiled without a testbench and later created the test setup separately.

Looking at the attached files, I didn't find the simulation setup and testbench. Would you be able to share them so I can look further into this?

 

Thanks,

Arslan

EGaut

Exactly. I tested the result with a ModelSim test bench. You may find the sources in the archive attached to this post.

MuhammadAr_U_Intel

Thank you for sharing the testbench. I am able to replicate the issue and am looking into it.

MuhammadAr_U_Intel

Hi @EGaut​ ,

 

I have looked into the testbench and HLS component.

This is not a bug.

Refer to the zoomed-out version of the waveform. The testbench provides 10 valid inputs on the read stream (both valid and ready are high, as per the Avalon spec). The component then reads from the specified address in memory and provides 10 valid writes with the correct data from memory on the output stream (both valid and ready are high, as per the Avalon spec).

 

not_a_bug_Capture.PNG

 

 

I further modified your testbench to mimic the same scenario, with a gap between reads and writes.

With read valid high for one clock cycle, you can see the correct behavior.

A waveform is attached for reference.

 

customer_tb_not_a_bug_Capture.PNG

 

For further reading, you may want to refer to page 47 of the Avalon spec:

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/manual/mnl_avalon_spec.pdf

avalon_spec_Capture.PNG

 

Hopefully this will address the question.

 

Thanks,

Arslan

EGaut

Hi @MUsman​ 

 

Sorry for the late response.

 

The problem was indeed on my end. What I didn’t see was that the FIFO accepted 10 identical values and gave them back correctly to the test bench. As you said, it works as designed.

 

Our problem was with the way we used the FIFO. We used the SOPC FIFO generator instead of directly instantiating the dc_fifo megafunction in our VHDL. With the SOPC version, the only way to see whether a FIFO is empty is through the Avalon interfaces, which injected a large delay into our design. Our hardware module was able to read multiple erroneous values before receiving the empty signal from the FIFO. Using dc_fifo and its empty signal fixed everything.
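
As a rough illustration of that effect, the toy model below counts how many reads hit an already-empty FIFO when the reader only sees the empty flag some cycles late. It is a sketch under stated assumptions, not a model of the actual SOPC FIFO or dc_fifo: the name `count_stale_reads` is hypothetical, the reader is assumed to attempt one read per cycle until it sees the flag, and during the first `status_delay` cycles it is assumed to see the flag value from cycle 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: a reader pops one item per clock cycle until it observes the
// FIFO's empty flag, but the flag it observes is status_delay cycles old.
// Returns the number of reads issued while the FIFO was already empty.
std::size_t count_stale_reads(std::size_t items, std::size_t status_delay) {
    std::vector<bool> empty_history;  // true empty flag, one entry per cycle
    std::size_t remaining = items;
    std::size_t stale_reads = 0;

    for (std::size_t cycle = 0;; ++cycle) {
        empty_history.push_back(remaining == 0);

        // Assumption: before status_delay cycles have elapsed, the reader
        // sees the flag value from cycle 0.
        const std::size_t seen_cycle =
            cycle >= status_delay ? cycle - status_delay : 0;
        if (empty_history[seen_cycle]) {
            return stale_reads;
        }

        if (remaining > 0) {
            --remaining;    // a real item is consumed
        } else {
            ++stale_reads;  // the FIFO was empty: erroneous value
        }
    }
}
```

In this model the number of erroneous reads equals the status delay whenever the FIFO starts non-empty, and it is zero when the empty flag is visible in the same cycle, which is in line with a directly wired empty signal removing the problem.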

 

Thanks & regards

MuhammadAr_U_Intel

Glad to know you are able to proceed with using the dc_fifo.

 

Thanks,

Arslan
