Solved: using HLS to generate a simple non-stalling feedforward pipeline with a deterministic latency

AHsu32 · ‎03-20-2020

Hi there!

I'm using Intel HLS to develop functional units as part of a larger project to generate soft processors. Due to the requirements of the pipeline design, the units need to be statically scheduled, thus I need to know the latency ahead of time.

When I try to create even simple units (e.g. a fixed-length dot product), the resulting HLS component has non-deterministic latency and stalls on what appears to be the input argument read. (The latency is the one reported in the verification results of the cosimulation flow.) I'm not quite sure where this stalling comes from.

I do know there are two attributes, `hls_always_run_component` and `hls_stall_free_return`, that disable the input and output async handshaking interfaces, respectively. While the latter is useful, the former removes the ability to cosimulate. (I assume this has something to do with the software API used to call the component simulation?) I assume that if I enable `hls_always_run_component`, the result is a component with a deterministic latency. (Is this true?) Unfortunately, it would be quite undesirable for my larger project if the automated testbench and latency reporting stopped working.
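For reference, a minimal sketch of the kind of unit described here: a fixed-length dot product marked with `hls_stall_free_return`. The names `kLen`, `Vec4`, and `dot4` are hypothetical, and the fallback macros exist only so the sketch also builds with an ordinary C++ compiler. The `__INTELFPGA_COMPILER__` guard and the exact attribute placement are assumptions that should be checked against the HLS reference manual for your version.

```cpp
// Sketch only, not the original design.
#ifdef __INTELFPGA_COMPILER__   // assumed to be defined by the Intel HLS compiler
#include "HLS/hls.h"            // provides `component` and the hls_* attributes
#else
#define component               // fallbacks so this compiles as plain C++
#define hls_stall_free_return
#endif

constexpr int kLen = 4;         // hypothetical fixed vector length

// Operands are passed by value, so the component has no memory
// interface whose accesses could take a variable number of cycles.
struct Vec4 {
  int v[kLen];
};

// hls_stall_free_return drops the downstream stall input on the return
// interface. hls_always_run_component is left out on purpose because,
// as noted above, it removes the ability to cosimulate.
component hls_stall_free_return
int dot4(Vec4 a, Vec4 b) {
  int acc = 0;
#ifdef __INTELFPGA_COMPILER__
#pragma unroll                  // fixed trip count, so the loop can be fully unrolled
#endif
  for (int i = 0; i < kLen; ++i) {
    acc += a.v[i] * b.v[i];
  }
  return acc;
}
```

In a testbench, `dot4` is called like any other function, which is what keeps the automated cosimulation and latency reporting available.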

Is it possible to keep the HLS cosimulation flow but generate non-stalling deterministic latency logic?

Thanks!

AnilErinch_A_Intel · ‎04-13-2020

Hi

We are looking into the issue.

Can you mention which version of HLS you are working with?

Thanks and Regards

Anil

AHsu32 · ‎05-06-2020

Sorry for the late response. I was able to get in contact with some Intel HLS developers who helped me configure the design.

The source of the latency indeterminism had something to do with the S10 HyperFlex pipeline register optimizations. Since my constraints were more latency-sensitive than frequency-sensitive, they suggested I turn the optimization off, which generated the simple feedforward pipeline I expected, at least as far as I could tell through repeated simulation runs. (Turn off "hyper-optimized-handshaking".)

It is important to note that for this to work, I had already written the code in such a way that I expected the result to be stall-free and feedforward. I learned that the attributes do not do anything to help generate such a design; they only modify the interface, which helps if the design is already stall-free. As noted in the original question, the only stall was in the input interface, which we discovered was caused by those S10 optimizations.
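To make the "deterministic latency" requirement concrete, here is a small software model of a non-stalling feedforward pipeline. It is an illustration only, not HLS output: `FixedLatencyPipe`, `Slot`, and `kLatency` are hypothetical names, and `kLatency` stands in for whatever cycle count the cosimulation report gives. As for the setting itself, my understanding is that it is passed to `i++` as `--hyper-optimized-handshaking=off`, but treat that spelling as an assumption and confirm it in the reference manual for your version.

```cpp
#include <array>
#include <cstddef>

// Hypothetical latency in cycles; in practice this would come from
// the cosimulation report for the generated component.
constexpr std::size_t kLatency = 3;

// One pipeline register: a valid bit plus the data word.
struct Slot {
  bool valid;
  int data;
};

// Model of a feedforward pipeline with no stall input: every call to
// tick() advances all stages by exactly one cycle, unconditionally.
class FixedLatencyPipe {
 public:
  // Accepts `in` this cycle and returns the slot that entered
  // kLatency cycles earlier.
  Slot tick(Slot in) {
    Slot out = stages_[kLatency - 1];
    for (std::size_t i = kLatency - 1; i > 0; --i) {
      stages_[i] = stages_[i - 1];
    }
    stages_[0] = in;
    return out;
  }

 private:
  std::array<Slot, kLatency> stages_{};  // all stages start invalid
};
```

Because nothing can hold a stage back, a surrounding statically scheduled processor pipeline can assume the result of an operation issued in cycle N is available in cycle N + `kLatency`, which is the property the generated component needs to have.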

AnilErinch_A_Intel · ‎05-07-2020

Hi

Glad that you figured out the answer.

Thanks and Regards

Anil