Re: Floating point numbers and latency

Altera_Forum · 04-29-2011

I have just implemented the Fibonacci design explained in the Chapter 4 Primitive Library tutorial.

I wanted to try it with floating-point numbers, so I changed the parameters of the blocks. After the first run I got the following error:

Failed to distribute memory in your design.

Found insufficient delay around the following registers while attempting to satisfy fMax requirement for [Subsystem]. Increase latency around this loop by at least 12, or reduce the fMax requirement. (Note that there may also be other failing loops).

Then I added the extra delays to the loop (the fMax was only 1 MHz) and got the output shown in the attached file, which makes sense now that I have analyzed the system.

My questions are:

1- Do I need extra delays with floating-point numbers because of the more complex logic needed to handle them?

2- In this case, how should I change the design so that it works properly?

3- I didn't fully understand the Latency parameter on the ChannelOut block. Does it simply show the number of pipeline stages added to get maximum efficiency? If so, what is that useful for? Wouldn't it be more useful to show the latency of the entire subsystem?

Thanks in advance!

Altera_Forum · 04-29-2011

Just curious: why use floating point at all? I see this popping up in many posts. In FPGAs, floating point requires a great deal of resources. If you want higher resolution, why not just use very wide fixed point?

Altera_Forum · 04-29-2011

--- Quote Start ---

Just curious: why use floating point at all? I see this popping up in many posts. In FPGAs, floating point requires a great deal of resources. If you want higher resolution, why not just use very wide fixed point?

--- Quote End ---

Because for some applications it is better, and if DSP Builder offers this possibility, why not exploit it?

Altera_Forum · 04-29-2011

Frankly, I have never heard of any FPGA engineer using floating point. Maybe those at a NASA space centre would object, and there might be cases I am not aware of. I am personally not clear about the difference in computational efficiency between floating point and fixed point that is as wide as the floating-point format, or even twice as wide.

You can see for yourself: an fMax of 1 MHz is certainly useless. I would be surprised if your tool did not let you go fixed point.

Altera_Forum · 04-29-2011

--- Quote Start ---

Frankly, I have never heard of any FPGA engineer using floating point. Maybe those at a NASA space centre would object, and there might be cases I am not aware of. I am personally not clear about the difference in computational efficiency between floating point and fixed point that is as wide as the floating-point format, or even twice as wide.

You can see for yourself: an fMax of 1 MHz is certainly useless. I would be surprised if your tool did not let you go fixed point.

--- Quote End ---

The purpose of this topic was just to understand why floating point isn't working in my design :D

I'm saying this without any rancor and without wanting to be argumentative :D

Altera_Forum · 04-29-2011

Apologies, I am trying to be as practical as I can. I also hope to heat up the thread.

Altera_Forum · 05-03-2011

--- Quote Start ---

I have just implemented the Fibonacci design explained in the Chapter 4 Primitive Library tutorial.

I wanted to try it with floating-point numbers, so I changed the parameters of the blocks. After the first run I got the following error:

Failed to distribute memory in your design.

Found insufficient delay around the following registers while attempting to satisfy fMax requirement for [Subsystem]. Increase latency around this loop by at least 12, or reduce the fMax requirement. (Note that there may also be other failing loops).

Then I added the extra delays to the loop (the fMax was only 1 MHz) and got the output shown in the attached file, which makes sense now that I have analyzed the system.

My questions are:

1- Do I need extra delays with floating-point numbers because of the more complex logic needed to handle them?

--- Quote End ---

You need more delay than in a fixed-point implementation because a floating-point add has more pipeline stages. Floating-point pipelining in DSP Builder is not quite as advanced as fixed-point pipelining, so reducing the fMax won't help in this situation.

Any floating-point design with feedback loops will have this problem, just as fixed-point designs can.
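A simplified way to read the error message: the registers on a feedback loop have to absorb every pipeline stage of the blocks on that loop, so the shortfall is just a subtraction. The sketch below assumes that model (it is not DSP Builder's actual scheduling algorithm), and the 13-cycle adder is a hypothetical figure picked only so the numbers match the "at least 12" in the error; the real floating-point adder latency is not stated in this thread. Because the block latency in this model does not depend on the target clock rate, lowering fMax does not change the result, which matches the answer above.

```python
def loop_shortfall(loop_delays, block_latencies):
    """Cycles of extra latency a feedback loop needs before it can close.

    Simplified model (an assumption, not DSP Builder's real algorithm):
    the tool retimes each block's pipeline registers into the Sample Delays
    on the loop, so the loop closes only if those delays are long enough
    to absorb every pipeline stage of the blocks around it.

    loop_delays:     Sample Delay lengths on the loop, in clock cycles.
    block_latencies: pipeline latencies of the blocks on the loop, in cycles.
    """
    return max(0, sum(block_latencies) - sum(loop_delays))


# Illustrative figures only: a 1-cycle Sample Delay around an adder.
# A 1-cycle fixed-point adder fits; a hypothetical 13-cycle
# floating-point adder leaves the loop 12 cycles short.
print(loop_shortfall([1], [1]))   # 0
print(loop_shortfall([1], [13]))  # 12
```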

--- Quote Start ---

2- In this case, how should I change the design so that it works properly?

--- Quote End ---

One way to do this is to supply data at a slower rate than the clock rate. For example, have valid go high only once every twelve cycles, then increase the Sample Delay lengths in your design to 12.
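To see why slowing the input down works, here is a minimal clock-level sketch of y[n] = y[n-1] + y[n-2] when one valid sample arrives every 12 cycles and each one-sample delay becomes 12 clock cycles. The adder latency of 12 is an assumed value, and `slowed_fibonacci`, `slowdown` and `adder_latency` are hypothetical names for illustration. The adder's pipeline is modelled explicitly: it reads its inputs `adder_latency` cycles before its result appears, so the loop registers only supply the rest of each delay. As long as the adder latency fits inside the 12-cycle delay, every 12th cycle carries the ordinary Fibonacci sequence.

```python
def slowed_fibonacci(num_samples, slowdown=12, adder_latency=12):
    """Clock-level model of y[n] = y[n-1] + y[n-2] with one valid sample
    every `slowdown` clock cycles and Sample Delays of `slowdown` cycles.

    The adder's result appears adder_latency cycles after it reads its
    inputs, so the registers outside the adder only have to supply the
    remaining (slowdown - adder_latency) cycles of each delay.
    """
    if adder_latency > slowdown:
        raise ValueError(
            f"loop is {adder_latency - slowdown} cycles short; "
            "slow the input down further")
    if num_samples < 2:
        raise ValueError("need at least the two initial samples")
    cycles = num_samples * slowdown
    y = [0] * cycles
    # Initial conditions, loaded on the first two valid cycles.
    y[0], y[slowdown] = 0, 1
    for t in range(2 * slowdown, cycles):
        start = t - adder_latency  # cycle on which the adder read its inputs
        a = y[start - (slowdown - adder_latency)]      # total delay: slowdown
        b = y[start - (2 * slowdown - adder_latency)]  # total delay: 2 * slowdown
        y[t] = a + b
    # Only every slowdown-th cycle carries a valid sample.
    return y[::slowdown]


print(slowed_fibonacci(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

With `slowdown=1, adder_latency=1` the same function reduces to the original one-sample-per-cycle loop, while `slowdown=12, adder_latency=13` raises an error, mirroring the tool's complaint.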

--- Quote Start ---

3- I didn't fully understand the Latency parameter on the ChannelOut block. Does it simply show the number of pipeline stages added to get maximum efficiency? If so, what is that useful for? Wouldn't it be more useful to show the latency of the entire subsystem?

--- Quote End ---

Assuming you wired the valid input directly to the valid output, the Latency parameter should give you the number of cycles you would wait to see the output asserted after asserting the input. It's a bit more complicated than just the number of pipeline stages added, because of the effect of Sample Delays.
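The measurement described above can be sketched as a test-bench check: assert valid_in once, then count the cycles until valid_out goes high. The pipeline is modelled as a plain delay line on the valid path, and the 7-cycle figure is an arbitrary assumption; `delay_line` and `measured_latency` are hypothetical helpers, not DSP Builder functions.

```python
def delay_line(signal: list[bool], cycles: int) -> list[bool]:
    """Model `cycles` pipeline registers on a 1-bit signal: out[t] = in[t - cycles]."""
    return ([False] * cycles + signal)[: len(signal)]


def measured_latency(valid_in: list[bool], valid_out: list[bool]) -> int:
    """Cycles between the first assertion of valid_in and of valid_out."""
    return valid_out.index(True) - valid_in.index(True)


# Hypothetical test bench: valid wired straight from input to output through
# a subsystem the tool has pipelined by 7 cycles (an assumed figure).
valid_in = [False, False, True] + [False] * 20
valid_out = delay_line(valid_in, 7)
print(measured_latency(valid_in, valid_out))  # 7
```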

Altera_Forum · 05-03-2011

Thank you all, I've learned a lot reading this.

Altera_Forum · 05-04-2011

--- Quote Start ---

One way to do this is to supply data at a slower rate than the clock rate. For example, have valid go high only once every twelve cycles, then increase the Sample Delay lengths in your design to 12.

--- Quote End ---

And what if, as in this case, we supply the data just once? I have attached my design. The only way I can think of to do something like that is to put a flip-flop just before d0, driven by the valid signal, but how do I do this in DSP Builder?

--- Quote Start ---

Assuming you wired the valid input directly to the valid output, the Latency parameter should give you the number of cycles you would wait to see the output asserted after asserting the input.

--- Quote End ---

At first that was what I understood, but then I changed my mind after reading the attached 'Zero Latency Example' from the 'DSP Builder Handbook Volume 3'.

I mean, your statement is true only if we set latency constraints, as in the Zero Latency example; am I right?

Thanks in advance

Altera_Forum · 05-04-2011

--- Quote Start ---

And what if, as in this case, we supply the data just once? I have attached my design. The only way I can think of to do something like that is to put a flip-flop just before d0, driven by the valid signal, but how do I do this in DSP Builder?

--- Quote End ---

I've attached the demo_fibonacci design modified to use floating point. One modification I made was to the test bench: I changed it so that valid is held high for enough cycles to clear the Sample Delays, i.e. at least 28 cycles. The design is really misusing valid; in this case it is just an input that signals that the data should be reset.

--- Quote Start ---

At first that was what I understood, but then I changed my mind after reading the attached 'Zero Latency Example' from the 'DSP Builder Handbook Volume 3'.

I mean, your statement is true only if we set latency constraints, as in the Zero Latency example; am I right?

--- Quote End ---

I'm not sure I follow what you're saying.

Altera_Forum · 05-04-2011

Thanks a lot for the new design.

--- Quote Start ---

I'm not sure I follow what you're saying.

--- Quote End ---

What I'm trying to say is this: suppose the Latency parameter is 'the number of cycles you would wait to see the output asserted after asserting the input', and take the 'Zero Latency Example'. Say that at t=0 we change the input. At the output, the valid signal is asserted at t=0, but the changed data only appears at t=10.

In that case, wouldn't it be better for DSP Builder to set the Latency parameter to 10 instead of 0?

Altera_Forum · 05-04-2011

In your Zero Latency example, it's true that you would wait 10 cycles to see data_out change, but you wouldn't wait any cycles at all to see valid_out change. That's why it has latency 0: the 10-cycle delay is functionally part of the design.
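The distinction can be sketched with a small model that keeps two delays apart: a functional Sample Delay the designer placed in the data path, and pipelining added by the tool, which in this model delays data and valid alike. The structure of the Zero Latency Example (a 10-cycle functional delay with no tool-added pipelining) is assumed from the posts above, and `observe`, `functional_delay` and `tool_latency` are hypothetical names.

```python
def observe(functional_delay: int, tool_latency: int, max_cycles: int = 64):
    """Change data_in and assert valid_in at t = 0; return the first cycle at
    which valid_out is high and the first cycle at which data_out changes.

    functional_delay: Sample Delay the designer placed in the data path
                      (part of the behaviour, so not reported as latency).
    tool_latency:     pipelining added by the tool; in this model it delays
                      data_out and valid_out alike.
    """
    first_valid = first_change = None
    for t in range(max_cycles):
        # valid_in is high only at t = 0 in this hypothetical test bench.
        valid_out = (t - tool_latency == 0)
        # data_in steps from 0 to 1 at t = 0 and stays there.
        data_out = 1 if t - tool_latency - functional_delay >= 0 else 0
        if valid_out and first_valid is None:
            first_valid = t
        if data_out != 0 and first_change is None:
            first_change = t
    return first_valid, first_change


print(observe(functional_delay=10, tool_latency=0))  # (0, 10)
```

Here the valid-to-valid delay, which is what the Latency parameter is meant to report, stays 0, while the 10 cycles before data_out changes come entirely from the functional delay.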