Intel® Quartus® Prime Software

How to interconnect modules without 'valid' or 'done' output signal

Altera_Forum
Honored Contributor II
3,376 Views

I have a design that performs single-precision computations using Altera floating-point IPs. However, since these IPs don't seem to have a 'valid' or 'done' output bit, I can't see how to connect one module to another. My concern is how a downstream module will know when to take the output from the previous module. Could someone help with this?

12 Replies
Altera_Forum

A valid or done bit is only meaningful for sequentially operating units. The FP MegaFunctions in question are fully pipelined and emit a new result every clock cycle. The input data needs to be valid for only one clock cycle; you do have to know the pipeline delay, however.
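To illustrate why knowing the pipeline delay is enough when chaining blocks, here is a minimal cycle-level sketch in Python (a behavioral model, not HDL). The latencies of 5 and 7 cycles, the arithmetic inside each block, and the name `make_stage` are assumptions chosen for illustration; they are not taken from any Altera IP. `None` is used in the model only to mark cycles with no data, which real hardware cannot do by itself.

```python
from collections import deque

def make_stage(func, latency):
    """Return a clock function modeling a fully pipelined block with the given latency."""
    regs = deque([None] * latency, maxlen=latency)  # one slot per pipeline register

    def clock(x):
        out = regs[-1]                                   # value leaving the last register
        regs.appendleft(None if x is None else func(x))  # new value enters the first register
        return out

    return clock

MULT_LATENCY = 5  # hypothetical latency of the first block
ADD_LATENCY = 7   # hypothetical latency of the second block

mult = make_stage(lambda x: x * 3.0, MULT_LATENCY)
add = make_stage(lambda x: x + 1.0, ADD_LATENCY)

outputs = []
for cycle in range(20):
    x = float(cycle) if cycle < 4 else None  # four back-to-back inputs, then nothing
    y = add(mult(x))                         # second block is wired directly to the first
    if y is not None:
        outputs.append((cycle, y))

print(outputs)
```

In this model the result for the input applied in cycle 0 emerges in cycle 12, the sum of the two latencies, and the following results arrive on consecutive cycles.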

Altera_Forum

It's quite easy to carry a valid bit in parallel with the floating-point units if you need it for other modules.

Altera_Forum

It would have been smart thinking from the Altera guys if they had provided the 'valid' pipeline inside the building block. It would make for a much cleaner design, as we wouldn't have to add the glue logic mentioned by Tricky.

While we're at it: can we have a separate clock enable for every stage too? I have a dataflow-based development environment, but because all of Altera's building blocks use a global clock enable, I'll be stuck when I need more advanced functionality (or with a pipeline greater than 1).
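For the global clock enable case, a rough Python sketch (a behavioral model, not HDL) shows what the glue logic has to do: the valid chain must be frozen by the same enable as the data pipeline, and the consumer should only take a result when both valid and enable are high. The class name `StallablePipeline`, the latency of 2, and the `x + 1.0` operation are assumptions made for the example.

```python
from collections import deque

class StallablePipeline:
    """Data pipeline plus valid chain, both frozen by one global clock enable."""

    def __init__(self, func, latency):
        self.func = func
        self.data = deque([None] * latency, maxlen=latency)
        self.valid = deque([False] * latency, maxlen=latency)

    def clock(self, x, valid_in, enable):
        # The last stage holds its value during a stall, so valid is qualified
        # with enable to make sure each result is taken exactly once.
        out = (self.data[-1], self.valid[-1] and enable)
        if enable:  # every stage advances together, or none does
            self.data.appendleft(self.func(x) if valid_in else None)
            self.valid.appendleft(valid_in)
        return out

pipe = StallablePipeline(lambda x: x + 1.0, latency=2)
stimulus = [
    (10.0, True, True),
    (20.0, True, True),
    (0.0, False, False),  # stall cycle: nothing moves
    (30.0, True, True),
    (0.0, False, True),
    (0.0, False, True),
]
taken = []
for x, valid_in, enable in stimulus:
    result, valid_out = pipe.clock(x, valid_in, enable)
    if valid_out:
        taken.append(result)

print(taken)
```

A per-stage enable, as asked for above, would need a handshake between neighboring stages and is not modeled here.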
Altera_Forum

Could anyone give a small code example showing the glue logic for the parallel valid bit that is being discussed?

Altera_Forum

The "parallel" valid bit chain is simply a shift register (that is, a number of cascaded D flip-flops) whose number of stages equals the pipeline delay of the respective IP block.
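As a minimal sketch of that idea, the following Python model (a cycle-level simulation, not HDL) pairs a pipelined data path with a valid shift register of the same depth. The class name `PipelinedWithValid`, the latency of 3, and the doubling function are placeholders invented for the example; in a real design the depth would be set to the latency configured for the IP.

```python
from collections import deque

class PipelinedWithValid:
    """Cycle-level model: an N-stage data pipeline plus a parallel valid shift register."""

    def __init__(self, func, latency):
        self.func = func
        # One slot per pipeline stage; None stands for "don't care" data.
        self.data = deque([None] * latency, maxlen=latency)
        # The valid chain: `latency` cascaded flip-flops, all cleared at reset.
        self.valid = deque([False] * latency, maxlen=latency)

    def clock(self, x, valid_in):
        """Advance one clock edge; return (result, valid_out) from the last stage."""
        out = (self.data[-1], self.valid[-1])
        self.data.appendleft(self.func(x) if valid_in else None)
        self.valid.appendleft(valid_in)  # valid bit shifts in lockstep with the data
        return out

pipe = PipelinedWithValid(lambda x: x * 2.0, latency=3)
# Two valid inputs, one gap, one more valid input, then idle cycles to flush.
stimulus = [(1.0, True), (2.0, True), (0.0, False), (3.0, True)] + [(0.0, False)] * 3
outputs = [pipe.clock(x, v) for x, v in stimulus]
results = [r for r, v in outputs if v]

print(results)
```

The gap in the input stream reappears as a gap in `valid_out` exactly `latency` cycles later, so a downstream module only has to look at the valid bit.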

 

I agree that Altera could have added it as an option but, as mentioned above, it won't be of any use in the standard application, where a continuous data stream is fed to the IP.
Altera_Forum

So, as I understand it:

1- Instantiate the IP in a module.

2- Also build a shift register that implements the latency delay of the IP.

3- The shift register carries a '1' for the 'valid' bit, which is shifted along stage by stage and finally given as the output.

Right?

Also, what is the buffer capacity of the IPs? If I keep supplying new data on every clock cycle, how long before I have to stall the input to keep the IP from giving wrong outputs?
Altera_Forum

If the design is fully pipelined, there is no buffer. It can accept a new input on every clock cycle.

After an initial delay (the latency), the IP provides an output on every clock cycle.
Altera_Forum

But there are two different delays for an IP: the delay between the first input and the first output, and the delay between subsequent outputs, assuming inputs are supplied on every clock cycle.

For example, for the exponential core there is a latency of 17 clock cycles between the first input and the first output, but subsequent outputs appear at intervals of 6 clock cycles (not on every clock cycle), assuming new input data is supplied on every clock cycle. Hence I was thinking that at some point the buffer, or whatever the mechanism inside the IP is, will be overflowed by the input data. Am I correct in my understanding? Thanks.
Altera_Forum

Can you tell me where you read about the 6-cycle delay between subsequent inputs and results?

I'm reading the "Floating-Point Megafunctions User Guide" and have found nothing so far.

Thanks
Altera_Forum

I assumed the same myself after reading the IP documentation (page 35 of the Floating-Point Megafunctions User Guide gives the details for the floating-point exponential IP). But after instantiating the IP and running a testbench on the code, I found that it initially takes 17 clock cycles to produce the output and thereafter only 6 clock cycles. Try it: just instantiate the IP and run a testbench that constantly supplies input data. Let me know what you find.

Altera_Forum

My bad: I had overlooked the delays I introduced in my own testbench, which were reflected in the output. Now that I'm supplying an input on every clock cycle, after the initial delay of 17 clock cycles one output appears on every clock cycle.
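The effect can be reproduced with a small Python model (a cycle-level sketch, not HDL) that tracks only when results leave a 17-stage pipeline. The function name `output_cycles` and the input stride of 6 are assumptions chosen to mimic a testbench that waits between inputs; the 17-cycle latency is the figure quoted in this thread.

```python
from collections import deque

LATENCY = 17  # latency reported in this thread for the exponential core

def output_cycles(input_stride, n_inputs, latency=LATENCY):
    """Return the cycle numbers at which results leave a fully pipelined block.

    One input is applied every `input_stride` cycles, starting at cycle 0.
    """
    valid = deque([False] * latency, maxlen=latency)  # one flag per pipeline stage
    cycles = []
    total = (n_inputs - 1) * input_stride + latency + 1
    for cycle in range(total):
        if valid[-1]:                 # a result is leaving the last stage
            cycles.append(cycle)
        drive = cycle % input_stride == 0 and cycle // input_stride < n_inputs
        valid.appendleft(drive)       # shift the pipeline by one stage
    return cycles

print(output_cycles(input_stride=1, n_inputs=4))  # inputs on every cycle
print(output_cycles(input_stride=6, n_inputs=4))  # testbench waits 6 cycles per input
```

In both runs the first result appears in cycle 17. With an input on every cycle the results follow on consecutive cycles; with one input every 6 cycles the results are also 6 cycles apart, which is spacing caused by the testbench and not by the IP.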

Altera_Forum

 

--- Quote Start ---

I agree that Altera could have added it as an option but, as mentioned above, it won't be of any use in the standard application, where a continuous data stream is fed to the IP.

--- Quote End ---

Even a continuous data stream has a beginning (and what most of us call a continuous video stream has regular gaps), so an 'embedded' valid bit will almost always come in handy; in case you really don't need it, the compiler would optimize it away.