I have a design that performs single-precision computations using Altera floating-point IPs. However, since these IPs don't seem to have a 'valid' or 'done' output bit, I can't see how to connect one module to the next. My concern is how a downstream module will know when to take the output from the previous module. Could someone help with this?
A valid or done bit is only meaningful for sequentially operating units. The FP MegaFunctions in question are fully pipelined, emitting a new result every clock cycle. The input data only needs to be valid for one clock cycle; you do have to know the pipeline delay, however.
It's quite easy to carry a valid bit in parallel with the floating-point units if you need it for other modules.
It would have been smart of the Altera guys to provide the 'valid' pipeline inside the building block. It would make for a much cleaner design, as we wouldn't have to add the glue logic mentioned by Tricky.
While we're at it: can we have a separate clock enable for every stage too? I have a dataflow-based development environment, but because all of Altera's building blocks use a global clock enable, I'll be stuck when I need more advanced functionality (or a pipeline deeper than 1).
Could anyone give a small code example showing the glue logic for the parallel valid bit that is being discussed?
The "parallel" valid bit chain is simply a shift register (that is, a number of cascaded D flip-flops); the delay (number of stages) equals the pipeline delay of the respective IP block.
I agree that Altera could have added it as an option, but as mentioned above, it won't be of any use in the standard application, where a continuous data stream is fed to the IP.
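Since a code example was requested, here is a minimal behavioural sketch of the valid-bit chain in Python rather than HDL, just to show the idea. Everything in it is an assumption made for illustration: `LATENCY = 3`, the class names `DataPipeline` and `ValidChain`, and the `2.0 * x` stand-in for the real floating-point operation. In hardware, the valid chain would be a shift register clocked alongside the IP, with as many stages as the latency configured for that IP.

```python
from collections import deque

LATENCY = 3  # assumed pipeline depth; use the latency of your own IP configuration


class DataPipeline:
    """Stand-in for a fully pipelined IP: it has no valid output and
    processes whatever sits on its input every clock cycle."""

    def __init__(self, func, latency):
        self.func = func
        self.stages = deque([0.0] * latency, maxlen=latency)

    def clock(self, data_in):
        data_out = self.stages[0]              # oldest entry leaves the pipeline
        self.stages.append(self.func(data_in))  # newest entry enters, chain shifts
        return data_out


class ValidChain:
    """Shift register of 1-bit stages, one per pipeline stage of the IP."""

    def __init__(self, latency):
        self.bits = deque([False] * latency, maxlen=latency)

    def clock(self, valid_in):
        valid_out = self.bits[0]
        self.bits.append(bool(valid_in))
        return valid_out


def run(samples, latency=LATENCY):
    """samples: list of (data, valid) pairs, one pair per clock cycle."""
    pipe = DataPipeline(lambda x: 2.0 * x, latency)  # hypothetical operation
    chain = ValidChain(latency)
    captured = []
    # Extra idle cycles flush the last results out of the pipeline.
    for data, valid in samples + [(0.0, False)] * latency:
        result = pipe.clock(data)
        if chain.clock(valid):   # downstream module samples only when valid
            captured.append(result)
    return captured


# The second sample is a gap in the stream: the IP still computes on it,
# but the delayed valid bit tells the next module to ignore that result.
stream = [(1.0, True), (9.9, False), (2.0, True), (3.0, True)]
print(run(stream))  # [2.0, 4.0, 6.0]
```

The downstream module never needs to count cycles itself; it only looks at the delayed valid bit, which arrives on the same clock as the matching result.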
So what I understand:
1. Instantiate the IP in a module.
2. Also make a shift register that implements the latency of the IP.
3. The shift register holds a '1' for the 'valid' bit, which is shifted along one stage per clock and finally given as the output.
Right? Also, what is the buffer capacity of the IPs? If I keep supplying new data every clock cycle, how long before I have to stall the input data to keep the IP from giving wrong outputs?
If the design is fully pipelined, there's no buffer.
It can process a new input every clock cycle. After an initial delay, the IP provides an output every clock cycle.
But there are two different delays for an IP: the delay between the first input and its output, and the delay between subsequent outputs, assuming inputs are supplied every clock cycle. For example, for the exponential core there is a latency of 17 clock cycles between the first input and the first output, but subsequent outputs appear at intervals of 6 clock cycles (not every clock cycle), assuming new input data is supplied every clock cycle. Hence I was thinking there would be a point where the buffer, or whatever mechanism is inside the IP, would be overflowed by the input data. Am I correct in my understanding? Thanks.
Can you tell me where you read about the 6-cycle delay between subsequent inputs and results?
I'm reading the "Floating-Point Megafunctions User Guide" and have found nothing so far. Thanks.
I assumed the same myself after reading the IP documentation (page 35 of the Floating-Point Megafunctions User Guide gives the details for the floating-point exponential IP). But after instantiating the IP and running a testbench on the code, I found that it initially takes 17 clock cycles to produce the output and thereafter only 6 clock cycles. Try it: just instantiate the IP and run a testbench that constantly supplies input data. Let me know what you find.
My bad, I had overlooked the delays I introduced in my own testbench, which were reflected in my output. Now that I'm supplying an input every clock cycle, one output appears every clock cycle after the initial delay of 17 clock cycles.
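For anyone who wants to see the difference between latency and throughput without firing up a simulator, the behaviour confirmed above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the 17-cycle latency is the figure reported in this thread for the exponential core, `math.exp` stands in for the IP's arithmetic, and the function name `output_cycles` is made up for the example.

```python
import math
from collections import deque

LATENCY = 17  # latency reported in this thread for the exponential core


def output_cycles(num_inputs, latency=LATENCY):
    """Feed one input per clock cycle and return a (cycle, result) pair
    for every real result that leaves the pipeline."""
    stages = deque([None] * latency, maxlen=latency)  # None marks an empty stage
    outputs = []
    for cycle in range(num_inputs + latency):
        emerging = stages[0]
        if emerging is not None:
            outputs.append((cycle, emerging))
        # Inputs 0.0, 1.0, 2.0, ... are applied on consecutive cycles,
        # then the input goes idle while the pipeline drains.
        if cycle < num_inputs:
            stages.append(math.exp(float(cycle)))
        else:
            stages.append(None)
    return outputs


results = output_cycles(4)
print([cycle for cycle, _ in results])  # [17, 18, 19, 20]
```

The first result shows up 17 cycles after the first input, and every later result follows exactly one cycle after the previous one. Nothing queues up inside the model, which is why no stalling is needed however long the input stream runs.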
--- Quote Start --- I agree that Altera could have added it as an option, but as mentioned above, it won't be of any use in the standard application, where a continuous data stream is fed to the IP. --- Quote End ---
Even a continuous data stream has a beginning (and in what most of us call a continuous video stream there are regular gaps), so an 'embedded' valid bit will almost always come in handy, and in case you really don't need it the compiler would optimize it away.
