Hi,
I have written a design in Verilog, and its maximum frequency is coming out at only 28 MHz on a Cyclone EPC6Q240C8 device. The place where the frequency drops is the average calculation.
I have three 24-bit registers, say reg1[23:0], reg2[23:0] and reg3[23:0]. Each register holds three 8-bit values, and I need to calculate the average of these nine numbers. This is how I am doing it:
//---------------------------------------------------------
always block 1
    sum1 <= reg1[23:16] + reg1[15:8] + reg1[7:0];
    sum2 <= reg2[23:16] + reg2[15:8] + reg2[7:0];
    sum3 <= reg3[23:16] + reg3[15:8] + reg3[7:0];
always block 2
    sum4 <= sum1 + sum2 + sum3;
always block 3
    average <= (my conditions) ? sum4 / 9 : average;
    average_valid <= (my conditions) ? 1'b1 : 1'b0;
//---------------------------------------------------------
Simply by eliminating this part of the code, the frequency of my design shoots up to 70 MHz. Can anyone suggest an alternative approach to calculating the average without compromising the frequency?
Thank you,
-Amit
You forgot to mention the most important point: is it a combinational or an edge-sensitive always block? The speed result seems plausible for a combinational one.
However, without considering the role of a clock in the design, the speed results are more or less meaningless. The discussion would be much easier if you provided a complete design, including a clock. In practice you have a clock speed requirement, e.g. 50 MHz. The result then means that it isn't possible to perform the complete chain of calculations in one clock cycle; you have to split it across at least two cycles. This is done automatically by using edge-sensitive always blocks, but you first have to know how your input and output variables are related to the clock. If you intend to perform the complete calculation within a single 50 MHz cycle, however, the result simply means: give it up!
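The difference between the two kinds of always block mentioned above can be sketched as follows. This is a minimal illustration rather than the original poster's code; the module name, signal names and widths are assumptions.

```verilog
// Sketch only: module name, signal names and widths are assumed for illustration.
module comb_vs_registered (
    input  wire       clk,
    input  wire [7:0] a, b, c,
    output reg  [9:0] sum_comb,  // combinational: follows the inputs directly
    output reg  [9:0] sum_reg    // registered: updated on each rising clock edge
);
    // Combinational always block: no clock involved. The adder delay is
    // added to whatever logic comes before and after it in the same cycle.
    always @(*) begin
        sum_comb = a + b + c;
    end

    // Edge-sensitive always block: the result is stored in flip-flops, so
    // the timing path ends here and the next step gets its own clock cycle.
    always @(posedge clk) begin
        sum_reg <= a + b + c;
    end
endmodule
```

In the registered version the timing analyzer reports the adder delay as a register-to-register path, which is what a clock frequency figure actually refers to.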
reg [9:0]  sum1;
reg [9:0]  sum2;
reg [9:0]  sum3;
reg [11:0] sum4;

// Stage 1: sum the three 8-bit values in each row
always @ (posedge core_clk or negedge reset_n)
begin
    if (!reset_n)
    begin
        sum1 <= 10'd0;
        sum2 <= 10'd0;
        sum3 <= 10'd0;
    end
    else
    begin
        sum1 <= row1_reg[23:16] + row1_reg[15:8] + row1_reg[7:0];
        sum2 <= row2_reg[23:16] + row2_reg[15:8] + row2_reg[7:0];
        sum3 <= row3_reg[23:16] + row3_reg[15:8] + row3_reg[7:0];
    end
end

// Stage 2: sum of all nine values
always @ (posedge core_clk or negedge reset_n)
begin
    if (!reset_n)
        sum4 <= 12'd0;
    else
        sum4 <= sum1 + sum2 + sum3;
end

// Stage 3: divide by 9 and flag the result as valid
always @ (posedge core_clk or negedge reset_n)
begin
    if (!reset_n)
    begin
        average   <= 12'd0;
        avg_valid <= 1'b0;
    end
    else
    begin
        average   <= (valid_flag_r && red_pulse) ? sum4 / 9 : average;
        avg_valid <= ((valid_flag_r && red_pulse) || (valid_flag_g && green_pulse) || (valid_flag_b && blue_pulse)) ? 1'b1 : 1'b0;
    end
end

This is basically part of an average and median filter implementation. Right now I am working with a 16 MHz clock and sending my output over RS232 at 38.4 kbps. But if I have to work with a 48 MHz clock and a 115 kbps RS232 baud rate, my design won't work!
Hi,
in which always block is the longest path located?
Kind regards,
GPK
So the design is already completely pipelined. The /9 divider is the most likely candidate for the slowest part. In my opinion, though, it should still achieve more than 28 MHz, unless you're stuck with a very slow device family. There must be something else that can't be seen from the part shown, e.g. additional timing constraints related to the input and output signals. As pletz suggested, you have to examine the timing analysis details.
Hi all,
I don't see that the design is completely pipelined. All statements involving three additions can be pipelined further, e.g.
    sum1 <= row1_reg[23:16] + row1_reg[15:8] + row1_reg[7:0];
into:
    sum1a <= row1_reg[23:16] + row1_reg[15:8];
    sum1b <= row1_reg[7:0];
    sum1 <= sum1a + sum1b;
and so on. Moreover, there is another way of implementing the averaging FIR filter, using feedback and a subtractor instead of full additions.
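The split suggested above could look roughly like this for one row. It is a sketch under assumptions: the module wrapper and port list are made up for illustration, while the signal names follow the thread.

```verilog
// Sketch only: the module wrapper and its ports are assumed for illustration.
module row_sum_pipelined (
    input  wire        core_clk,
    input  wire        reset_n,
    input  wire [23:0] row1_reg,  // three packed 8-bit values
    output reg  [9:0]  sum1       // valid two clocks after row1_reg
);
    reg [8:0] sum1a;  // partial sum of the upper two values
    reg [7:0] sum1b;  // third value, delayed so it stays aligned with sum1a

    always @(posedge core_clk or negedge reset_n) begin
        if (!reset_n) begin
            sum1a <= 9'd0;
            sum1b <= 8'd0;
            sum1  <= 10'd0;
        end else begin
            // Stage 1: a single adder
            sum1a <= row1_reg[23:16] + row1_reg[15:8];
            sum1b <= row1_reg[7:0];
            // Stage 2: a single adder
            sum1  <= sum1a + sum1b;
        end
    end
endmodule
```

Each stage now contains one adder instead of two, at the cost of one extra clock of latency, so any valid flags that travel with the data would need one more delay stage as well.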
Hi,
I ran a test and it looks like, as FvM mentioned, the divider is the root cause of the problem.
Kind regards,
GPK
In that case, avoid the division by either using 8 taps and then just truncating 3 bits, or using a multiplier instead of a divider.
In any case, I would pipeline all the adders.
Hi Gargamit,
I ran a test with your design. With Physical Synthesis I could improve your clock speed from 34 MHz to 66 MHz. I have attached the project.
Kind regards,
GPK
My longest path is coming out between sum4[8] and average[0].
Also, there is a typo in my first post: my device is a Cyclone EP1C6Q240C8.
kaz - can you please elaborate on the method of using 8 taps and then truncating 3 bits?
GPK - what changes did you implement?
Thank you
Hi,
I mainly switched on the "Physical Synthesis" option, which allows Quartus to move registers in your design without changing the functionality. My test project is attached to my last post. Have a look at it.
Kind regards,
GPK
Thanks GPK, I will certainly try that at my end.
Truncating 3 bits (instead of explicit division):
For an average of 8 values you need to divide the sum by 8, i.e. 2^3, so all you have to do is discard the 3 LSBs of sum4.
Using a multiplier instead of a divider (since dividers can be slower than multipliers):
To divide by 9, final sum = sum4/9 ≈ sum4 * 57/512, i.e. multiply sum4 by 57 and then discard the 9 LSBs of the result. The value 57 is 512/9 rounded to the nearest integer.
You can and should use more bits for more accuracy: final sum = sum4 * 3641/32768, then discard 15 bits.
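The accuracy of the two constants can be checked with a little arithmetic. Writing S for sum4, and assuming nine 8-bit values so that S is at most 9 * 255 = 2295:

```latex
\[
\frac{57\,S}{512} = \frac{S}{9}\cdot\frac{513}{512} = \frac{S}{9} + \frac{S}{4608},
\qquad
\frac{3641\,S}{32768} = \frac{S}{9}\cdot\frac{32769}{32768} = \frac{S}{9} + \frac{S}{294912}.
\]
```

Both forms overestimate S/9 slightly. With 57/512 the excess reaches about 0.5 at S = 2295, so the truncated result can be one higher than the truncated S/9: for S = 521, 521 * 57 / 512 truncates to 58, while 521 / 9 truncates to 57. With 3641/32768 the excess stays below 0.008, which is less than 1/9, so over this range the truncated result is the same as the truncated S/9.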
--- Quote Start ---
I don't see the design is completely pipelined. all statements involving three additions can be further pipelined e.g.
--- Quote End ---
Yes, that's true. However, I assumed that two 12-bit additions in a cycle wouldn't be an issue at 48 MHz. An average of 9 needs a divider or an approximation by integer multiply and shift. But with Cyclone, the integer multiply is converted to multiple additions and may cause timing problems as well. Alternatively, the divider can be pipelined using a Megafunction. Because dividers are resource-consuming, I generally use a serial divider wherever applicable. For example, it takes 4 cycles for a /9 division.
Hey, thanks everyone. The multiplication approach worked for me: by multiplying the sum by 57 and then discarding the 9 LSBs, the frequency increased almost 3 times. The design is now at 80 MHz.
Cheers,
-amit garg
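For reference, the multiply-by-57-and-discard-9-bits replacement for sum4 / 9 could be sketched as below. This is not the poster's actual code: the module wrapper, port list and the name prod are assumptions, and the valid-flag conditions from the thread are left out.

```verilog
// Sketch only: module wrapper, ports and the name "prod" are assumed.
module div9_mult_shift (
    input  wire        core_clk,
    input  wire        reset_n,
    input  wire [11:0] sum4,     // sum of nine 8-bit values, 0..2295
    output reg  [7:0]  average   // valid one clock after sum4
);
    // 2295 * 57 = 130815, which fits in 17 bits.
    wire [16:0] prod = sum4 * 6'd57;

    always @(posedge core_clk or negedge reset_n) begin
        if (!reset_n)
            average <= 8'd0;
        else
            // Dropping the 9 LSBs divides by 512, so this is sum4 * 57 / 512.
            // 57/512 is slightly more than 1/9, so for some inputs
            // (e.g. sum4 = 521) the result is one higher than sum4 / 9.
            average <= prod[16:9];
    end
endmodule
```

For the more accurate constant mentioned earlier in the thread, the same structure would use 12'd3641, a 23-bit product (2295 * 3641 = 8356095) and prod[22:15]. Since 57 = 64 - 8 + 1, a synthesis tool can build the constant multiply from shifts and adds on devices without hardware multipliers.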
