
how to increase frequency

Altera_Forum
Honored Contributor II
2,226 Views

Hi, 

I have written a design in Verilog, and its maximum frequency comes out at only 28 MHz on a Cyclone EPC6Q240C8 device. 

 

The place where frequency drops is Average Calculation. 

I have 3 registers 24bit each. 

Say, 

reg1 [23:0] 

reg2 [23:0] 

reg3 [23:0] 

Each of these registers holds three 8-bit values, and I need to calculate the average of these nine numbers. 

This is how I am doing it: 

//--------------------------------------------------------- 

always block 1 

sum1 <= reg1[23:16] + reg1[15:8] + reg1[7:0]; 

sum2 <= reg2[23:16] + reg2[15:8] + reg2[7:0]; 

sum3 <= reg3[23:16] + reg3[15:8] + reg3[7:0]; 

 

always block 2 

sum4 <= sum1 + sum2 + sum3; 

 

always block 3 

average <= (my conditions)? sum4 / 9 : average; 

average_valid <= (my conditions)? 1'b1 : 1'b0; 

 

//--------------------------------------------------------- 

 

Simply by eliminating this part of the code, the frequency of my design shoots up to 70 MHz. 

So, can anyone please suggest an alternative way to calculate the average without compromising the frequency? 

 

Thank you, 

 

-Amit
Altera_Forum

You forgot to mention the most important point: are these combinational or edge-sensitive always blocks? The speed result seems plausible for a combinational one. 

However, without considering the role of a clock in the design, the speed results are more or less meaningless. The discussion would be much easier if you provided a complete design, including a clock. 

In practice, you have a clock speed requirement, e.g. 50 MHz. The result then means that it isn't possible to perform the complete chain of calculations in one clock cycle; you have to split it across at least two cycles. This happens automatically when you use edge-sensitive always blocks, but you first have to know how your input and output variables are related to the clock. If you intend to perform the complete calculation within a single 50 MHz cycle, the result simply means: give it up!
Altera_Forum

reg [9:0]  sum1; 
reg [9:0]  sum2; 
reg [9:0]  sum3; 
reg [11:0] sum4; 

// Stage 1: add the three 8-bit values held in each row register 
always @ (posedge core_clk or negedge reset_n) 
begin 
    if (!reset_n) 
    begin 
        sum1 <= 10'd0; 
        sum2 <= 10'd0; 
        sum3 <= 10'd0; 
    end 
    else 
    begin 
        sum1 <= row1_reg[23:16] + row1_reg[15:8] + row1_reg[7:0]; 
        sum2 <= row2_reg[23:16] + row2_reg[15:8] + row2_reg[7:0]; 
        sum3 <= row3_reg[23:16] + row3_reg[15:8] + row3_reg[7:0]; 
    end 
end 

// Stage 2: add the three row sums 
always @ (posedge core_clk or negedge reset_n) 
begin 
    if (!reset_n) 
        sum4 <= 12'd0; 
    else 
        sum4 <= sum1 + sum2 + sum3; 
end 

// Stage 3: divide the total by 9 and flag the result as valid 
always @ (posedge core_clk or negedge reset_n) 
begin 
    if (!reset_n) 
    begin 
        average   <= 12'd0; 
        avg_valid <= 1'b0; 
    end 
    else 
    begin 
        average   <= (valid_flag_r && red_pulse) ? sum4 / 9 : average; 
        avg_valid <= ((valid_flag_r && red_pulse) || (valid_flag_g && green_pulse) 
                      || (valid_flag_b && blue_pulse)) ? 1'b1 : 1'b0; 
    end 
end 

 

This is part of an average and median filter implementation. 

Right now I am working with a 16 MHz clock and sending my output over RS232 at 38.4 kbps. But if I have to work with a 48 MHz clock and a 115 kbps RS232 baud rate, my design won't work! 

Altera_Forum

 


 

 

Hi, 

 

In which always block is the longest path located? 

 

Kind regards 

 

GPK
Altera_Forum

So the design is already completely pipelined. The /9 divider should be suspected as the slowest part. Still, in my opinion it should be possible to achieve more than 28 MHz, unless you're stuck with a very slow device family. There must be something else that can't be seen from the part shown, e.g. additional timing constraints on the input and output signals. As pletz suggested, you have to examine the timing analysis details.

Altera_Forum

Hi all, 

 

I don't think the design is completely pipelined. 

All statements that add three operands can be pipelined further, e.g. 

 

sum1 <= row1_reg[23:16] + row1_reg[15:8] + row1_reg[7:0]; 

into: 

sum1a <= row1_reg[23:16] + row1_reg[15:8]; 

sum1b <= row1_reg[7:0]; 

sum1 <= sum1a + sum1b; 

 

and so on. 

 

Moreover, there is another way of building the averaging FIR filter, using feedback and a subtractor instead of full additions.
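
The adder split suggested above could look roughly like the following for one row. This is only a sketch: the module name sum3_pipelined and the register widths are assumptions, while core_clk, reset_n, row1_reg, sum1a, sum1b and sum1 follow the names used in the thread.

```verilog
// Hypothetical module: two-stage pipelined sum of the three bytes in one row.
module sum3_pipelined (
    input  wire        core_clk,
    input  wire        reset_n,
    input  wire [23:0] row1_reg,
    output reg  [9:0]  sum1
);
    reg [8:0] sum1a;  // stage 1: sum of the two upper bytes (needs 9 bits)
    reg [7:0] sum1b;  // stage 1: lower byte, delayed to stay aligned with sum1a

    always @ (posedge core_clk or negedge reset_n)
    begin
        if (!reset_n)
        begin
            sum1a <= 9'd0;
            sum1b <= 8'd0;
            sum1  <= 10'd0;
        end
        else
        begin
            // Stage 1: only one adder between registers
            sum1a <= row1_reg[23:16] + row1_reg[15:8];
            sum1b <= row1_reg[7:0];
            // Stage 2: second adder, working on the values registered one cycle earlier
            sum1  <= sum1a + sum1b;
        end
    end
endmodule
```

Note that sum1 now appears one clock cycle later than in the original code, so any valid flags that qualify the result would need to be delayed by one cycle as well.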
Altera_Forum

 


 

 

Hi , 

 

I ran a test, and it looks like the divider is the root cause of the problem, as FvM mentioned. 

 

Kind regards 

 

GPK
Altera_Forum

In that case, avoid the division by either: 

using 8 taps and then simply truncating 3 bits, 

or using a multiplier instead of the division. 

 

But in all cases I would pipeline all the adders.
Altera_Forum

 


 

 

Hi Gargamit, 

 

I ran a test with your design. With Physical Synthesis I could improve the clock speed from 34 MHz to 66 MHz. I have attached the project. 

 

Kind regards 

 

GPK
Altera_Forum

My longest path comes out to be between sum4[8] and average[0]. 

 

Also, there is a typo in my first post: my device is a Cyclone EP1C6Q240C8. 

 

kaz - can you please elaborate on the method of using 8 taps and then truncating 3 bits? 

 

GPK - what changes did you implement? 

 

Thank you
Altera_Forum

 


 

 

Hi, 

 

I mainly switched on the "Physical Synthesis" option, which allows Quartus to move registers in your design without changing the functionality. My test project is attached to my last post. Have a look at it. 

 

Kind regards  

 

GPK
Altera_Forum

Thanks GPK, I will certainly try that at my end.

Altera_Forum

Truncating 3 bits (instead of an explicit division): 

For an average of 8 values you need to divide the sum by 8, i.e. 2^3, so all you have to do is discard the 3 LSBs of sum4. 

 

Using a multiplier instead of a divider (since dividers can be slower than multipliers): 

To divide by 9, final sum = sum4/9, which is approximately sum4 * 57/512, 

i.e. multiply sum4 by 57, then discard the 9 LSBs of the result. 

The value 57 is derived from 512/9 (about 56.9), rounded to the nearest integer. 

You can and should use more bits for more accuracy: 

final sum = sum4 * 3641/32768, then discard 15 bits.
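
The multiply-and-discard method described above could be sketched as follows. The module name div9_mult, the constant name RECIP_9, the product register and the 8-bit output width are assumptions for illustration; sum4 and average follow the thread. Since 9 * 3641 = 32769, this version matches sum4 / 9 exactly for every sum below 32768, which covers the 0 to 2295 range of nine 8-bit values. With 57/512 the result can come out one too high for some inputs (for example, 512 gives 57 instead of 56).

```verilog
// Hypothetical module: divide by 9 using a constant multiply and a 15-bit shift.
module div9_mult (
    input  wire        core_clk,
    input  wire        reset_n,
    input  wire [11:0] sum4,     // sum of nine 8-bit values, at most 2295
    output reg  [7:0]  average
);
    // 3641 = 32768 / 9, rounded to the nearest integer
    localparam [11:0] RECIP_9 = 12'd3641;

    reg [23:0] product;  // 12 bits x 12 bits fits in 24 bits

    always @ (posedge core_clk or negedge reset_n)
    begin
        if (!reset_n)
        begin
            product <= 24'd0;
            average <= 8'd0;
        end
        else
        begin
            // Stage 1: registered multiply
            product <= sum4 * RECIP_9;
            // Stage 2: discard the 15 LSBs; the quotient is at most 255,
            // so bits [22:15] hold the full result
            average <= product[22:15];
        end
    end
endmodule
```

Registering the product before the shift adds one cycle of latency but keeps the multiplier on its own register-to-register path.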
Altera_Forum

 

--- Quote Start ---  

I don't see the design is completely pipelined. all statements involving three additions can be further pipelined e.g. 

--- Quote End ---  

 

Yes, that's true. However, I assumed that two 12-bit additions in one cycle wouldn't be an issue at 48 MHz. 

 

An average of 9 needs a divider or an approximation by an integer multiply and shift. But on Cyclone, the integer multiply is converted to multiple additions and may cause timing problems as well. Alternatively, the divider can be pipelined using a MegaFunction. Because dividers are resource-consuming, I generally use a serial divider wherever applicable. It takes, e.g., 4 cycles for a /9 division.
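
To illustrate why a constant multiply turns into additions on a device without hardware multipliers, here is one way the multiplication by 57 could be written by hand as shifts and adds. This is a sketch only: the module name mul57_shift_add and the 18-bit width are assumptions, and a synthesis tool may well pick a different decomposition on its own.

```verilog
// Hypothetical module: multiply by 57 using shifts and adds (57 = 64 - 8 + 1).
module mul57_shift_add (
    input  wire [11:0] sum4,
    output wire [17:0] product   // 4095 * 57 = 233415 fits in 18 bits
);
    // Zero-extend first so the shifted terms do not lose their upper bits
    wire [17:0] x = {6'd0, sum4};

    // (x * 64) - (x * 8) + x = x * 57
    assign product = (x << 6) - (x << 3) + x;
endmodule
```

The two chained add/subtract operations are purely combinational here, so in a timing-critical design they would be candidates for the same kind of pipeline registers as the other adders.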
Altera_Forum

 


 

//------------------------------------------------------------ 

 

Hey, thanks everyone. 

The multiplication trick worked for me: after multiplying the sum by 57 and then discarding the 9 LSBs, the frequency increased almost three times. The design is now at 80 MHz. 

Cheers 

 

-amit garg