LPM functions and HDL

Altera_Forum · ‎03-01-2018

Hi all,

So I'm trying to create a (synchronous) 8-bit loadable counter, as shown below, but for some reason it synthesizes to around 28 LEs, compared to 11 LEs for the Altera LPM version. Any idea why?

Thanks!

-Mux

module register8 (
    clk,
    clken,
    reset_n,
    data,
    load,
    cnt_en,
    updown,
    q,
    q_next
);
input clk;
input clken;
input reset_n;
input [7:0] data;
input load;
input cnt_en;
input updown;
output [7:0] q, q_next;
reg [7:0] addsub;
always @(*)
    if ( updown )
        addsub = q - 8'd1;
    else
        addsub = q + 8'd1;
// asynchronous 'next' value. This is used extensively for flags
reg [7:0] q, q_next;
always @(*)
    if ( load )
        q_next = data;
    else
    begin
        if ( cnt_en )
            q_next = addsub;
        else
            q_next = q;
    end
always @(posedge clk or negedge reset_n )
    if ( !reset_n )
        q <= 8'd0;
    else if ( clken )
        q <= q_next;
endmodule

Altera_Forum · ‎03-02-2018

Hi ,

The most likely cause is the use of the +/- operators.

-Abr

Altera_Forum · ‎03-02-2018

Yes and no :-) If you choose an LPM_ADDSUB, the cost of the LPM function goes to 16+ LEs, whereas the counter comes in at 8 LEs + 8 LUTs. So what's the best way to synthesize an up/down counter without +/-? Anyone know?

-Mux

Altera_Forum · ‎03-02-2018

--- Quote Start ---

So I'm trying to create a (synchronous) 8-bit loadable counter, as below but for some reason this synthesizes to around 28 LE's compared to the altera LPM 11 LE version.

--- Quote End ---

Although you didn't say which FPGA family you are using (4- or 6-input LUT?), there's obviously something wrong with the comparison. The full-featured 8-bit LPM counter with up/down and asynchronous parallel load enabled never synthesizes in only 11 LEs. I guess you have some functions unconnected.

I wonder, however, about the purpose of an asynchronous load, which can't be effectively implemented in any recent Altera FPGA family. Maybe you compared incomparable designs?

..............

Just realized that it's actually a synchronous load implemented by a mux. It even fits in 9 LEs on Cyclone III. The design shown differs at least in exposing the q_next outputs.

..............

After removing q_next from the interface, register8 is implemented in 9 LEs as well.

Altera_Forum · ‎03-02-2018

It's actually for a Cyclone II, so it's (IIRC) a 4-input LUT. Even with all (synchronous) functions enabled and an asynchronous reset, it comes down to 11 LEs and is listed as 8 LEs + 8 LUTs. I *kinda* need q_next in order to get the answer before the next clock, but if that makes a huge difference then that's 1 extra clock per instruction, which would kinda suck. I'll give it a whirl though...

-Mux

Altera_Forum · ‎03-02-2018

Did some more sleuthing and it's indeed the fact that you expose q_next. Without exposing q_next, a synchronous, 8-bit loadable counter with asynchronous reset comes in at 9 LEs, which is in line with the LPM function.

I'm assuming that counters in LPM functions are just done as flip-flops rather than adders, and that the adder is what expands the whole thing into a 28-LE monster (all things considered). So I now get to choose whether I'll take the hit on an extra clock or on additional resources :-)

Thanks to all who have replied!

-Mux

Altera_Forum · ‎03-03-2018

Unless you really need the resources, you should ALWAYS choose whichever option is easier for a fellow engineer (or yourself in 12 months time) to understand. Choosing an obscure "clever" option will never earn you brownie points with anyone.

Altera_Forum · ‎03-03-2018

This case shows that the synthesis tool is pretty good at optimizing arithmetic code, so that functionally equivalent designs usually end up with equal resource usage.

Regarding Tricky's comment, I would prefer a more straightforward behavioral description of the up/down counter with fewer always blocks and without the auxiliary q_next.

Altera_Forum · ‎03-03-2018

We shouldn't question the OP's intentions: the OP may need the q_next to take some look-ahead decision.

Anyway, the original 28 LEs can be reduced to 17 by rewriting the source code slightly:

module register8 (
            clk,
            clken,
            reset_n,
            data,
            load,
            cnt_en,
            updown,
            q,
            q_next
            );
input clk;
input clken;
input reset_n;
input [7:0] data;
input load;
input cnt_en;
input updown;
output [7:0] q, q_next;
// addsub is the increment: +1 when counting up, -1 (8'hFF) when counting down
reg [7:0] addsub;
always @(*)
    if ( updown )
        addsub = - 8'd1;
    else
        addsub = 8'd1;
// asynchronous 'next' value. This is used extensively for flags
reg [7:0] q, q_next;
always @(*)
    q_next = q + addsub;
always @(posedge clk or negedge reset_n )
    if ( !reset_n )
        q <= 8'd0;
    else if ( clken & (load | cnt_en))
        if (load )
            q <= data;
        else
            q <= q_next;
endmodule

This produces an (almost?) beautiful RTL schematic:

https://alteraforum.com/forum/attachment.php?attachmentid=14923&stc=1

Oops, I did change the functionality after all ...

Here is the corrected source code:

module register8 (
            clk,
            clken,
            reset_n,
            data,
            load,
            cnt_en,
            updown,
            q,
            q_next
            );
input clk;
input clken;
input reset_n;
input [7:0] data;
input load;
input cnt_en;
input updown;
output [7:0] q, q_next;
// addsub is the increment: +1 when counting up, -1 (8'hFF) when counting down
reg [7:0] addsub;
always @(*)
    if ( updown )
        addsub = - 8'd1;
    else
        addsub = 8'd1;
// asynchronous 'next' value. This is used extensively for flags
reg [7:0] q, q_next;
always @(*)
    if (load)
        q_next = data;
    else
        q_next = q + addsub;
always @(posedge clk or negedge reset_n )
    if ( !reset_n )
        q <= 8'd0;
    else if ( clken & cnt_en)
        q <= q_next;
endmodule

and the corresponding (beautiful!) RTL schematic:

https://alteraforum.com/forum/attachment.php?attachmentid=14924&stc=1

And still only 17 LEs!

But it is still different from the original ...

Third time's a charm ...

module register8 (
            clk,
            clken,
            reset_n,
            data,
            load,
            cnt_en,
            updown,
            q,
            q_next
            );
input clk;
input clken;
input reset_n;
input [7:0] data;
input load;
input cnt_en;
input updown;
output [7:0] q, q_next;
// addsub is the increment: +1 or -1 when counting, 0 when cnt_en is low
reg [7:0] addsub;
always @(*)
    if ( cnt_en)
        if (updown )
            addsub = - 8'd1;
        else
            addsub = 8'd1;
    else
        addsub = 8'd0;
// asynchronous 'next' value. This is used extensively for flags
reg [7:0] q, q_next;
always @(*)
    if (load)
        q_next = data;
    else
        q_next = q + addsub;
always @(posedge clk or negedge reset_n )
    if ( !reset_n )
        q <= 8'd0;
    else if ( clken )
        q <= q_next;
endmodule

https://alteraforum.com/forum/attachment.php?attachmentid=14925&stc=1

And I am lucky! Still only 17 LEs. I must admit this is starting to look obscure, so I will not get any brownie points :)

Altera_Forum · ‎03-03-2018

Thanks guys!

WRT Tricky's comment, I generally try to stay away from lpm functions as they don't port over to different architectures. Also, I don't see anything 'clever' in my code. The reason I use q_next is that it allows me to calculate the negative / zero flags for an operation before the operation is clocked, thereby saving myself the hassle of trying to figure out which instruction modified the flags. There's no magic here, really.

That said, I wholeheartedly agree! I've seen some eye-watering code where I spent hours manually decoding bits to figure out what something did.

-Mux
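
The look-ahead use of q_next described above could be sketched roughly as follows. This is only an illustration under assumptions: the wrapper name counter_flags and the flag outputs zero and negative are hypothetical and do not come from the thread; only register8 and its ports are taken from the first post.

```verilog
// Hypothetical wrapper: counter_flags, zero and negative are assumed names.
// register8 is the module from the first post, instantiated unchanged.
module counter_flags (
    input  wire       clk,
    input  wire       clken,
    input  wire       reset_n,
    input  wire [7:0] data,
    input  wire       load,
    input  wire       cnt_en,
    input  wire       updown,
    output wire [7:0] q,
    output reg        zero,      // 1 when q is 0 after the clock edge
    output reg        negative   // copy of q[7] after the clock edge
);
    wire [7:0] q_next;

    register8 u_reg (
        .clk(clk), .clken(clken), .reset_n(reset_n),
        .data(data), .load(load), .cnt_en(cnt_en), .updown(updown),
        .q(q), .q_next(q_next)
    );

    // The flags are registered on the same edge (and with the same enable)
    // that moves q_next into q, so they always describe the current q.
    always @(posedge clk or negedge reset_n)
        if (!reset_n) begin
            zero     <= 1'b1;    // q resets to 0
            negative <= 1'b0;
        end else if (clken) begin
            zero     <= (q_next == 8'd0);
            negative <= q_next[7];
        end
endmodule
```

Because the flags are computed from q_next rather than from q, they become valid in the same cycle as the new counter value, which is the saving in state-machine cycles mentioned in this post. The LE cost of this wrapper has not been measured here.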

Altera_Forum · ‎03-05-2018

--- Quote Start ---

WRT Tricky's comment, I generally try to stay away from lpm functions as they don't port over to different architectures. Also, I don't see anything 'clever' in my code. The reason I use q_next is that it allows me to calculate the negative / zero flags for an operation before the operation is clocked, thereby saving myself the hassle of trying to figure out which instruction modified the flags. There's no magic here, really.

--- Quote End ---

But you did realize that exposing q_next is exactly what blows up the design? Without it, you arrive at the LPM result.

--- Quote Start ---

And, I am lucky! Still only 17 LE. I must admit this starts looking obscure, so I will not get any brownie points.

--- Quote End ---

Maybe I'm overlooking something, but your third design seems functionally equivalent to the post #1 code, so it's not completely clear where the different resource utilization comes from. In any case, cutting the q_next output brings you back to 9 LEs for all design variants.

Without q_next, I prefer a single always block description.

always @(posedge clk or negedge reset_n )
  if (!reset_n)
    q <= 8'd0;
  else if (clken)
  begin
    if (load)
      q <= data;
    else if (cnt_en)
    begin
      if (updown)
        q <= q - 1;
      else 
        q <= q + 1;
    end
  end
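
Whether a given register8 variant behaves like the single always block above can be checked in simulation. The following is a minimal self-checking testbench sketch, not something posted in the thread: the names tb_register8 and ref_q, the clock period, and the number of random cycles are all assumptions. It compares the q output of whichever register8 is compiled alongside it against a reference written in the single-block style.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench: tb_register8 and ref_q are assumed names.
module tb_register8;
    reg        clk = 1'b0;
    reg        clken, reset_n, load, cnt_en, updown;
    reg  [7:0] data;
    wire [7:0] q, q_next;
    reg  [7:0] ref_q;            // behavioural reference for q
    integer    i, errors;

    // Device under test: any of the register8 variants from this thread
    register8 dut (
        .clk(clk), .clken(clken), .reset_n(reset_n), .data(data),
        .load(load), .cnt_en(cnt_en), .updown(updown),
        .q(q), .q_next(q_next)
    );

    always #5 clk = ~clk;        // 10 ns clock period (arbitrary choice)

    // Reference model, written like the single always block description
    always @(posedge clk or negedge reset_n)
        if (!reset_n)
            ref_q <= 8'd0;
        else if (clken) begin
            if (load)
                ref_q <= data;
            else if (cnt_en)
                ref_q <= updown ? ref_q - 8'd1 : ref_q + 8'd1;
        end

    initial begin
        errors  = 0;
        reset_n = 1'b0; clken = 1'b0; load = 1'b0;
        cnt_en  = 1'b0; updown = 1'b0; data = 8'd0;
        #12 reset_n = 1'b1;
        for (i = 0; i < 1000; i = i + 1) begin
            @(negedge clk);      // sample and drive away from the active edge
            if (q !== ref_q) begin
                errors = errors + 1;
                $display("mismatch at %0t: q=%h ref=%h", $time, q, ref_q);
            end
            {clken, load, cnt_en, updown} = $random;  // low 4 bits are used
            data = $random;
        end
        if (errors == 0) $display("PASS");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

The check covers q only; q_next could be compared in the same way by adding a combinational reference for it.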

Altera_Forum · ‎03-05-2018

Frank,

Again, it is not up to us to question the OP's intention: if the OP wants q_next, we should give him that, no? I think the OP is now painfully aware that there is a cost for this q_next option.

My third solution is indeed functionally equivalent to the OP's original post #1. The difference is in the coding. I'm sure you can spot the difference; studying the RTL diagrams will help. I attach the RTL schematic for the original post.

https://www.alteraforum.com/forum/attachment.php?attachmentid=14930

I tried to put everything in two always blocks, one combinational and the other sequential, but that didn't work. I normally don't do Verilog, so I didn't persist.

Regards,

Josy
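
For reference, a two-block version that keeps q_next might look something like the sketch below. This is an assumption about what such a description could be, not the code Josy tried: the module name register8_two_block is hypothetical, the ports are those of register8 rewritten in Verilog-2001 ANSI style, and no resource usage has been measured for it.

```verilog
// Sketch only: register8_two_block is an assumed name; ports match register8.
module register8_two_block (
    input  wire       clk,
    input  wire       clken,
    input  wire       reset_n,
    input  wire [7:0] data,
    input  wire       load,
    input  wire       cnt_en,
    input  wire       updown,
    output reg  [7:0] q,
    output reg  [7:0] q_next
);
    // Combinational block: the default assignment covers every path,
    // so no latch is inferred for q_next.
    always @(*) begin
        q_next = q;                                  // default: hold
        if (load)
            q_next = data;                           // load has priority
        else if (cnt_en)
            q_next = updown ? q - 8'd1 : q + 8'd1;   // updown = 1 counts down
    end

    // Sequential block: asynchronous active-low reset, clock enable
    always @(posedge clk or negedge reset_n)
        if (!reset_n)
            q <= 8'd0;
        else if (clken)
            q <= q_next;
endmodule
```

The priority (load over cnt_en) and the count direction follow the code in post #1, so this should describe the same behaviour; how the synthesis tool maps it is a separate question.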

Altera_Forum · ‎03-05-2018

@josyb Yes, I'm aware there's a cost to exposing q_next, and that I have to choose between the convenience of having the result available before the clock goes high rather than after, and the extra resources. Considering this module is used multiple times throughout my design, I'm going to have to decide where I want to take the hit. By exposing q_next I get to save additional states in my main state machine, which has reduced the overall cost, so there's a ripple effect. The target for my project is a MachXO2 1280, which is tight.

With regards to having multiple 'always' blocks... I've gone through a number of 'style' iterations and kinda settled on the current one. Maybe not the nicest way but I'm sure I'll eventually come down to a more compact one.

Thanks for the comments though! Highly appreciated!

-Mux