Re: What is the difference between * and the IP core multiplier

Altera_Forum · ‎10-25-2011

Hi everyone, I have a question about multipliers.

I am new to FPGAs, and I wonder about the difference between a multiplier constructed with * in HDL and one generated by an IP core. What is the most important difference between these two kinds of multipliers? Is it speed, or something else? Thanks.

Altera_Forum · ‎10-25-2011

I assume by IP core you mean "lpm_mult"?

Using lpm_mult, you have more control over the multiplier unit being targeted. When you type '*' into your code, you are at the mercy of what the synthesis engine picks for you. So for portable coding '*' is the better approach, but when you are tuning your design for performance or area you might need to resort to lpm_mult. I recommend using '*', and when that doesn't give you the results you are looking for, try replacing it with an instantiation of lpm_mult.

Altera_Forum · ‎10-25-2011

Thanks for your suggestion.

You mean that I can use * for the initial design, and if the synthesis result doesn't meet the target, I can then use lpm_mult instead in a second pass to improve the result?

Altera_Forum · ‎10-25-2011

One case for switching to lpm_mult is when you determine that features of the DSP/embedded multiplier block you were counting on are not being used when the multiplier is inferred. The synthesis engine cannot always map all the features of the hardware multipliers, and using lpm_mult gives you the ability to do this.

In the Quartus II handbook there is a chapter called something like "HDL coding guidelines". It probably does a better job explaining this under the section about multiplication.

I try to avoid using the LPMs whenever possible since I often create IP for different FPGA families and the hard block characteristics sometimes differ. So options of lpm_mult may vary between families which makes your implementation less portable as a result (you may not care about portability though).
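For reference, a direct instantiation of lpm_mult might look roughly like the sketch below. It assumes a signed 32x32 multiply with a 64-bit result and two pipeline stages. The wrapper module name and its port names are made up for illustration, and the parameters supported should be checked against the LPM documentation for your target family.

```verilog
// Hypothetical wrapper: module and port names are illustrative only.
module mult_lpm_sketch (
	input                clk,
	input  signed [31:0] a,
	input  signed [31:0] b,
	output signed [63:0] p
);
	// Direct instantiation of the LPM multiplier instead of inferring '*'.
	lpm_mult #(
		.lpm_widtha         (32),        // width of dataa
		.lpm_widthb         (32),        // width of datab
		.lpm_widthp         (64),        // width of result
		.lpm_representation ("SIGNED"),
		.lpm_pipeline       (2)          // cycles of latency
	) mult_inst (
		.clock  (clk),
		.clken  (1'b1),   // always enabled
		.aclr   (1'b0),   // no asynchronous clear
		.sum    (1'b0),   // optional adder input unused
		.dataa  (a),
		.datab  (b),
		.result (p)
	);
endmodule
```

Keeping the instantiation inside a small wrapper like this confines the family-specific part to one module, so the rest of the design can stay portable.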

Altera_Forum · ‎10-28-2011

This post is timely, because I have been having an issue related to it. For larger multipliers, lpm_mult creates logic that is much faster. In my case of a signed 32x32 multiply, lpm_mult is double the speed of using "*" in Verilog. For reference, here is my code:


module mult_test(
	input                    CLK,
	input  signed [31:0]     IN_A,
	input  signed [31:0]     IN_B,
	output reg signed [63:0] OUT_C
);

	//Verilog version: registered inputs, inferred multiply, registered output
	reg  signed [31:0] IN_A_d1;
	reg  signed [31:0] IN_B_d1;
	wire signed [63:0] mult_result;

	assign mult_result = IN_A_d1 * IN_B_d1;

	always @(posedge CLK) begin
		IN_A_d1 <= IN_A;
		IN_B_d1 <= IN_B;
		OUT_C <= mult_result;
	end


	/*
	//Altera LPM Megafunction version
	//Created as signed 32x32 -> 64-bit multiply with 2 cycles of latency
	//(for this version, declare OUT_C as a plain output instead of output reg
	// and remove the Verilog version above)
	wire signed [63:0] mult_result;
	assign OUT_C = mult_result;

	megafunction_mult	megafunction_mult_inst (
		.clock (CLK),
		.dataa (IN_A),
		.datab (IN_B),
		.result (mult_result)
	);*/
endmodule

I get 90 MHz fmax with the SystemVerilog version, and 179 MHz with the lpm_mult version. I would rather use "*" for code portability, but the 50% speed cut is unbearable in my application.

Altera_Forum · ‎10-28-2011

I think it has to do with pipelining. In the inferred case you cannot pipeline internally the way you did with lpm_mult, assuming a dedicated multiplier was generated in either case.

Altera_Forum · ‎10-28-2011

I should mention that I copied the code format from "Example 10–2. Verilog HDL Signed Multiplier with Input and Output Registers (Pipelining = 2)" in the Quartus II handbook.

Altera_Forum · ‎10-28-2011

To improve speed further, you had better register the inputs and outputs of the multiplier block and also insert registers between the block and the fabric.

Altera_Forum · ‎10-28-2011

Code portability for a slow, problematic design is never a good idea and defeats its purpose. Common sense?

Altera_Forum · ‎10-31-2011

Try writing it the same way as shown in the multiplier template under the Edit menu. You can find it here under the templates: Verilog HDL --> Full Designs --> Arithmetic --> Multipliers --> Signed Multiply with Input and Output Registers.

Altera_Forum · ‎10-31-2011

--- Quote Start ---

Try writing it the same way as shown in the multiplier template under the Edit menu. You can find it here under the templates: Verilog HDL --> Full Designs --> Arithmetic --> Multipliers --> Signed Multiply with Input and Output Registers.

--- Quote End ---

I just tried that, and it gave me the same result as my own hand-written Verilog.

So, the current fmax summary:

lpm_mult with two cycle latency = 180 MHz

Verilog * operation with input and output registers = 90 MHz

Quartus II Verilog signed multiply with I/O registers template = 90 MHz

My .sdc file is set up to target 200 MHz in every case.

Altera_Forum · ‎10-31-2011

By any chance is the module you are testing this with at the top level? If so, I suspect your input and output registers are being packed into the I/O. Either assign those inputs and outputs to virtual pins using the Assignment Editor, or just shove a bunch of pipeline stages before and after the multiplication in your HDL file. This will make sure you isolate the multiplier from the I/O. In other words, do this:

Register --> register --> register --> register --> multiply --> register --> register --> register --> register

If this causes your timing problems to go away then don't worry, you won't need that kind of pipelining once you feed the multiplication with on-chip inputs and outputs (and if you do that means the surrounding logic could use some pipelining).
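The register chain above could be written along these lines. This is only a sketch: the module name, port names, and the STAGES parameter are made up for illustration, and it assumes a signed 32x32 multiply with a 64-bit result and STAGES of at least 1.

```verilog
// Hypothetical test wrapper: names and the STAGES parameter are illustrative.
module mult_pipe_sketch #(
	parameter STAGES = 4   // registers before and after the multiply (>= 1)
) (
	input                clk,
	input  signed [31:0] a,
	input  signed [31:0] b,
	output signed [63:0] p
);
	reg signed [31:0] a_d [0:STAGES-1];   // input delay chain for a
	reg signed [31:0] b_d [0:STAGES-1];   // input delay chain for b
	reg signed [63:0] p_d [0:STAGES-1];   // output delay chain for the product
	integer i;

	always @(posedge clk) begin
		// Input side: register --> register --> ...
		a_d[0] <= a;
		b_d[0] <= b;
		for (i = 1; i < STAGES; i = i + 1) begin
			a_d[i] <= a_d[i-1];
			b_d[i] <= b_d[i-1];
		end

		// Inferred multiply fed by the last input registers,
		// captured by the first output register.
		p_d[0] <= a_d[STAGES-1] * b_d[STAGES-1];

		// Output side: ... --> register --> register
		for (i = 1; i < STAGES; i = i + 1)
			p_d[i] <= p_d[i-1];
	end

	assign p = p_d[STAGES-1];
endmodule
```

Depending on synthesis settings, the tools may retime these registers or implement the chains differently, so check the fitter report to see where they actually end up.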

Altera_Forum · ‎10-31-2011

--- Quote Start ---

By any chance is the module you are testing this with at the top level? If so, I suspect your input and output registers are being packed into the I/O. Either assign those inputs and outputs to virtual pins using the Assignment Editor, or just shove a bunch of pipeline stages before and after the multiplication in your HDL file. This will make sure you isolate the multiplier from the I/O. In other words, do this:

Register --> register --> register --> register --> multiply --> register --> register --> register --> register

If this causes your timing problems to go away then don't worry, you won't need that kind of pipelining once you feed the multiplication with on-chip inputs and outputs (and if you do that means the surrounding logic could use some pipelining).

--- Quote End ---

The full history is that I had a design with a signed 32x32 multiply working fine. That multiply was not at the top level.

Then, Quartus updated to 11.0 and that existing design suddenly failed timing by a huge margin. The failing paths were mostly through the 32x32 multiply. Fmax dropped from 100 MHz to 60 MHz. I tried rebuilding the design in 10.1 and it went back to 100 MHz. Long story short: Altera said they found the problem and it's a bug in 11.0.

They didn't give a workaround, though, so I've been experimenting in the hopes that I can get code that will always meet timing while still being portable. Sadly, no amount of extra pipelining has brought the design back to the Quartus 10.1 fmax in Quartus 11.0.

The fmax numbers from my last post were all based around top-level modules, though. I will try adding the extra registers like you mentioned to see if the numbers change.

Update: Adding three additional series registers to both inputs and three additional series registers to the output increased fmax for both "*" and the Quartus II template code, but only by about 4 MHz.

So now it's up to ~180 MHz for lpm_mult vs ~94 MHz for code.