Fixed Point Rounding

Altera_Forum · ‎12-31-2011

Hey guys,

I have one more question about fixed point rounding. If you multiply two numbers and have to perform rounding, how would you do that with fixed point? I'm currently rounding a 32 bit product down to 16 bits by truncating the first 15 bits, thus performing a division by 2^15. But, if you have four bits representing the fractional part, how do you perform the rounding?

Altera_Forum · ‎12-31-2011

There are a few rounding algorithms:

direct truncation (same as floor of matlab)

basic (same as nearest of matlab)

DC unbiased (convergent of matlab)

round to nearest integer (round of matlab)

I normally go for basic; it is very simple:

to round 32 bits to 16 bits, add the MSB of the truncated section (bit index 15) to the upper 16 bits (b31:b16). That's all, but be careful about saturation.
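As a rough illustration of the "basic" method just described, here is a minimal Python sketch. The function name and the choice to clamp at 0x7FFF are assumptions made for this example; in hardware this would be a 16-bit adder with a 1-bit input plus an overflow check.

```python
def round_half_up_32_to_16(product: int) -> int:
    """Round a signed 32-bit value to its upper 16 bits (b31:b16).

    Hypothetical helper for illustration only.
    """
    assert -(1 << 31) <= product < (1 << 31)
    kept = product >> 16               # b31:b16 (Python's >> is an arithmetic shift)
    round_bit = (product >> 15) & 1    # MSB of the truncated section (bit 15)
    result = kept + round_bit
    # Saturate: 0x7FFF + 1 does not fit in a signed 16-bit result.
    return min(result, 0x7FFF)
```

Only the positive end needs clamping here, because adding a 0 or 1 can never push the result below the most negative 16-bit value.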

Altera_Forum · ‎12-31-2011

--- Quote Start ---

Hey guys,

I have one more question about fixed point rounding. If you multiply two numbers and have to perform rounding, how would you do that with fixed point? I'm currently rounding a 32 bit product down to 16 bits by truncating the first 15 bits, thus performing a division by 2^15. But, if you have four bits representing the fractional part, how do you perform the rounding?

--- Quote End ---

There are examples of the different MATLAB rounding functions on p41 of this document (p41 is the number printed on the page; it is p43 of the PDF when read in Acrobat):

http://www.ovro.caltech.edu/~dwh/correlator/pdf/esc-100slides_hawkins.pdf

In general, you want to use convergent rounding (round to nearest even).

There are additional details in here:

http://www.ovro.caltech.edu/~dwh/correlator/pdf/esc-100paper_hawkins.pdf

and source code for convergent rounding in here:

http://www.ovro.caltech.edu/~dwh/correlator/pdf/esc2011_fpga_dsp_code.zip
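For readers who want a quick feel for convergent rounding before opening the linked material, here is a minimal Python sketch. It is not the code from the zip file above; the function name and interface are assumptions made for this example, and saturation of the result is left out.

```python
def round_convergent(value: int, drop: int) -> int:
    """Drop `drop` LSBs from a signed integer, rounding ties to the even result.

    Hypothetical helper for illustration only.
    """
    if drop <= 0:
        return value
    kept = value >> drop                     # floor of value / 2**drop
    remainder = value & ((1 << drop) - 1)    # discarded bits, 0 .. 2**drop - 1
    half = 1 << (drop - 1)
    # Round up when above the halfway point, or exactly halfway with an odd kept part.
    if remainder > half or (remainder == half and (kept & 1)):
        kept += 1
    return kept
```

With drop=1 the inputs 1, 3, 5 (that is 0.5, 1.5, 2.5) round to 0, 2, 2, so the ties alternate direction and no DC bias builds up.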

Cheers,

Dave

Altera_Forum · ‎12-31-2011

Thanks for the reply Kaz,

But I'm concerned about the effect of rounding on the fractional part of the number. I used an algorithm similar to the basic one that you just specified, and it works great. But that was with plain integers. I now have to extend the system to include four fractional bits. I am just wondering if there is anything I have to do to accommodate this change when I do the rounding. I know that multiplications and divisions have to be modified, and I know that addition and subtraction can remain the same, as long as the location of the binary point stays fixed. I just don't know about the rounding. I may just be making it more complicated than it really is.

Altera_Forum · ‎12-31-2011

Not sure what you mean, but if you have to remove bits then you do it the same way, irrespective of whether you call them fractional bits or anything else. The best you can do is round the fraction to an integer.

Altera_Forum · ‎12-31-2011

--- Quote Start ---

But I'm concerned about the effect of rounding on the fractional part of the number .... I now have to extend the system to include four fractional bits.

--- Quote End ---

Conceptually move your binary point to below the four fractional bits (scale by 2^4) and treat the numbers as integers.
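A small Python sketch of that idea, assuming a format with four fractional bits and the "add the MSB of the truncated part" rounding from earlier in the thread. The names FRAC_BITS, to_fixed, to_float and mul_fixed are made up for this example.

```python
FRAC_BITS = 4  # four fractional bits, as in the question


def to_fixed(x: float) -> int:
    """Scale by 2**FRAC_BITS and store as a plain integer."""
    return round(x * (1 << FRAC_BITS))


def to_float(q: int) -> float:
    """Undo the scaling, for inspection only."""
    return q / (1 << FRAC_BITS)


def mul_fixed(a: int, b: int) -> int:
    """Multiply two fixed-point values and round back to FRAC_BITS fractional bits."""
    product = a * b                                  # has 2 * FRAC_BITS fractional bits
    round_bit = (product >> (FRAC_BITS - 1)) & 1     # MSB of the bits about to be dropped
    return (product >> FRAC_BITS) + round_bit
```

For example, 1.1875 is stored as 19, and 19 * 19 = 361 rounds to 23, which reads back as 1.4375 (the exact product is 1.41015625). The rounding step itself never looks at where the binary point is; it only sees which integer bits are being dropped.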

Cheers,

Dave

Altera_Forum · ‎01-03-2012

Personally I always use the following technique. If I want to round 16 bits down to 6 bits, I do the following.

XXXX_XXXX_XXXX_XXXX

+ 0000_0010_0000_0000

Then keep the 6 MSBs; this will be the rounded value. I would not make it more difficult than that. In essence you are adding 0.5 and then truncating, just as you would in decimal. It does not matter how many bits you have on the right-hand side of the binary point.
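The same add-a-half-then-truncate step, written as a short Python sketch. The function name is an assumption for this example, and the result is deliberately left unsaturated so the overflow case mentioned earlier in the thread stays visible.

```python
def round_16_to_6(x: int) -> int:
    """Round a signed 16-bit value to its 6 MSBs by adding half an output LSB.

    Hypothetical helper for illustration only; no saturation is applied.
    """
    assert -(1 << 15) <= x < (1 << 15)
    half = 1 << 9               # 0000_0010_0000_0000
    return (x + half) >> 10     # drop the 10 LSBs, keep the 6 MSBs
```

Note that an input of 0x7FFF returns 32, one more than the largest signed 6-bit value (31), so a real design would still need an extra bit or a saturation step.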

Hope that helped

/Boris

Altera_Forum · ‎01-03-2012

Adding 1 as a constant is wrong for 2's complement rounding.

You are asking for a large unnecessary adder. Just add MSB of truncated part. That is what I already explained.

Altera_Forum · ‎01-03-2012

Adding the MSB of the truncated part has the same effect as what I'm doing. Yes, you do not need as long an adder as the one I showed; I just wrote it out in full for clarity. If you use the MSB of the truncated part, however, you need an if statement or an adder with a non-constant input.

XXXX_XXXX_XXXX_XXXX

truncate to 6+1 bits

XXXX_XXX

+0000_001

Throw away the LSB

Versus

XXXX_XXXX_XXXX_XXXX

truncate to the 6 MSBs

XXXX_XX

+0000_0(MSB of truncated part)

Use the whole result.

Don't know which one would optimize better. Either one is probably about the same.

/Boris

Altera_Forum · ‎01-03-2012

Actually, adding 1 works for 2's complement rounding as well. Check it out.

Altera_Forum · ‎01-03-2012

I think we are referring to two different but equivalent methods:

1) add the MSB of the truncated part to the rest, then truncate

2) add 1 to the rest plus 1 extra bit, then truncate.

Yes both ways work.
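The equivalence can be checked by brute force. The Python sketch below compares the two methods, plus the full-width "add a half" constant from earlier in the thread, over every signed 16-bit input when rounding to 6 bits. All function names are made up for this example, and saturation is ignored in all three.

```python
def method_add_msb(x: int, drop: int = 10) -> int:
    """1) Keep the upper bits and add the MSB of the truncated part."""
    return (x >> drop) + ((x >> (drop - 1)) & 1)


def method_add_one(x: int, drop: int = 10) -> int:
    """2) Keep one extra bit, add 1, then throw away that extra LSB."""
    return ((x >> (drop - 1)) + 1) >> 1


def method_add_half(x: int, drop: int = 10) -> int:
    """Full-width version: add half an output LSB, then drop the low bits."""
    return (x + (1 << (drop - 1))) >> drop


def methods_agree() -> bool:
    """Compare all three on every signed 16-bit input (two's complement)."""
    return all(
        method_add_msb(x) == method_add_one(x) == method_add_half(x)
        for x in range(-(1 << 15), 1 << 15)
    )
```

Calling methods_agree() returns True, including for negative inputs, which is consistent with the conclusion that both ways work.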

Altera_Forum · ‎01-03-2012

Thank you kaz for showing me another way of rounding.

/Boris