Q (number format) Multiply

Altera_Forum · ‎10-03-2012

I'm working on a project that uses a 32-bit accumulator whose input is a 32-bit step size (unsigned Q30.2). The accumulator can roll over when performing a large number of iterations or working with a large step size. I need to be able to calculate the final value of the 32-bit accumulator over time. The step size is added to the accumulator every 8 ns. The number of iterations (8 ns increments) is given as a 32-bit clk_offset value.

Will the following code work?

amplitude = CONV_STD_LOGIC_VECTOR(CONV_INTEGER(CONV_INTEGER(intra_clk_offset) * CONV_INTEGER(step_size)), 32);

Since the step_size is a Q30.2 number, shouldn't I shift the value left?

Thanks,

Altera_Forum · ‎10-03-2012

Urgh. The problem of doing fixed point with non-fixed-point libraries, and of using non-standard legacy VHDL libraries. (And why oh why does everything have to be a std_logic_vector? You can use integers, unsigned and any other type in ports, you know.)

Anyway, back to the code. You are going to overflow big time. If both tr_clk_offset and step_size are 32 bits, the output needs to be 64 bits, not 32, in Q62.2 format.

PS. Have a look at the new fixed point VHDL libraries (Quartus may one day support the real VHDL 2008 files, but until then you can easily use the VHDL93 versions available from www.vhdl.org/fphdl).

They allow you to do stuff like this:


signal tr_clk_offset : ufixed(31 downto 0);
signal step_size : ufixed(29 downto -2); --Q30.2
signal amplitude : ufixed(61 downto -2);
amplitude <= tr_clk_offset * step_size;

Look - no hideous type converting (which is what you get when you insist everything has to be a std_logic_vector). Btw, if you decide to use this package, you'll have to forget about using std_logic_unsigned, std_logic_signed and std_logic_arith and use numeric_std instead (which is only a good thing).
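For a design whose ports are already std_logic_vector, the package can be tried on one block at a time by converting at the boundary. The following is only a sketch under some assumptions: it targets a VHDL-2008 tool where the package is ieee.fixed_pkg (the VHDL93 compatibility versions are normally compiled into a library called ieee_proposed instead), and the entity name fixed_multiply, the port name clk_offset and the internal signal names are made up for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
-- Assumption: VHDL-2008. With the VHDL93 compatibility files this would
-- typically be "library ieee_proposed; use ieee_proposed.fixed_pkg.all;".
use ieee.fixed_pkg.all;

entity fixed_multiply is  -- hypothetical name
  port (
    clk_offset : in  std_logic_vector(31 downto 0);  -- plain iteration count
    step_size  : in  std_logic_vector(31 downto 0);  -- unsigned Q30.2
    amplitude  : out std_logic_vector(63 downto 0)   -- unsigned Q62.2
  );
end entity fixed_multiply;

architecture rtl of fixed_multiply is
  signal count_fx : ufixed(31 downto 0);
  signal step_fx  : ufixed(29 downto -2);
  signal amp_fx   : ufixed(61 downto -2);
begin
  -- Reinterpret the raw bits; the index range records where the binary point is.
  count_fx <= to_ufixed(clk_offset, 31, 0);
  step_fx  <= to_ufixed(step_size, 29, -2);

  -- The package sizes the product as (31+29+1 downto 0+(-2)) = (61 downto -2).
  amp_fx <= count_fx * step_fx;

  -- Back to a 64-bit std_logic_vector for the outside world.
  amplitude <= to_slv(amp_fx);
end architecture rtl;
```

Only the boundary conversions touch std_logic_vector, so the rest of the design can stay as it is while the package is evaluated.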

Altera_Forum · ‎10-03-2012

PS. No, you don't need to shift step_size, because it's really just an integer with an implied scale factor of 2^-2. This means the output is Q62.2.
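The same full-width multiply can also be written with numeric_std alone, keeping std_logic_vector ports. This is a minimal sketch rather than a drop-in replacement: the entity name q_multiply and the port names clk_offset and product are assumptions made for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity q_multiply is  -- hypothetical name
  port (
    clk_offset : in  std_logic_vector(31 downto 0);  -- plain iteration count
    step_size  : in  std_logic_vector(31 downto 0);  -- unsigned Q30.2
    product    : out std_logic_vector(63 downto 0)   -- unsigned Q62.2
  );
end entity q_multiply;

architecture rtl of q_multiply is
begin
  -- numeric_std "*" on two unsigned operands returns a result whose length is
  -- the sum of the operand lengths (32 + 32 = 64), so no bits are lost.
  -- No shift is applied: the binary point simply stays 2 bits above the LSB.
  product <= std_logic_vector(unsigned(clk_offset) * unsigned(step_size));
end architecture rtl;
```

Because no integer type is involved, the 32-bit limit of CONV_INTEGER and CONV_STD_LOGIC_VECTOR never comes into play.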

Altera_Forum · ‎10-03-2012

Thanks for your help.

I'll consider using the package, but I'll need to convert all the code over.

If I continue down the same path using std_logic_vector, I understand that the product will be a Q62.2 value. How would I scale that value to be 32 bits? Would I use the upper 32-bits?

temp = CONV_STD_LOGIC_VECTOR(CONV_INTEGER(CONV_INTEGER(intra_clk_offset) * CONV_INTEGER(step_size)), 64);

amplitude = temp(63 downto 32);

I know "CONV_STD_LOGIC_VECTOR" will not work with a 64 bit value, just using this as an example.

Altera_Forum · ‎10-03-2012

--- Quote Start ---

If I continue down the same path using std_logic_vector, I understand that the product will be a Q62.2 value. How would I scale that value to be 32 bits? Would I use the upper 32-bits?

--- Quote End ---

It 'depends' :)

I'll explain why with a simplified example ...

If you have 3 bits, you can represent the signed values -4, -3, -2, -1, 0, 1, 2, 3. If you scale them by 1/4 (divide by 4), you get the Q0.2 numbers -1.0, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75.

If you multiply two of these numbers together, the largest value you can get is -1.0 x -1.0 = +1.0, and the smallest nonzero magnitude is 0.25 x 0.25 = 0.0625, so your product needs to have a Q1.4 representation.

In general, Qm1.n1 x Qm2.n2 = Q(m1+m2+1).(n1+n2).
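The 3-bit case can be checked in simulation. The testbench below is a small sketch using numeric_std; the entity name q_growth_tb is made up, and the code is meant for simulation only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity q_growth_tb is  -- hypothetical testbench name
end entity q_growth_tb;

architecture sim of q_growth_tb is
begin
  check : process
    variable a, b : signed(2 downto 0);  -- Q0.2: sign + 2 fractional bits
    variable p    : signed(5 downto 0);  -- Q1.4: sign + 1 integer + 4 fractional bits
  begin
    -- Largest product: -1.0 x -1.0 (raw -4 x -4).
    a := to_signed(-4, 3);
    b := to_signed(-4, 3);
    p := a * b;  -- numeric_std returns 3 + 3 = 6 bits
    assert to_integer(p) = 16  -- 16/16 = +1.0, which needs the extra integer bit
      report "largest product mismatch" severity failure;

    -- Smallest nonzero product: 0.25 x 0.25 (raw 1 x 1).
    a := to_signed(1, 3);
    b := to_signed(1, 3);
    p := a * b;
    assert to_integer(p) = 1  -- 1/16 = 0.0625, which needs 4 fractional bits
      report "smallest product mismatch" severity failure;

    wait;  -- stop the process
  end process check;
end architecture sim;
```

The 6-bit result matches the rule: Q0.2 x Q0.2 gives Q(0+0+1).(2+2) = Q1.4, which is one sign bit plus five more bits.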

However, if you don't like asymmetry or wasting bits, you can use a slight variation on the numbering scheme where you eliminate the most negative value, i.e., your signed-symmetric Q0.2 numbers are -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75 (any -1.0 values in an input data stream are replaced with -0.75 values - a quantization error no worse than the one that happens to positive values). With this numeric representation at the input to your multiplier, you never get the product 1.0, so you can eliminate that bit when you bit-slice to convert your Q1.4 product back into Q0.2 format (or whatever narrower bit-width you plan on using).

In that case, you'd keep the MSB (the sign bit), drop the next bit (the bit needed to represent +1.0), and then keep the remaining bits you want to keep, and eliminate the LSBs.
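As a sketch of that bit-slice for the 3-bit example, the testbench below narrows a Q1.4 product back to Q0.2. The function name narrow and the entity name q_slice_tb are made up, and the code assumes neither input is ever -1.0; it truncates rather than rounds.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity q_slice_tb is  -- hypothetical testbench name
end entity q_slice_tb;

architecture sim of q_slice_tb is
  -- Hypothetical helper: Q1.4 product to Q0.2, valid only for symmetric inputs.
  function narrow (p : signed(5 downto 0)) return signed is
  begin
    -- Keep the sign p(5), drop p(4), keep p(3 downto 2), discard p(1 downto 0).
    return p(5) & p(3 downto 2);
  end function narrow;
begin
  check : process
    variable a, b : signed(2 downto 0);
    variable p    : signed(5 downto 0);
    variable r    : signed(2 downto 0);
  begin
    -- 0.75 x 0.75 = 0.5625, truncated to 0.5 (raw 2).
    a := to_signed(3, 3);
    b := to_signed(3, 3);
    p := a * b;
    r := narrow(p);
    assert to_integer(r) = 2 report "positive case mismatch" severity failure;

    -- -0.75 x 0.75 = -0.5625, truncated downward to -0.75 (raw -3).
    a := to_signed(-3, 3);
    p := a * b;
    r := narrow(p);
    assert to_integer(r) = -3 report "negative case mismatch" severity failure;

    wait;  -- stop the process
  end process check;
end architecture sim;
```

Dropping the LSBs of a two's complement value always rounds toward minus infinity, which is why the negative case lands on -0.75 rather than -0.5.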

Depending on your application, you might actually want to round the result before discarding the LSBs ... but that's another discussion.

Cheers,

Dave

Altera_Forum · ‎10-04-2012

--- Quote Start ---

I know "CONV_STD_LOGIC_VECTOR" will not work with a 64 bit value, just using this as an example.

--- Quote End ---

This is only because of the limitations of the integer type. If you kept it all in the unsigned type (or, even better for this application, the ufixed type) there are no limitations.

Basically, to do what you want, you cannot use integers at all, because the values are so large. You will get an overflow error when you try to simulate, because the result of multiplying the two integers is too large.
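On the earlier question of which 32 bits to keep: if the only goal is to predict where a wrapping 32-bit accumulator ends up, the arithmetic is modulo 2^32, so the low 32 bits of the 64-bit product are the ones that match the accumulator. The testbench below is a sketch that checks this with the unsigned type, assuming the accumulator starts at zero and simply wraps. The entity name acc_wrap_tb and the STEP and N values are arbitrary examples.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity acc_wrap_tb is  -- hypothetical testbench name
end entity acc_wrap_tb;

architecture sim of acc_wrap_tb is
begin
  check : process
    -- Example values only: a large Q30.2 step so that the accumulator wraps.
    constant STEP : unsigned(31 downto 0) := x"C0000001";
    constant N    : natural := 1000;
    variable acc  : unsigned(31 downto 0) := (others => '0');
    variable prod : unsigned(63 downto 0);
  begin
    -- Reference model: add the step N times, letting the 32-bit sum wrap.
    for i in 1 to N loop
      acc := acc + STEP;
    end loop;

    -- Closed form: full 64-bit product, no integer types involved.
    prod := to_unsigned(N, 32) * STEP;

    -- The wrapped accumulator equals the product modulo 2**32.
    assert prod(31 downto 0) = acc
      report "low 32 bits of the product do not match the accumulator"
      severity failure;

    wait;  -- stop the process
  end process check;
end architecture sim;
```

The upper 32 bits of the product then count how many times the accumulator rolled over. Whether they or the lower bits are the ones to keep depends on what amplitude is supposed to mean in the design.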