Altera_Forum
Honored Contributor I
1,826 Views

How to square a fixed point number?

Assuming I have a 16 bit fixed point number that needs to be squared, how should this be implemented? 

 

 

The number format is Q2.14. Squaring it produces a 32-bit result, which seems to have the format Q4.28 (right?). Since I am squaring this fixed point number, I have to shift the result right to get Q4.14 before feeding it back in. But how can I feed in a Q4.14 result when the input format is Q2.14? What is wrong with my understanding?
13 Replies
Altera_Forum
Honored Contributor I
166 Views

 

--- Quote Start ---  

Assuming I have a 16 bit fixed point number that needs to be squared, how should this be implemented? 

 

 

The number format is Q2.14. Squaring it shall produce a result that is 32 bits instead and seems to have format of Q4.28 (right?). Now since I am squaring this fixed point number, I shall have to shift it right to become Q4.14 and then feed it back in. But how can I feed in a result of format Q4.14 when the input is of format Q2.14. What is wrong in my understanding? 

--- Quote End ---  

 

 

Just multiply 16 bits x 16 bits = 32 bits.

If you are also asked to scale down, then truncate the LSBs.
Altera_Forum
Honored Contributor I
166 Views

Are the values signed or unsigned? 

As you indicate that format is Q2.14 I guess they are signed and that the value is between -1.0 and 1.0 (both inclusive) 

In that case you can safely discard the two upper bits in the (scaled) Q4.14 and feed the remaining Q2.14 further.
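
A minimal sketch of that bit selection, assuming the input is a signed 16-bit value with 2 integer bits (sign included) and 14 fraction bits, and that it really stays within -1.0 to +1.0. The entity and signal names are made up for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: squares a Q2.14 value and returns Q2.14.
entity square_q2_14 is
  port (
    x : in  signed(15 downto 0);  -- assumed to satisfy -1.0 <= x <= +1.0
    y : out signed(15 downto 0)   -- x*x, truncated to 14 fraction bits
  );
end entity square_q2_14;

architecture rtl of square_q2_14 is
  signal prod : signed(31 downto 0);  -- 4 integer bits, 28 fraction bits
begin
  -- numeric_std "*" returns x'length + x'length = 32 bits.
  prod <= x * x;

  -- Bits 31..30 carry no information while |x| <= 1.0, so they are dropped.
  -- Bits 13..0 are the 14 extra fraction bits; dropping them truncates.
  y <= prod(29 downto 14);
end architecture rtl;
```

If the input can exceed that range, the two dropped upper bits are no longer redundant and the slice silently wraps.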
Altera_Forum
Honored Contributor I
166 Views

 

--- Quote Start ---  

just multiply 16 bits x 16 bits = 32 bits 

 

If you are asked to also scale down then truncate LSBs 

--- Quote End ---  

 

 

Truncating always floors the value (i.e. rounds towards negative infinity). 

If you need to round to nearest, add 1 at the position of the most significant bit you are going to discard, then discard the LSBs.
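
As a sketch of that rounding step, assuming the value being scaled is the 32-bit product of two signed 16-bit numbers and that 14 LSBs are discarded. The names are hypothetical, and ties round upwards in this version.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: round-to-nearest when dropping 14 fraction bits.
entity round_q4_28 is
  port (
    prod : in  signed(31 downto 0);  -- 16 x 16 product, 28 fraction bits
    y    : out signed(17 downto 0)   -- 4 integer bits, 14 fraction bits
  );
end entity round_q4_28;

architecture rtl of round_q4_28 is
  -- A 1 at bit 13, the most significant of the 14 bits to be discarded.
  constant HALF_LSB : signed(31 downto 0) := to_signed(2**13, 32);
  signal rounded    : signed(31 downto 0);
begin
  -- A 16 x 16 signed product never exceeds 2**30, so this addition cannot
  -- overflow here. An arbitrary 32-bit input would need one guard bit.
  rounded <= prod + HALF_LSB;

  -- Discard the 14 LSBs after the rounding constant has been added.
  y <= rounded(31 downto 14);
end architecture rtl;
```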
Altera_Forum
Honored Contributor I
166 Views

 

--- Quote Start ---  

I guess they are signed and that the value is between -1.0 and 1.0 

--- Quote End ---  

 

Why can we expect that the Q2.14 range (-2.0..+1.999) isn't utilized? Saturation logic should be used in the general case. 

 

Squaring is only a special case of fixed point multiply, by the way. It requires the same scaling operations. 

 

The simplest way to do the result scaling and saturation is to use IEEE fixed point package.
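
For comparison, saturation written by hand with numeric_std might look like the sketch below. It assumes the product has already been scaled to 4 integer bits and 14 fraction bits and must be clamped into a 16-bit word; the entity name and port names are invented for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: clamp an 18-bit value into a 16-bit signed word.
entity sat_to_q2_14 is
  port (
    wide   : in  signed(17 downto 0);  -- 4 integer bits, 14 fraction bits
    narrow : out signed(15 downto 0)   -- 2 integer bits, 14 fraction bits
  );
end entity sat_to_q2_14;

architecture rtl of sat_to_q2_14 is
  constant MAX_VAL : signed(15 downto 0) := to_signed(32767, 16);
  constant MIN_VAL : signed(15 downto 0) := to_signed(-32768, 16);
begin
  process (wide)
  begin
    -- numeric_std comparisons accept operands of different lengths.
    if wide > MAX_VAL then
      narrow <= MAX_VAL;             -- clamp to most positive value
    elsif wide < MIN_VAL then
      narrow <= MIN_VAL;             -- clamp to most negative value
    else
      narrow <= wide(15 downto 0);   -- in range: upper bits are sign copies
    end if;
  end process;
end architecture rtl;
```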
Altera_Forum
Honored Contributor I
166 Views

Correction: in the usual fixed point notation the sign bit is implied, so the Q2.14 range is -4.0..+3.999. https://en.wikipedia.org/wiki/q_(number_format)

Altera_Forum
Honored Contributor I
166 Views

Thank you for your responses, I have been looking into this topic for a while and have a few more questions now. 

 

First, as mentioned by FvM, since the quantity is signed and the sign bit (the MSb) is implicit and not counted, my 16-bit format becomes signed Q2.13. Now if I square this, the result would be 32 bits and would be written as Q4.26? This does not look right, as the bits add up to only 30, not 31. What is wrong? 

 

Only today did I find out that there are packages for fixed point arithmetic in VHDL. However, it is not clear to me how old these are or how well they are supported in synthesis, so I shall have to post a new question. I have only used std_logic_1164 and did not know that there are packages created to support fixed point arithmetic.
Altera_Forum
Honored Contributor I
166 Views

Q2.14 is just Q2.14 with implied sign bit. I think the non-implied equivalent would be a 3.14 2's complement number (-4.0 to +3.999). 

 

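Multiplying two Qm.n fixed point values gives an answer of Q(2*m).(2*n). So your output would be Q4.28. The 2's complement equivalent of this would be a 5.28 number (-16.0 to +15.999). The one result that does not fit is -4.0 * -4.0 = +16.0, which is why a full-width multiplier returns one more bit. 

In VHDL, I believe you can just infer things directly with numeric_std: 

signal left  : signed(16 downto 0); -- Q2.14, 2's comp 3.14 

signal right : signed(16 downto 0); -- Q2.14, 2's comp 3.14 

signal ans   : signed(33 downto 0); -- Q4.28, 2's comp 5.28 plus one extra MSB: "*" returns left'length + right'length = 34 bits 

ans <= left * right;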
Altera_Forum
Honored Contributor I
166 Views

FvM you said: 

"The simplest way to do the result scaling and saturation is to use IEEE fixed point package." 

 

Are you talking about the fixed_pkg? Is it synthesizable? How exactly does it help here? I mean, if I am going to use a multiplier IP block from the IP catalog to do the multiplication, its top level ports are std_logic_vector. So how would the fixed_pkg help with that?
Altera_Forum
Honored Contributor I
166 Views

 

--- Quote Start ---  

FvM you said: 

"The simplest way to do the result scaling and saturation is to use IEEE fixed point package." 

 

Are you talking about the fixed_pkg? Is it synthesizable? How exactly does it help here? I mean, if I am going to use a multiplier IP block from the IP catalog to do the multiplication, its top level ports are std_logic_vector. So how would the fixed_pkg help with that? 

--- Quote End ---  

 

 

Yes, it is synthesisable if you download the '93 compatible version (for Quartus 15 and earlier), or it is built in with Quartus Prime Pro 15.1. 

It allows you declare fixed point numbers in VHDL like this: 

 

signal my_fixed : sfixed(1 downto -14); -- 2 integer bits, 14 fraction bits 

 

The only advantage this gives you is that the intent is easier to understand from the VHDL. The synthesised hardware will be identical to using numeric_std or instantiating the lpm_mult.  

It has conversion functions to std_logic_vector if you need to connect to the lpm_mult, or you can simply just write: 

 

my_new_fixed <= my_fixed * my_fixed2;
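
A fuller sketch of the same idea, assuming the VHDL-2008 library names (ieee.fixed_pkg and ieee.fixed_float_types); the '93 compatible download may be compiled into a different library, so the use clauses would need adjusting there. The entity is hypothetical. It shows the result range the package expects for a product and uses resize to saturate and round back to the input format.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.fixed_float_types.all;  -- fixed_saturate, fixed_round
use ieee.fixed_pkg.all;          -- sfixed, "*", resize

-- Hypothetical entity: square an sfixed value and return the same format.
entity square_sfixed is
  port (
    x : in  sfixed(1 downto -14);  -- 2 integer bits, 14 fraction bits
    y : out sfixed(1 downto -14)
  );
end entity square_sfixed;

architecture rtl of square_sfixed is
  -- sfixed(a downto b) * sfixed(c downto d) has range (a+c+1 downto b+d),
  -- so (1 downto -14) squared needs (3 downto -28), 32 bits in total.
  signal prod : sfixed(3 downto -28);
begin
  prod <= x * x;

  -- Scale back to the input format, saturating on overflow and rounding
  -- the discarded fraction bits.
  y <= resize(prod, x'high, x'low,
              overflow_style => fixed_saturate,
              round_style    => fixed_round);
end architecture rtl;
```

Saturate and round are also the package defaults, so they are spelled out here only to make the behaviour visible.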
Altera_Forum
Honored Contributor I
166 Views

I see. What if I want to port map the sfixed to an lpm_mult or some other arithmetic block? I hope that can be done. I guess it will then not "automatically" tell me what the output format is going to be.  

Oh...wait...... I guess I can just calculate on paper what Q format the output will have and then port map a signal of that format in place of a std_logic_vector at the output. Is that correct? That way, the code will be easier for humans to read and interpret. 

 

By the way, is writing my_fixed * my_fixed2 really a good idea? I am not sure whether it will infer a DSP block or some other kind of multiplier, and if it does infer a DSP block I do not know how much pipeline latency it will have, etc. I think one should instantiate the specific hardware that is required for the multiplier, divider or whatever function it is. I only use the + and - operators, since a counter is relatively straightforward. 

 

Thanks Tricky and everyone :)
Altera_Forum
Honored Contributor I
166 Views

a <= b * c; 

will pretty much always infer a DSP block. The amount of pipelining is determined by you - by how many registers you place around the multiplier in your code. the multiplier above can be placed inside or outside a synchronous process. No VHDL function is pipelined. 
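
A sketch of what "registers around the multiplier" can look like, with one input stage and one output stage written in the code. The entity name and widths are assumptions for illustration; whether the tool packs these registers into the DSP block depends on the device and the synthesis settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: inferred multiplier with two register stages.
entity mult_pipe is
  port (
    clk : in  std_logic;
    a   : in  signed(15 downto 0);
    b   : in  signed(15 downto 0);
    p   : out signed(31 downto 0)   -- valid two clock edges after a and b
  );
end entity mult_pipe;

architecture rtl of mult_pipe is
  signal a_reg, b_reg : signed(15 downto 0);
  signal p_reg        : signed(31 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      a_reg <= a;               -- input register stage
      b_reg <= b;
      p_reg <= a_reg * b_reg;   -- multiplier followed by output register
    end if;
  end process;

  p <= p_reg;
end architecture rtl;
```

The latency is whatever you write: here two clock cycles, because there are two register stages between the input ports and the output port.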

 

You can map the sfixed to std_logic_vector and back again with: 

 

signal port_input, port_output : std_logic_vector(7 downto 0); 
signal my_sfixed, sfixed_op : sfixed(3 downto -4); 
..... 
port_input <= to_slv(my_sfixed); 
sfixed_op <= to_sfixed(port_output, sfixed_op'high, sfixed_op'low);  

 

You need to design the multiplier to be correct. But the fixed_pkg is purely a way to represent an integer in your code. It makes no difference to the underlying hardware.
Altera_Forum
Honored Contributor I
166 Views

 

--- Quote Start ---  

a <= b * c; 

will pretty much always infer a DSP block. The amount of pipelining is determined by you - by how many registers you place around the multiplier in your code. the multiplier above can be placed inside or outside a synchronous process. No VHDL function is pipelined. 

--- Quote End ---  

 

 

That pipelining is going to be external to the multiplier. With inference you cannot control the internal pipeline stages, which could cause timing to fail.
Altera_Forum
Honored Contributor I
166 Views

 

--- Quote Start ---  

That pipelining is going to be external to multiplier. With inference you can not decide internal pipe which could fail timing 

--- Quote End ---  

 

 

You have to let the synthesiser/fitter do register retiming, and the synthesiser WILL infer internal pipeline stages. 

There is a bug for Stratix IV which means that for larger multipliers (larger than 18x18, so multiple DSP blocks are required) the output register is not inferred properly, so an lpm_mult block is required to use all the internal pipeline stages. The same bug is not present on Stratix V or Arria 10. 

 

This was an issue I came up against and confirmed it with Altera support.