
fixed-point tips

Altera_Forum
Honored Contributor II

Hi 

 

I have some experience with Quartus, but in the past I have only worked on projects that implement very basic boolean expressions (like generating pulses with user-defined widths and delays). I have been asked to try to build a block that does something rather more complicated than I am used to, involving real numbers... but of course the final block has to be synthesizable. Here is an example of a "phase diff block" that is needed. 

 

Inputs: 

 

I1, I2, Q1, Q2, A1, A2 (all 16-bit with values lying between -1 and +1) 

cMode (Boolean) 

kc (16-bit with values lying between 0 and +1) 

 

Outputs: 

 

Pd (16-bit with values lying between -1 and +1) 

LD (Boolean) 

 

The function to implement is something like: 

 

Pd = (I1 * Q2 - I2 * Q1) / (A1 * A2) 

ld = (1-kc) * ld + kc * (I1 * I2 + Q1 * Q2) / (A1 * A2) 

 

If cMode = '0' THEN LD = 0  

ELSIF ld > 0 THEN LD = 1 ELSE LD = 0 

 

To be honest, I don't really know where to start with defining the data types and the equation for Pd. Any tips most welcome. 

 

Many thanks, Kurt
Altera_Forum
Honored Contributor II

Verilog or VHDL?

Altera_Forum
Honored Contributor II

Oops, apologies... VHDL. 

 

I've made a little progress. I've written a little function that multiplies two 16-bit unsigned integers, giving a 32-bit unsigned integer. The function then just drops the 16 LSBs, which gives the scaled answer. So this works fine for unsigned inputs whose 'real' value is <= 1. I don't know how general this is for values > 1... probably not. For signed values I suppose just a little extra logic is needed to test the sign bit and derive the appropriate sign bit for the result. 
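For reference, a minimal sketch of the kind of function described above, assuming the inputs are unsigned fractions in Q0.16 format (the package and function names are placeholders, not from any library): 

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package frac_mult_pkg is
  function mult_frac16 (a, b : unsigned(15 downto 0)) return unsigned;
end package;

package body frac_mult_pkg is
  -- a and b are treated as unsigned fractions in [0, 1) with 16 fractional bits (Q0.16).
  -- The 32-bit product then has 32 fractional bits; keeping only the upper 16 bits
  -- (i.e. dropping the 16 LSBs) rescales the result back to Q0.16.
  function mult_frac16 (a, b : unsigned(15 downto 0)) return unsigned is
    variable p : unsigned(31 downto 0);
  begin
    p := a * b;
    return p(31 downto 16);
  end function;
end package body;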

 

The next element I need to figure out is the division part. Assuming (A1*A2) (the divisor) is not a tidy power of 2, would inverting and then multiplying work? Is there a simple approximation for 1/x using a Taylor expansion or some other simple method? I don't think it has to be too accurate. 

 

Thanks in advance for any pointers, Kurt
Altera_Forum
Honored Contributor II

You can try the lpm_divide megafunction.

Altera_Forum
Honored Contributor II

Thanks, I'll take a look at that. 

 

I've also found an interesting document that could well solve the problem by raising the level of abstraction somewhat: 

 

http://www.eda-stds.org/vhdl-200x/vh.../dvcon2005.doc 

 

At first glance it seems like just what is needed. 

 

Bye for now, Kurt
Altera_Forum
Honored Contributor II

The link didn't come through; this looks like it: 

 

http://www.eda-stds.org/vhdl-200x/vhdl-200x-ft/packages_old/dvcon2005.doc
Altera_Forum
Honored Contributor II

The new VHDL fixed and floating point packages actually provide all the necessary means for typical DSP problems like the one in your example. Up to now, though, I have used my own functions for scaling of products, similar to the one you discussed above. I haven't even tried whether they synthesize with Quartus without problems. Basically, the method works for arbitrary fixed point and fractional numbers, if you use a variable shift factor in the product scaling and additional saturation logic. 

 

I show an example of a multiply with scaling and saturation for reference: 

FUNCTION MUL_ASL (X1, X2 : SIGNED; N : INTEGER) RETURN SIGNED IS
  VARIABLE P : SIGNED(X1'length + X2'length - 1 DOWNTO 0);
BEGIN
  P := X1 * X2;
  IF P(P'left-1 DOWNTO P'left-N) /= 0 AND P >= 0 THEN
    P := (OTHERS => '1');
    P(P'left-N) := '0';
  ELSIF P(P'left-1 DOWNTO P'left-N) /= -1 AND P < 0 THEN
    P := (OTHERS => '0');
    P(P'left-N) := '1';
  END IF;
  -- Result length equals X2
  RETURN P(P'left-N DOWNTO X1'length-N);
END;

For fractional signed numbers with a -1..1 range, the shift factor has to be set to 1 to remove the double sign bit. 

 

Fast parallel division such as lpm_divide consumes a lot of resources. Sometimes it can't be avoided, but in many cases another solution can be found. For "slow" division applications I generally use a serial divider, which needs one clock cycle per result bit. Another option is a 1/x look-up table.
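
To illustrate the look-up table option, here is a minimal sketch of a registered 1/x ROM. Everything here (entity name, widths, number formats) is an assumption chosen for the example: x is treated as a normalized unsigned fraction in [0.5, 1) with 8 fractional bits, and the table returns 1/x as an unsigned value with 1 integer and 15 fractional bits. 

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity recip_lut is
  port (
    clk : in  std_logic;
    x   : in  unsigned(7 downto 0);    -- fraction x/256, assumed normalized to [0.5, 1)
    y   : out unsigned(15 downto 0)    -- approximation of 1/x, scaled by 2^15
  );
end entity;

architecture rtl of recip_lut is
  type rom_t is array (0 to 255) of unsigned(15 downto 0);

  function init_rom return rom_t is
    variable rom : rom_t;
    variable v   : integer;
  begin
    for i in rom'range loop
      if i = 0 then
        v := 2**16 - 1;                -- no meaningful reciprocal for 0; saturate
      else
        v := (2**23 + i/2) / i;        -- round(2^15 * 256 / i) = 1/(i/256) in steps of 2^-15
        if v > 2**16 - 1 then
          v := 2**16 - 1;              -- saturate entries that would be >= 2.0 (e.g. x = 0.5 exactly)
        end if;
      end if;
      rom(i) := to_unsigned(v, 16);
    end loop;
    return rom;
  end function;

  constant ROM : rom_t := init_rom;
begin
  process(clk)
  begin
    if rising_edge(clk) then
      y <= ROM(to_integer(x));         -- registered read; should map to a block RAM
    end if;
  end process;
end architecture;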
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

 

I've made a little progress. I've written a little function that multiplies two 16 bit unsigned integers, which results in a 32-bit unsigned integer. The function then just drops the 16 LSBs which results in the scaled answer. So this works fine for unsigned inputs whose 'real' value is =< 1. I don't know how general this is for values > 1... probably not. For signed values I suppose just a little extra logic is needed to test the sign bit and derive the appropriate sign bit for the result. 

--- Quote End ---  

 

 

I am totally bewildered as I see various ideas for the computation here, from the use of functions to float, etc. 

 

For most general applications you need to think fixed point; this is "almost" always adequate: 

 

1) Scale your real values to the nearest integers, represented at the available resolution. 

So a [-1.00000 to +1.00000] range must be scaled and rounded until it fits well in your data width. 

2) Then do the computation (choosing signed or unsigned). 

3) Chop off the result as required, possibly with rounding. 

 

You may truncate at the very end or at an intermediate stage, depending on resource/speed issues. You may also need some work to truncate any unused MSBs. 

 

Kurtar: you are almost there, but why use unsigned functions when you can just instantiate signed hardware multipliers directly, without functions? 

 

Mathematically: the scaling in step 1) above is cancelled out by the output LSB truncation in step 3). For example, if you scale by 2^n and then truncate off n bits, you get in effect the actual value, as if you had used fractional inputs. 

 

R => round(R*2^n) --scaled 

R => R/2^n --truncated(divided) 

 

Edit: 

as an example, assume your input value, e.g. a coefficient, is 0.705. Then you scale it 

to 0.705 * 2^8 = 180 (rounded) 

then you use 180 as an input to a multiplier with input A: 

result = 180 * A 

then you truncate the result by 8 bits: 

result = 180 * A / 256 

so in effect you have multiplied by 0.705 (more exactly, by 180/256 = 0.703125). 
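
A minimal VHDL sketch of that coefficient example (entity and signal names are made up for illustration; the coefficient is held in 9 bits so the signed constant 180 fits): 

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity coeff_mult is
  port (
    clk : in  std_logic;
    a   : in  signed(15 downto 0);      -- input sample
    y   : out signed(15 downto 0)       -- roughly a * 0.705
  );
end entity;

architecture rtl of coeff_mult is
  constant COEFF : signed(8 downto 0) := to_signed(180, 9);  -- 0.705 * 2^8, rounded
  signal prod : signed(24 downto 0);                         -- 16 + 9 bit product
begin
  process(clk)
  begin
    if rising_edge(clk) then
      prod <= a * COEFF;
    end if;
  end process;

  -- Drop the 8 scaling LSBs (the truncation step); the top bit is a redundant sign
  -- bit here because |a * 180| always stays below 2^23.
  y <= prod(23 downto 8);
end architecture;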

 

The float library is fairly recent and I don't know of any engineers using it. I don't advise beginners to go that way before learning fixed point.
Altera_Forum
Honored Contributor II

The quoted post is dealing with fixed point, or rather fractional, numbers, not float, as far as I understand. It's basically the same method that you suggested. 

The new VHDL packages in question have two different parts, fixed point and float. I didn't see anyone suggesting float arithmetic in this thread.
Altera_Forum
Honored Contributor II

Fixed point is easy to deal with. Like integer arithmetic, a 16-bit x 16-bit multiply gives a 32-bit result, but now you have a 1.15 number: a 1.15 x 1.15 gives a 2.30 (2 bits magnitude, 30 bits fraction) result. Remembering these rules makes using the ieee.numeric_std package possible, because you can keep track of which bits are magnitude and which are fractional. 

So, for the following numbers with range -1 to +1: 

 

signal A, B : signed(15 downto 0);   -- 1.15 numbers 
signal C    : signed(31 downto 0);   -- 2.30 result 
signal OP   : signed(15 downto 0);   -- 1.15 output (16 bits) 

C  <= A * B;             -- 2 magnitude bits, 30 fractional bits 
OP <= C(30 downto 15);   -- in range -1 to nearly 1 

(This is only possible if -1 x -1 is not a possible input. If it is, you need to use the range (31 downto 16), and your result will be a 2.14 result, so you lose 1 bit of accuracy. A 1.15 number only represents -1 to nearly 1, and cannot represent +1 itself, so an extra magnitude bit is required.) 

 

When multiplying any two signed numbers, you will always have 2 sign bits as the MSBs unless you multiply the most negative number by itself. So, assuming max negative x max negative is not possible, you can ignore the MSB and gain an extra bit of accuracy. 

But now, life is a lot easier. 

All you need to do is use the new fixed/float packages from the IEEE: 

http://www.vhdl.org/fphdl/vhdl.html 

From there you can do stuff like this: 

 

signal A, B : sfixed(1 downto -14); 
signal C    : sfixed(3 downto -28); 
signal D    : sfixed(3 downto -12); 

process(clk) 
begin 
  if rising_edge(clk) then 
    C <= A * B; 
  end if; 
end process; 

D <= C(D'range); 

Division is always going to be a bummer. But if you remember that A/B is actually just A * 1/B, and you can generate 1/B in software, all division problems are removed.
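
As a small sketch of that A * (1/B) idea with the fixed-point types, assuming the reciprocal arrives precomputed (from software or a look-up table) and that the packages are compiled into work as in the code later in this thread; all names and index bounds here are made up for the example: 

library ieee;
use ieee.std_logic_1164.all;
library work;
use work.fixed_pkg.all;

entity recip_mult is
  port (
    clk   : in  std_logic;
    a     : in  sfixed(1 downto -14);
    b_inv : in  sfixed(1 downto -14);  -- precomputed 1/B, quantized to the same format
    q     : out sfixed(1 downto -14)   -- A/B approximated as A * (1/B)
  );
end entity;

architecture rtl of recip_mult is
begin
  process(clk)
  begin
    if rising_edge(clk) then
      -- resize trims the full-precision product back down to the output format
      q <= resize(a * b_inv, q'high, q'low);
    end if;
  end process;
end architecture;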
Altera_Forum
Honored Contributor II

Another thing I forgot to mention - it can get more complicated when you do something like using a range of -0.5 to +0.5. 

With this, you can actually gain accuracy because you can use "imaginary" bits. This comes in handy when you have 18-bit hardware multipliers but your values need to be in a range of something like -(2^-18) to 2^-18 (i.e. very small), where the most significant 18 bits are all sign bits. 

So you represent what would be a 36-bit number in 18 bits, with an "imaginary" 18 bits on the front. For full accuracy on the result, you'd need a 72-bit number - 36 imaginary sign bits and 36 bits of result.
Altera_Forum
Honored Contributor II

Hi All 

 

Many thanks for all your suggestions... being lazy, I ended up using the recent IEEE fixed_pkg. For those interested, below is the code I ended up with, which seems to do the job OK (the functionality is a little different from my initial posting, as the designer changed it): 

 

LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.NUMERIC_STD.ALL;

LIBRARY WORK;
USE WORK.math_utility_pkg.ALL;
USE WORK.fixed_pkg.ALL;

ENTITY phase_diff IS
  PORT (
    clk           : IN  STD_LOGIC;
    i_I1          : IN  STD_LOGIC_VECTOR (15 DOWNTO 0);
    i_Q1          : IN  STD_LOGIC_VECTOR (15 DOWNTO 0);
    i_I2          : IN  STD_LOGIC_VECTOR (15 DOWNTO 0);
    i_Q2          : IN  STD_LOGIC_VECTOR (15 DOWNTO 0);
    i_A1          : IN  STD_LOGIC_VECTOR (15 DOWNTO 0);
    i_A2          : IN  STD_LOGIC_VECTOR (15 DOWNTO 0);
    i_cMode       : IN  STD_LOGIC;
    i_kc          : IN  STD_LOGIC_VECTOR (31 DOWNTO 0);
    o_phase_corr  : OUT STD_LOGIC_VECTOR (31 DOWNTO 0);
    o_lock_detect : OUT STD_LOGIC;
    recipr        : OUT STD_LOGIC_VECTOR (34 DOWNTO 0);
    a1a2          : OUT STD_LOGIC_VECTOR (33 DOWNTO 0)
  );
END ENTITY phase_diff;

ARCHITECTURE arch OF phase_diff IS
  SIGNAL s_phase_corr      : SFIXED (0 DOWNTO -31);
  SIGNAL s_phase_corr_prev : SFIXED (0 DOWNTO -31);
  SIGNAL s_ld              : STD_LOGIC;
  SIGNAL s_lock_fixed_prev : SFIXED (0 DOWNTO -15);
BEGIN

  PROCESS(clk)
    VARIABLE v_I1         : SFIXED (0 DOWNTO -15);
    VARIABLE v_Q1         : SFIXED (0 DOWNTO -15);
    VARIABLE v_I2         : SFIXED (0 DOWNTO -15);
    VARIABLE v_Q2         : SFIXED (0 DOWNTO -15);
    VARIABLE v_A1_u       : UFIXED (0 DOWNTO -15);
    VARIABLE v_A2_u       : UFIXED (0 DOWNTO -15);
    VARIABLE v_A1         : SFIXED (0 DOWNTO -16);
    VARIABLE v_A2         : SFIXED (0 DOWNTO -16);
    VARIABLE v_kc         : SFIXED (0 DOWNTO -31);
    VARIABLE v_recip      : SFIXED (33 DOWNTO -1);
    VARIABLE v_pd_s1      : SFIXED (2 DOWNTO -32);
    VARIABLE v_pd_fixed   : SFIXED (36 DOWNTO -33);
    VARIABLE v_pld_s1     : SFIXED (2 DOWNTO -32);
    VARIABLE v_pld_fixed  : SFIXED (0 DOWNTO -15);
    VARIABLE v_lock_s1    : SFIXED (2 DOWNTO -46);
    VARIABLE v_lock_s2    : SFIXED (1 DOWNTO -46);
    VARIABLE v_lock_fixed : SFIXED (0 DOWNTO -15);
  BEGIN
    IF RISING_EDGE(clk) THEN
      v_I1   := TO_SFIXED( i_I1, 0, -15 );
      v_Q1   := TO_SFIXED( i_Q1, 0, -15 );
      v_I2   := TO_SFIXED( i_I2, 0, -15 );
      v_Q2   := TO_SFIXED( i_Q2, 0, -15 );
      v_A1_u := TO_UFIXED( i_A1, 0, -15 );
      v_A2_u := TO_UFIXED( i_A2, 0, -15 );
      v_A1   := TO_SFIXED( v_A1_u );
      v_A2   := TO_SFIXED( v_A2_u );
      v_kc   := TO_SFIXED( i_kc, 0, -31 );

      a1a2 <= TO_SLV( RESIZE( v_A1 * v_A2, (2 * v_A1'HIGH) + 1, 2 * v_A1'LOW ) );  -- 1 DOWNTO -32

      v_recip := RESIZE( RECIPROCAL( v_A1 * v_A2 ), (-2 * v_A1'LOW) + 1, -1 * ((2 * v_A1'HIGH) + 1) );  -- 33 DOWNTO -1
      recipr  <= TO_SLV( v_recip );

      v_pd_s1    := RESIZE( ( v_I1 * v_Q2 ) - ( v_I2 * v_Q1 ), (2 * v_A1'HIGH) + 2, 2 * v_A1'LOW );
      v_pd_fixed := v_pd_s1 * v_recip;                                  -- (I1Q2-I2Q1)/(A1A2)

      v_pld_s1    := RESIZE( ( v_I1 * v_I2 ) + ( v_Q1 * v_Q2 ), (2 * v_A1'HIGH) + 2, 2 * v_A1'LOW );
      v_pld_fixed := RESIZE( v_pld_s1 * v_recip, 0, -15 );              -- (I1I2+Q1Q2)/(A1A2)

      v_lock_s1    := ( 1 - v_kc ) * s_lock_fixed_prev;
      v_lock_s2    := v_kc * v_pld_fixed;
      v_lock_fixed := RESIZE( v_lock_s1 + v_lock_s2, 0, -15 );          -- (1-kc)*lock+kc*pld

      s_lock_fixed_prev <= v_lock_fixed;

      IF ((i_cMode = '1') AND (v_lock_fixed > TO_SFIXED(0.5, 0, -15))) THEN
        s_ld <= '1';
      ELSE
        s_ld <= '0';
      END IF;

      IF (i_cMode = '0') THEN
        s_phase_corr <= TO_SFIXED( 0, 0, -31 );
      ELSE
        s_phase_corr <= RESIZE( s_phase_corr_prev + (v_kc * v_pd_fixed), 0, -31 );
      END IF;

      s_phase_corr_prev <= s_phase_corr;
    END IF;
  END PROCESS;

  o_phase_corr  <= TO_SLV( s_phase_corr );
  o_lock_detect <= s_ld;

END arch;

 

There is some extra stuff in there just to help me with debugging, which I will comment out for the final version, but it seems to be passing testing at the moment. 

 

Many thanks, Kurt
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

fixed point is easy to deal with. Like integer arithmatic a 16bitx16 bit gives a 32 bit result, but now you havd 1.15 number. a 1.15 x 1.15 gives a 2.30 (2 bits magnitude, 30 bits fraction) result. remembering these rules makes using the ieee.numberic_std package possible because you can monitor which bits are magnitude, and which are fractional. 

 

so the following numbers, with range -1 to +1 

 

signal A, B : signed(15 downto 0); (1.15 number) 

signal C : signed(31 downto 0); (2.30 result) 

signal OP : signed(15 downto 0); (1.15 output (16 bits)) 

 

 

c := A * B; --2 magnitude bits, 30 fractional bits 

OP := C(30 downto 15); --in range -1 to nearly 1 

(this is only possible if -1x-1 is not a possible input. If it is, you need to use the range (31 downto 16), and your result will be a 2.14 result, but you will lose 1 bit of accuracy. a 1.15 number only represents -1 to nealy 1, and cannot represent +1 itself, so an extra bit of magnitude is required) 

 

When multiplying any signed numbers, you will always have 2 sign bits as the MSBs unless you multiply the most negative number by itself, so assuming max neg x max neg is not possible, you can ignore the MSB and gain an extra bit of accuracy 

 

But now, life is alot easier 

All you need to do is use the new floatfixpkg from the IEEE: 

 

http://www.vhdl.org/fphdl/vhdl.html 

from here you can do stuff like this: 

 

signal A, B : sfixed(1 downto -14); 
signal C    : sfixed(3 downto -28); 
signal D    : sfixed(3 downto -12); 

process(clk) 
begin 
  if rising_edge(clk) then 
    C <= A * B; 
  end if; 
end process; 

D <= C(D'range); 

Division is always going to be a bummer. But if you remember that A/B is actually just A * 1/B, and you can generate 1/B in software, all division problems are removed. 

--- Quote End ---  

 

 

I have a question regarding these libraries: do they work correctly if you use Altera's LPM modules like ALTMULT_ADD, ALTMULT_MAC, etc. as their inputs?
Altera_Forum
Honored Contributor II

A std_logic_vector can be assigned to an sfixed signal by a conversion function, if it represents a fixed point number. But what do you mean by "work correctly"? If, e.g., an overflow has occurred in an LPM module that doesn't provide saturation, it can't be recognized from the bit vector. So the question is mainly whether the LPM module works correctly.

Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

A std_logic_vector can be assigned to an sfixed signal by a conversion function, if it represents a fixed point number. But what do you mean with work correctly? If e.g. an overflow has occured in a LPM module, that don't provide saturation, it can't be recognized from the bit_vector. So the question is mainly, if the LPM module works correctly. 

--- Quote End ---  

 

 

What I meant to ask is: will the operations (multiply, add, etc.) that an LPM/MegaFunction implements give the correct results when given a ufixed or sfixed? Or does just including the libraries automatically use those functions for the fixed point types?
Altera_Forum
Honored Contributor II

Using these libraries has nothing to do with the LPM library; the LPM libraries are a separate thing altogether. Using the fixed-point libraries should give you much more readable and portable code, as well as making simulations much faster. The synthesizer should take the code and place multipliers/adders for you. The LPM library does not use the sfixed or ufixed types, so no, you cannot feed them into an lpm_mult and make it work; you have to type-convert the values to std_logic_vector (which really is meant to have no meaning other than a collection of bits). 

 

So, where before you would have had: 

 

mult : altmult_add
  generic map (
    width_a => 16,
    width_b => 16,
    ......
  )
  port map (
    clk    => clk,
    dataa  => a,
    datab  => b,
    ...
    result => p
  );

you can replace it with: 

 

mult_proc : process(clk) 
begin 
  if rising_edge(clk) then 
    p <= a * b; 
  end if; 
end process; 

To be honest, this method has been around for years using the numeric_std package; the fixed package just makes reading fixed-point code that much easier.
Altera_Forum
Honored Contributor II

The Altera MegaFunctions are also inferred when you use arithmetic operators in HDL code. In addition, there are synthesis attributes to specify, e.g., a preference for dedicated hardware (DSP block) or logic cell multipliers. They can be applied to the design as a whole, to design units, or to parts of them. 

So in most cases you don't have to worry about instantiating MegaFunctions yourself, or about the optimal implementation of your code.
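
For reference, a hedged sketch of what such an attribute can look like. As far as I recall, Quartus integrated synthesis understands a multstyle string attribute with the values "dsp" and "logic" (it is covered in the coding-style handbook linked later in this thread, so check the exact usage there); the entity and signal names here are made up: 

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mult_style_demo is
  port (
    clk  : in  std_logic;
    a, b : in  signed(15 downto 0);
    p    : out signed(31 downto 0)
  );
end entity;

architecture rtl of mult_style_demo is
  signal prod : signed(31 downto 0);
  attribute multstyle : string;
  -- Ask for a logic-cell implementation; "dsp" would request a dedicated multiplier block.
  attribute multstyle of prod : signal is "logic";
begin
  process(clk)
  begin
    if rising_edge(clk) then
      prod <= a * b;
    end if;
  end process;
  p <= prod;
end architecture;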
Altera_Forum
Honored Contributor II

Thank you guys for your answers. I was just unclear on whether or not I could use these types (sfixed, ufixed) with the LPM functions or MegaCore functions, so I don't have to reinvent the wheel. Also, I am required to use the dedicated DSP blocks (Stratix chips) for the application I am trying to develop, and I was not sure whether this library would work with them.

Altera_Forum
Honored Contributor II

Have a look through this document for the recommended coding styles to infer DSP blocks, memory, or anything else: 

 

http://www.altera.com/literature/hb/qts/qts_qii51007.pdf?gsa_pos=1&wt.oss_r=1&wt.oss=code%20templates 

 

You should be able to replace signed/unsigned with ufixed and sfixed.
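
As a rough illustration of that substitution (fragments only, widths assumed to match an 18x18 multiplier; the sfixed version also needs the fixed_pkg library/use clauses shown in the earlier code, and its product bounds follow the fixed_pkg sizing rules): 

-- numeric_std version of a registered multiply:
signal a, b : signed(17 downto 0);
signal p    : signed(35 downto 0);

process(clk)
begin
  if rising_edge(clk) then
    p <= a * b;
  end if;
end process;

-- the same thing with the fixed-point package:
signal af, bf : sfixed(1 downto -16);
signal pf     : sfixed(3 downto -32);

process(clk)
begin
  if rising_edge(clk) then
    pf <= af * bf;
  end if;
end process;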