Re: Math calculations in VHDL

Altera_Forum · ‎10-17-2012

Hi all,

I need a little help with my design; there are a few issues I'm trying to resolve.

My design flow is as follows: a signal is sampled by a 16-bit A/D converter and multiplied by two other 16-bit signals, which results in two 32-bit signals. After a few manipulations I need to implement a formula that uses those two signals with division, multiplication and a few other functions that are calculated using LUTs. The LUTs are designed for a signal normalized to the range 0 to pi/2. Note that after the formula the signals will be converted to 32-bit floating-point signals.

The formula requires some fractional calculations, so I'm trying to use the fixed-point package by David Bishop, but I'm facing a few issues:

1. How am I supposed to convert the 32-bit signals (integers) to 32-bit fixed-point numbers?

2. I'm trying to divide the two 32-bit signals. What is the best way to do this? I tried the lpm_divide megafunction, but I get a quotient and a remainder, while I need the full result including the fraction (i.e. 5.65 and not only 5).

3. How exactly am I supposed to use the fixed-point package? I have the following libraries; is that enough?

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.fixed_float_types.all;
use ieee.fixed_pkg.all;

Thanks in advance, any help would be much appreciated.

Altera_Forum · ‎10-17-2012

Fixed-point arithmetic, unlike floating-point arithmetic, does not need any package at all. Every math operation implemented for integers is applicable to fixed-point numbers.

This might be helpful http://www.digitalsignallabs.com/fp.pdf

As for the division, it could be implemented with a CORDIC algorithm.
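
To illustrate the point that fixed point is ordinary integer arithmetic with an implied binary point, here is a minimal sketch using only numeric_std. The entity name, port names and the Q16.16 format (16 integer bits, 16 fraction bits) are assumptions chosen for illustration; they do not come from the thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: multiply two unsigned Q16.16 values.
entity q16_mul_sketch is
  port (
    a : in  unsigned(31 downto 0);  -- Q16.16
    b : in  unsigned(31 downto 0);  -- Q16.16
    p : out unsigned(31 downto 0)   -- Q16.16, fraction truncated
  );
end entity q16_mul_sketch;

architecture rtl of q16_mul_sketch is
  -- The product of two Q16.16 numbers is a Q32.32 number.
  signal full : unsigned(63 downto 0);
begin
  full <= a * b;             -- plain integer multiply, 32 x 32 -> 64 bits
  p    <= full(47 downto 16); -- keep 16 integer and 16 fraction bits
end architecture rtl;
```

The only fixed-point-specific step is choosing which slice of the product to keep. Integer bits above bit 47 are dropped silently here, so a real design would need to check for or rule out overflow.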

Altera_Forum · ‎10-17-2012

I'm sorry if I wasn't clear enough. I understand the theory; my problem is with the implementation.

First of all, how can I get the full result from the lpm_divide function (including the fractional part)?

Second, how can I scale the 32-bit integers from the first part of my design so that they can be used as 32-bit fixed-point fractions in the second part?

Altera_Forum · ‎10-17-2012

Well, if it's an integer, it has no fractional part. An integer is just a fixed-point number with no fractional bits, unless what you call an integer is really a fractional number scaled by 2^N (where N is the number of fractional bits), in which case you need to tell us the value of N. You don't need an LPM divide for this.

Answers to your questions:

1. You zero pad the inputs to the correct number of bits (integer and fractional) and the result will be correct.

2. You just need to assign it to a correctly sized ufixed or sfixed value (fixed point is just integers scaled by a power of two).

Either way, there are the to_ufixed and to_sfixed functions from the fixed-point package.

One point to note: you should not be using the std_logic_unsigned package, as it is not an IEEE standard. If you have a number, you shouldn't make it a std_logic_vector. There is no need to make ports std_logic_vector; they can be any type.
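
As a sketch of the advice above, the following entity uses ufixed ports directly and adds them with the "+" operator from the fixed-point package, without std_logic_unsigned. The entity name, port names and the 16.16 format are hypothetical. It assumes a tool with VHDL-2008 support for ieee.fixed_pkg; with the older compatibility version of the package the library is typically ieee_proposed instead.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.fixed_pkg.all;  -- assumption: VHDL-2008 fixed_pkg is available

-- Hypothetical example: typed fixed-point ports instead of std_logic_vector.
entity fixed_add_sketch is
  port (
    a   : in  ufixed(15 downto -16);
    b   : in  ufixed(15 downto -16);
    -- The package sizes a sum one integer bit wider than its operands,
    -- so the result can never overflow.
    sum : out ufixed(16 downto -16)
  );
end entity fixed_add_sketch;

architecture rtl of fixed_add_sketch is
begin
  sum <= a + b;
end architecture rtl;
```

Because the package returns a result that is one bit wider, assigning a + b to a signal of the same size as the operands needs an explicit resize.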

Altera_Forum · ‎10-17-2012

Hi Tricky, thank you for your reply.

The reason I said that there are fractions is because when I divide the 2 32bit integers I will probably get a fraction.

You mean that I should convert the 32bit integers to ufixed and then divide them and store the answer in a 64bit ufixed signal? I tried to do that but I got a strange error regarding the "/" operator, that's why I used the lpm_divide megafunction instead, which works for std_logic_vector.

If I don't use the std_logic_unsigned package I get an error even for a "+" sign. Where am I going wrong? Have I included all the correct files from the fixed-point package?

Altera_Forum · ‎10-17-2012

Yes, you will get a fraction. But if the inputs start out as 32 bits, you will need a lot more bits to store the result. Making it a 32.32 sfixed is rather easy (remember that the VHDL integer type is signed, so unsigned values are limited to 31 bits). Do you really mean integers as in the VHDL type integer, or just a number that arrives via a 32-bit bus?

signal my_number : sfixed(31 downto -32);

my_number <= to_sfixed(input, 31, -32);

And this is free in terms of synthesis because you're really just appending a load of zeros to it. Then you can convert this to a std_logic_vector (with the to_slv function) so you can connect it to the lpm_divide. No, you should not use the "/" function unless you can get away with a pipeline length of 1 (until Altera sort out their register placement inside inferred dividers properly!)

You shouldn't do + with std_logic_vectors; you should keep them as ufixed or sfixed types when you do this.

I think at this point you need to post some code to show us what you're actually trying to do.

Altera_Forum · ‎10-18-2012

Below is some simple code that uses the lpm_divide megafunction to divide two 32-bit numbers. I tried to implement your comments from the previous post. The numbers are 32-bit std_logic_vectors (I referred to them as integers, i.e. numbers that arrive via a 32-bit bus). I want to get the full result (including the fractional part) because the next step will be performing modulus pi/2 on the result. I'm getting the following compilation error: Error (10344): VHDL expression error at PreNICOMFunction_fixed.vhd(: expression has 0 elements, but must have 64 elements (it refers to the line I_fp <= to_ufixed(I,31,-32);)

1. What am I doing wrong?

2. How can I convert the divider result to a fractional number?

Thanks again for your help.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.fixed_float_types.all;
use ieee.fixed_pkg.all;

entity PreNICOMFunction_fixed is
  port (
    clock : in std_logic;
    reset : in std_logic;
    I     : in std_logic_vector (31 downto 0);
    Q     : in std_logic_vector (31 downto 0);
    V0    : in std_logic_vector (31 downto 0) --;
    -- temp_res : out ufixed (31 downto -32)
  );
end PreNICOMFunction_fixed;

architecture rtl of PreNICOMFunction_fixed is

  component lpm_divider1 IS --32 clocks latency
    PORT (
      aclr     : IN STD_LOGIC ;
      clock    : IN STD_LOGIC ;
      denom    : IN STD_LOGIC_VECTOR (63 DOWNTO 0);
      numer    : IN STD_LOGIC_VECTOR (63 DOWNTO 0);
      quotient : OUT STD_LOGIC_VECTOR (63 DOWNTO 0);
      remain   : OUT STD_LOGIC_VECTOR (63 DOWNTO 0)
    );
  END component lpm_divider1;

  type fsm_type1 is (s0, s1, s2, s3, s4);

  signal fsm1         : fsm_type1 := s0;
  signal QdivI        : std_logic_vector (63 downto 0) := (others => '0');
  signal QdivI_remain : std_logic_vector (63 downto 0) := (others => '0');
  signal I_ff         : std_logic_vector (31 downto 0) := (others => '0');
  signal Q_ff         : std_logic_vector (31 downto 0) := (others => '0');
  signal I_fp         : ufixed (31 downto -32) := (others => '0');
  signal Q_fp         : ufixed (31 downto -32) := (others => '0');
  signal div_res      : ufixed (31 downto -32) := (others => '0');
  signal cntr1        : ufixed (4 downto 0) := (others => '0');

  constant div_latency : ufixed (4 downto 0) := "11111";

begin

  divider : component lpm_divider1
    port map (
      aclr     => '0',
      clock    => clock,
      denom    => to_slv(I_fp),
      numer    => to_slv(Q_fp),
      quotient => QdivI,
      remain   => QdivI_remain
    );

  main : process (clock)
  begin
    if (rising_edge(clock)) then
      I_ff <= I;
      Q_ff <= Q;
      case fsm1 is
        when s0 =>
          if ((I_ff /= I) or (Q_ff /= Q)) then --New data available
            I_fp <= to_ufixed(I,31,-32);
            Q_fp <= to_ufixed(Q,31,-32);
            fsm1 <= s1;
          end if;
          -- atanLUT_clk <= '0';
        when s1 =>
          if (cntr1 < div_latency) then --Wait for the division operation to complete (Q/I)
            cntr1 <= cntr1 + 1;
          else
            cntr1 <= (others => '0');
            fsm1 <= s0;
          end if;
        when others =>
          fsm1 <= s0;
      end case;
    end if;
  end process main;

end rtl;
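
A likely cause of the "0 elements, but must have 64 elements" error: when to_ufixed is given a std_logic_vector, it reinterprets the bits rather than converting a value, so the vector must already have as many elements as the target range (64 here, but I has 32). The package then returns an empty vector. A minimal sketch of one way around this, assuming the inputs are unsigned whole numbers, is to convert through numeric_std's unsigned type so the package treats the argument as an integer value and resizes it. The entity and port names below are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.fixed_pkg.all;  -- assumption: VHDL-2008 fixed_pkg is available

-- Hypothetical example: 32-bit whole number on a bus -> 32.32 ufixed.
entity slv_to_ufixed_sketch is
  port (
    I    : in  std_logic_vector(31 downto 0);
    I_fp : out ufixed(31 downto -32)
  );
end entity slv_to_ufixed_sketch;

architecture rtl of slv_to_ufixed_sketch is
begin
  -- to_ufixed(I, 31, -32) would need a 64-element vector.
  -- Going through unsigned converts the value and fills the
  -- 32 fraction bits with zeros.
  I_fp <= to_ufixed(unsigned(I), 31, -32);
end architecture rtl;
```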

Altera_Forum · ‎10-18-2012

I modified the code so it compiles now, but I still get a quotient and a remainder and not the full result. Besides, if my input is a std_logic_vector, why convert it to ufixed and then back to std_logic_vector?

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.fixed_float_types.all;
use ieee.fixed_pkg.all;

entity PreNICOMFunction_fixed is
  port (
    clock : in std_logic;
    reset : in std_logic;
    I     : in std_logic_vector (31 downto 0);
    Q     : in std_logic_vector (31 downto 0);
    V0    : in std_logic_vector (31 downto 0) --;
    -- temp_res : out ufixed (31 downto -32)
  );
end PreNICOMFunction_fixed;

architecture rtl of PreNICOMFunction_fixed is

  component lpm_divider1 IS --32 clocks latency
    PORT (
      aclr     : IN STD_LOGIC ;
      clock    : IN STD_LOGIC ;
      denom    : IN STD_LOGIC_VECTOR (63 DOWNTO 0);
      numer    : IN STD_LOGIC_VECTOR (63 DOWNTO 0);
      quotient : OUT STD_LOGIC_VECTOR (63 DOWNTO 0);
      remain   : OUT STD_LOGIC_VECTOR (63 DOWNTO 0)
    );
  END component lpm_divider1;

  type fsm_type1 is (s0, s1, s2, s3, s4);

  signal fsm1         : fsm_type1 := s0;
  signal QdivI        : std_logic_vector (63 downto 0) := (others => '0');
  signal QdivI_remain : std_logic_vector (63 downto 0) := (others => '0');
  signal I_ff         : std_logic_vector (31 downto 0) := (others => '0');
  signal Q_ff         : std_logic_vector (31 downto 0) := (others => '0');
  signal I_fp         : ufixed (31 downto -32) := (others => '0');
  signal Q_fp         : ufixed (31 downto -32) := (others => '0');
  signal div_res      : ufixed (31 downto -32) := (others => '0');
  signal cntr1        : ufixed (4 downto 0) := (others => '0');

  constant div_latency : ufixed (4 downto 0) := "11111";

begin

  divider : component lpm_divider1
    port map (
      aclr     => '0',
      clock    => clock,
      denom    => to_slv(I_fp),
      numer    => to_slv(Q_fp),
      quotient => QdivI,
      remain   => QdivI_remain
    );

  main : process (clock)
  begin
    if (rising_edge(clock)) then
      I_ff <= I;
      Q_ff <= Q;
      case fsm1 is
        when s0 =>
          if ((I_ff /= I) or (Q_ff /= Q)) then --New data available
            I_fp <= to_ufixed(unsigned(I),31,-32);
            Q_fp <= to_ufixed(unsigned(Q),31,-32);
            fsm1 <= s1;
          end if;
        when s1 =>
          if (cntr1 < div_latency) then --Wait for the division operation to complete (Q/I)
            cntr1 <= resize(cntr1 + 1, cntr1'high, cntr1'low);
          else
            cntr1 <= (others => '0');
            fsm1 <= s0;
          end if;
        when others =>
          fsm1 <= s0;
      end case;
    end if;
  end process main;

end rtl;

Altera_Forum · ‎10-18-2012

You will always get a remainder because you would otherwise need more bits to get an even smaller result in the quotient, which is not possible with lpm_divide (64 bits being the max).

You don't really need to use sfixed, but you still need to append all the '0's onto the end when connecting it to the lpm_divide.

And I notice you don't have any outputs from this block. Are you going to add them later?

Altera_Forum · ‎10-18-2012

Yes, the block is going to be more complicated. After the division I need to perform modulus pi/2 on the result and then use it with an LUT which is based on fractional numbers, that's why I need the fractional result. How can I find it?

Altera_Forum · ‎10-18-2012

The mod pi/2 is going to be hard, because that is essentially another division. You may be better off using a LUT for that too.

How can you find what?

Altera_Forum · ‎10-18-2012

How can I find the full result of the division? For example, if I calculate 5/3 I need the result to be 1.666 and not 1 with a remainder of 2.

I found all the functions that I need here (http://www.vhdl.org/fphdl/fixed_ug.pdf), including mod and /. Is there a way I can use them? (Actually, I thought I could do that by including use ieee.fixed_float_types.all; use ieee.fixed_pkg.all;)

Altera_Forum · ‎10-18-2012

You can use them, but like I said, you cannot pipeline them, which will lead to an extremely low fmax.

In fixed point your results are limited by the number of bits you have. So you cannot store irrational numbers (or fractions such as 1/3) exactly, and you will only get within 2^-f (where f is the number of fraction bits) of the actual result.

It sounds like you are more a software guy than a hardware guy. Do you really need 32-bit integers? Do you really need a 64-bit result? These numbers are getting very big and are going to chew up a large chunk of your FPGA. Have you analysed your algorithm to see if you really need all of the bits? Can you cut down on accuracy anywhere?

Altera_Forum · ‎10-18-2012

Oh, and a LUT with a 64-bit address is going to be impossible inside an FPGA: it would require 2^64 memory locations, which is rather large! The biggest LUT you could implement in the largest FPGAs is about 22 bits.

Altera_Forum · ‎10-18-2012

Actually I'm a board design guy :)

I understand the meaning of fixed point 32bit numbers and it's fine for my purpose.

The LUTs and calculations that I need will work with 1.31 fractional numbers. I need 64 bits only for the division result; after the modulus operation I will use 32-bit numbers.

I need to implement it first and then test it. Possibly I'll be able to reduce the number of bits, but for now I want to use 32 bits. So back to my question: what is the best way to do that?

Altera_Forum · ‎10-18-2012

5 over 3 in binary, 101 / 011, is 001 remainder 010 when you have 3-bit values.

With 6 bits, 101.000 / 011, the result would be 1.101 = 1.625.
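
The same example can be written out as arithmetic. This is a sketch assuming f = 3 fraction bits as in the post above; the symbols q and r (integer quotient and remainder of the scaled division) are introduced here only for illustration.

```latex
\[
q = \left\lfloor \frac{5 \cdot 2^{3}}{3} \right\rfloor
  = \left\lfloor \frac{40}{3} \right\rfloor
  = 13 = 001101_2,
\qquad
r = 40 - 3 \cdot 13 = 1
\]
\[
\frac{q}{2^{3}} = \frac{13}{8} = 1.625 = 1.101_2,
\qquad
\frac{5}{3} - \frac{13}{8} = \frac{1}{24} < 2^{-3}
\]
```

In other words, shifting the numerator left by f bits before an ordinary integer division moves f bits of the answer out of the remainder and into the quotient, and the error stays below one least significant bit.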

Altera_Forum · ‎10-18-2012

My LUTs are going to be stored on external memories and they will be much smaller than 64bit, I'll need to use the "round" function in order to use the LUTs.

I understand your division example (actually that's exactly what I need) but how can I implement it? I don't get a x.y result when using lpm_divide.

Altera_Forum · ‎10-18-2012

A LUT with a 32-bit address needs 4 GWords of memory. That is a HUGE amount of memory for a LUT, and seems rather pointless. Do you really need all that accuracy?

For the LPM divide, you need to keep track of the position of the binary point yourself. Remember that a 32.32 / 0.32 number would give a 64.32 result, wider than the limits of LPM divide. You need to zero pad to the right for the numerator and zero pad to the left for the denominator.
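
The padding described above might look like the following sketch. It assumes unsigned 32-bit operands and a 64-bit unsigned divider such as the lpm_divider1 component shown earlier in the thread; the entity name and the ZEROS constant are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: prepare operands so an integer divider
-- returns Q / I with 32 fraction bits.
entity div_operand_pad_sketch is
  port (
    Q     : in  std_logic_vector(31 downto 0);  -- numerator, whole number
    I     : in  std_logic_vector(31 downto 0);  -- denominator, whole number
    numer : out std_logic_vector(63 downto 0);  -- connect to divider numer
    denom : out std_logic_vector(63 downto 0)   -- connect to divider denom
  );
end entity div_operand_pad_sketch;

architecture rtl of div_operand_pad_sketch is
  constant ZEROS : std_logic_vector(31 downto 0) := (others => '0');
begin
  numer <= Q & ZEROS;  -- pad right: Q * 2^32, i.e. Q in 32.32 format
  denom <= ZEROS & I;  -- pad left: I unchanged, widened to 64 bits
end architecture rtl;
```

With these connections the 64-bit quotient can be read as a 32.32 number: bits 63 downto 32 are the integer part and bits 31 downto 0 the fraction. If both operands are padded on the right instead, the scale factors cancel and the quotient contains only the integer part.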

Altera_Forum · ‎10-18-2012

My LUTs are rather small; the biggest one has fewer than 4000 values. I can't say exactly how accurate the calculations need to be until I test the whole system, but they should be pretty accurate as I'm dealing with a very tricky signal.

I got the division part, thank you very much. Any tips about the modulus operation?

Altera_Forum · ‎10-18-2012

Divide, truncate, multiply, subtract.

In this case, you should be able to replace the division with a multiply by 2/pi.
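
A rough sketch of those four steps using the fixed-point package, with the division replaced by a multiplication by 2/pi. Everything here is an assumption for illustration: the entity name, the 8.16 input format, the constant widths, and VHDL-2008 support for ieee.fixed_pkg and real-valued constants in the synthesis tool. It is purely combinational; a real design would register the intermediate results.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.math_real.all;
use ieee.fixed_float_types.all;
use ieee.fixed_pkg.all;

-- Hypothetical example: r = x mod (pi/2) without a divider.
entity mod_half_pi_sketch is
  port (
    x : in  ufixed(7 downto -16);  -- assumed 8.16 input
    r : out ufixed(1 downto -16)   -- remainder
  );
end entity mod_half_pi_sketch;

architecture rtl of mod_half_pi_sketch is
  -- Both constants are truncated (rounded down) so that k * HALF_PI
  -- can never exceed x and the unsigned subtraction cannot wrap.
  constant TWO_OVER_PI : ufixed(0 downto -24) :=
    to_ufixed(2.0 / MATH_PI, 0, -24, round_style => fixed_truncate);
  constant HALF_PI : ufixed(0 downto -16) :=
    to_ufixed(MATH_PI / 2.0, 0, -16, round_style => fixed_truncate);

  signal k : ufixed(7 downto 0);  -- number of whole multiples of pi/2
begin
  -- "divide" (as a multiply by 2/pi) and truncate to a whole number
  k <= resize(x * TWO_OVER_PI, 7, 0, round_style => fixed_truncate);
  -- multiply back and subtract
  r <= resize(x - k * HALF_PI, 1, -16);
end architecture rtl;
```

Because the constants are quantized, the result can land slightly above pi/2 when x is very close to a multiple of pi/2, so a comparison and one extra subtraction may be needed before the value is used as a LUT address.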

Altera_Forum · ‎10-21-2012

Got it, thank you.