Programmable Devices
CPLDs, FPGAs, SoC FPGAs, Configuration, and Transceivers

Floating point addition in vhdl

Altera_Forum
Honored Contributor II

Hi

I want to add 4 floating point numbers. What will the result range be? I am using

library ieee_proposed; 

use ieee_proposed.fixed_pkg.all; 

this package. For example, if I want to add two floating point numbers, the result range is

C <= A + B; -- range of C is (max(A'left, B'left)+1 downto min(A'right, B'right))

and it works fine. But when I do

C <= A + B + D + E;

What will the new range of C be?

If I follow the same rule as above, I get an error:

 

"D:/214ee1411/vhdl codes/floatarith/mul_1test/fmul_1.vhd" Line 50: Expression has 11 elements ; expected 9 

Please help me with this. 

Thank you
Altera_Forum
Honored Contributor II

You are using fixed point? 

 

use ieee_proposed.fixed_pkg.all;
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

You are using fixed point? 

 

use ieee_proposed.fixed_pkg.all; 

--- Quote End ---  

 

 

Thank you for your reply. Yes, I am using fixed point. For example, I want to add

 

s1 <= to_ufixed(50, 7, 0) + to_ufixed(50, 7, 0) + to_ufixed(50, 7, 0) + to_ufixed(50, 7, 0);

 

Is there a formula to calculate the range of s1 here?
Altera_Forum
Honored Contributor II

Yes,  

 

Please see the table at

 

http://electro-logic.blogspot.it/2015/11/fpga-numeri-virgola-fissa-seconda-parte.html 

 

which gives the sizing rules.
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

Yes,  

 

Please see table at  

 

http://electro-logic.blogspot.it/2015/11/fpga-numeri-virgola-fissa-seconda-parte.html 

 

where there are sizing rules 

--- Quote End ---  

 

 

Thank you for the link. I already have this table, but it only covers two operands, A and B. My question is about adding four values: if I declare s1 with the range (8 downto 0) as per the table, I get the error I posted above.

When I use the range (10 downto 0) for s1, it works fine. So my question is: how do I determine this range when adding several values?
Altera_Forum
Honored Contributor II

Just break it down into multiple 2-input additions. There are 3 additions, and each one widens the result by one bit, so the output needs to be 3 bits bigger than the largest input.
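The one-bit-per-addition growth can be checked with a small self-checking testbench. This is a sketch, assuming a VHDL-2008 simulator where the package is available as ieee.fixed_pkg (the posts above use the VHDL-93 compatibility copy, ieee_proposed.fixed_pkg, which provides the same ufixed functions); the entity name chain_width_tb is made up for illustration. Each unconstrained constant takes its index range from its initial expression, so 'high and 'low show the width that "+" actually produced for the s1 example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.fixed_pkg.all;  -- VHDL-2008 name; the thread uses ieee_proposed.fixed_pkg

entity chain_width_tb is
end entity chain_width_tb;

architecture sim of chain_width_tb is
  -- Four 8-bit unsigned operands holding 50, as in the s1 example
  constant a : ufixed(7 downto 0) := to_ufixed(50, 7, 0);
  constant b : ufixed(7 downto 0) := to_ufixed(50, 7, 0);
  constant c : ufixed(7 downto 0) := to_ufixed(50, 7, 0);
  constant d : ufixed(7 downto 0) := to_ufixed(50, 7, 0);

  -- Unconstrained constants take their index range from the expression,
  -- so each one records the width that "+" produced.
  constant sum2 : ufixed := a + b;          -- 1 addition
  constant sum3 : ufixed := a + b + c;      -- 2 additions: (a + b) + c
  constant sum4 : ufixed := a + b + c + d;  -- 3 additions: ((a + b) + c) + d
begin
  check : process
  begin
    -- Each 2-input "+" adds one bit on the left; the low index stays 0
    assert sum2'high = 8 and sum2'low = 0
      report "a+b is not (8 downto 0)" severity failure;
    assert sum3'high = 9 and sum3'low = 0
      report "a+b+c is not (9 downto 0)" severity failure;
    assert sum4'high = 10 and sum4'low = 0
      report "a+b+c+d is not (10 downto 0)" severity failure;
    -- The 11-bit result holds 4 * 50 = 200 exactly
    assert to_integer(sum4) = 200
      report "unexpected value for a+b+c+d" severity failure;
    report "a+b+c+d is ufixed(" & integer'image(sum4'high) & " downto "
           & integer'image(sum4'low) & ")";
    wait;  -- run the checks once, then suspend
  end process check;
end architecture sim;
```

With the compatibility library, swap the use clause for use ieee_proposed.fixed_pkg.all. The third check matches the 11 elements reported in the error message, which is why s1 needs the range (10 downto 0).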

Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

Just break it down in to multiple 2 input additions. There are 3 additions, so the output needs to be 3 bits bigger than the largest input. 

--- Quote End ---  

 

 

Thank you all
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

Just break it down in to multiple 2 input additions. There are 3 additions, so the output needs to be 3 bits bigger than the largest input. 

--- Quote End ---  

 

 

Excuse me for being a bit of a bore, but if you add up 4 values, an extension with 2 bits suffices ...
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

Excuse me for being a bit of a bore, but if you add up 4 values, an extension with 2 bits suffices ... 

--- Quote End ---  

 

 

Algorithmically yes, but not syntactically. Each addition in the fixed-point library increases the result size by one bit.
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

Algorithmically yes - but not syntactically. Each addition in the fixed point library increases the bit size. 

--- Quote End ---  

 

 

I haven't used fixed-point types yet. 

But if we do it as you said and break it down into 2-input additions:

result := (a + b) + (c + d);

then 2 extra bits will suffice.
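A minimal sketch comparing the two groupings, assuming a VHDL-2008 simulator with ieee.fixed_pkg standing in for the ieee_proposed.fixed_pkg used in this thread; tree_width_tb is a made-up name. It reuses the four 8-bit values of 50 from the s1 example: the bracketed tree gives two (8 downto 0) partial sums and one (9 downto 0) total, while the unbracketed chain ends at (10 downto 0).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.fixed_pkg.all;  -- VHDL-2008 name; the thread uses ieee_proposed.fixed_pkg

entity tree_width_tb is
end entity tree_width_tb;

architecture sim of tree_width_tb is
  constant a : ufixed(7 downto 0) := to_ufixed(50, 7, 0);
  constant b : ufixed(7 downto 0) := to_ufixed(50, 7, 0);
  constant c : ufixed(7 downto 0) := to_ufixed(50, 7, 0);
  constant d : ufixed(7 downto 0) := to_ufixed(50, 7, 0);

  -- Balanced tree: two (8 downto 0) partial sums feed one final addition
  constant tree  : ufixed := (a + b) + (c + d);
  -- Left-to-right chain, which is how a + b + c + d is parsed
  constant chain : ufixed := a + b + c + d;
begin
  check : process
  begin
    assert tree'high = 9 and tree'low = 0      -- 2 extra bits
      report "tree sum is not (9 downto 0)" severity failure;
    assert chain'high = 10 and chain'low = 0   -- 3 extra bits
      report "chained sum is not (10 downto 0)" severity failure;
    -- Same value either way; only the declared width differs
    assert to_integer(tree) = 200 and to_integer(chain) = 200
      report "unexpected sum" severity failure;
    wait;  -- run the checks once, then suspend
  end process check;
end architecture sim;
```

The tree form also has a shorter adder path in hardware, but the width saving only appears if the brackets are written explicitly.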
Altera_Forum
Honored Contributor II

Yes, now you've added brackets, it will only need 2 extra bits. 

But a+b+c+d will break down into: 

 

result := (((a+b) + c) + d);  

 

which does require 3 extra bits.
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

Yes, now you've added brackets, it will only need 2 extra bits. 

But a+b+c+d will break down into: 

 

result := (((a+b) + c) + d);  

 

which does require 3 extra bits. 

--- Quote End ---  

 

 

I concur. 

But to be pedantic: each addition generates an extra bit, so if n values are added in a chain (as you describe), the final result carries n-1 additional bits. The sum of n values only needs integer(ceil(log2(n))) extra bits, though, so you can resize it:

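library ieee; use ieee.math_real.all; -- for ceil and log2
signal a, b, c, d : ufixed(HH downto LL);
constant NN : positive := 4; -- number of operands
signal result : ufixed(HH + integer(ceil(log2(real(NN)))) downto LL);
-- in the architecture body:
result <= resize(((a + b) + c) + d, HH + integer(ceil(log2(real(NN)))), LL);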
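A minimal sketch of this resize pattern with concrete numbers, assuming a VHDL-2008 simulator with ieee.fixed_pkg in place of ieee_proposed.fixed_pkg. The values HH = 7 and LL = -2, the entity name resize_sum_tb, and the helper clog2 are all hypothetical; clog2 is an integer-only stand-in for integer(ceil(log2(real(NN)))), swapped in so the width cannot be affected by floating-point rounding in log2. All four operands are set to their maximum so the check covers the worst case.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.fixed_pkg.all;  -- VHDL-2008 name; the thread uses ieee_proposed.fixed_pkg

entity resize_sum_tb is
end entity resize_sum_tb;

architecture sim of resize_sum_tb is
  -- Hypothetical helper: smallest b with 2**b >= n, using integers only
  function clog2(n : positive) return natural is
    variable bits     : natural  := 0;
    variable capacity : positive := 1;  -- always equals 2**bits
  begin
    while capacity < n loop
      capacity := capacity * 2;
      bits     := bits + 1;
    end loop;
    return bits;
  end function clog2;

  constant HH : integer  := 7;   -- example value for the HH placeholder
  constant LL : integer  := -2;  -- example value for LL: two fractional bits
  constant NN : positive := 4;   -- number of operands

  -- Worst case: every operand at its maximum, 255.75
  constant max_in : ufixed(HH downto LL) := (others => '1');

  -- Chained sum grows to (HH + 3 downto LL), one bit per addition
  constant full : ufixed := ((max_in + max_in) + max_in) + max_in;

  -- Trimmed to the HH + clog2(NN) bits the sum really needs
  constant result : ufixed(HH + clog2(NN) downto LL) :=
    resize(full, HH + clog2(NN), LL);
begin
  check : process
  begin
    assert full'high = HH + 3
      report "unexpected chained width" severity failure;
    assert result'high = HH + 2 and result'low = LL
      report "unexpected resized width" severity failure;
    -- 4 * 255.75 = 1023.0 fits in (9 downto -2), so resize drops only a zero bit
    assert to_real(full) = 1023.0 and to_real(result) = 1023.0
      report "value changed by resize" severity failure;
    wait;  -- run the checks once, then suspend
  end process check;
end architecture sim;
```

Since resize saturates by default, an undersized target would clip the value quietly rather than produce a width-mismatch error, which is why the worst-case value check is worth keeping.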