Programmable Devices
CPLDs, FPGAs, SoC FPGAs, Configuration, and Transceivers
21592 Discussions

Counting the position of ones in a binary vector

Altera_Forum
Honored Contributor II
2,914 Views

I want to design a combinatorial circuit for an fpga which will count the position of ones in a 16 bit vector. 

eg1 if vector is 16'b0000_0000_0011_0101 then output should be 

64'h0000_0000_0000_5420 

 

eg2 if vector is 16'b0000_1010_0000_1010 then output should be 

64'h0000_0000_0000_b931. 

 

Please help. 

Thanks in advance....
0 Kudos
22 Replies
Altera_Forum
Honored Contributor II
1,799 Views

This will not be straightforward to realise in a combinatorial circuit because what you describe is a rather sequential process: traverse the input vector from right to left, if you encounter a bit set to '1' add its position index into the next available output slot. 

This doesn't mean it can't be done combinatorially, but the result will be huge as it has to cater for all combinations (all 65356 of them ...).
0 Kudos
Altera_Forum
Honored Contributor II
1,799 Views

only as guide, not tested 

process(data_in) begin for i in 0 to 15 loop if data_in(i) = '0' then data_out(i*4:i*4+3) <= "0000"; else data_out(i*4:i*4+3) <= std_logic_vector(to_unsigned(i,4)); end if; end loop; end process;
0 Kudos
Altera_Forum
Honored Contributor II
1,799 Views

 

--- Quote Start ---  

only as guide, not tested 

process(data_in) begin for i in 0 to 15 loop if data_in(i) = '0' then data_out(i*4:i*4+3) <= "0000"; else data_out(i*4:i*4+3) <= std_logic_vector(to_unsigned(i,4)); end if; end loop; end process;  

--- Quote End ---  

 

 

That doesn't do what ganeshmirajkar put forward, as e.g. bitposition 3 will always fill something into output nibble 3. So we need at least a variable. Steeling your code snippet: 

process(data_in) variable j : natural ; begin j := 0 ; data_out <= (others => '0') ; for i in 0 to 15 loop if data_in(i) = '1' then data_out(j*4:j*4+3) <= std_logic_vector(to_unsigned(i,4)); j := j + 1 ; end if; end loop; end process; 

I wonder how this compiles.
0 Kudos
Altera_Forum
Honored Contributor II
1,799 Views

the loop index is going up anyway. for bit index 3 then output nibble = 3 if it is '1' else nibble = 0

0 Kudos
Altera_Forum
Honored Contributor II
1,799 Views

 

--- Quote Start ---  

the loop index is going up anyway. for bit index 3 then output nibble = 3 if it is '1' else nibble = 0 

--- Quote End ---  

 

 

e.g. for bit 3, i == 3 and you will fill in nibble 3, but if this was the first bit (set to '1') encountered nibble 0 should get value 3. I don't think your code is doing that. 

Anyway I compiled my proposition, and it is actually not too bad, The compiler generated 477 LCs for a Cyclone II device. I still (would/will) have to simulate it.
0 Kudos
Altera_Forum
Honored Contributor II
1,799 Views

And simulated. 466 LC, 80 registers, 105 MHz in a Cyclone II EP2C5F256C6. 

library ieee; use ieee.std_logic_1164.all; use IEEE.numeric_std.all; entity bitpos is port ( Clk : in std_logic ; data_in : in std_logic_vector(15 downto 0) ; data_out : out std_logic_vector(63 downto 0) ); end bitpos ; architecture a of bitpos is signal ld : std_logic_vector(15 downto 0) ; begin process(clk) variable j : natural ; variable lq : std_logic_vector(63 downto 0) ; begin if rising_edge(Clk ) then ld <= data_in ; j := 0 ; lq := (others => '0') ; for i in 0 to 15 loop if ld(i) = '1' then lq(j*4+3 downto j*4) := std_logic_vector(to_unsigned(i,4)); j := j + 1 ; end if; end loop; data_out <= lq ; end if ; end process; end a ; 

 

It uses a lot less resources than I initially guessed. Just showing how awesome powerful VHDL is. (Verilog very, very probably too)
0 Kudos
Altera_Forum
Honored Contributor II
1,799 Views

 

--- Quote Start ---  

e.g. for bit 3, i == 3 and you will fill in nibble 3, but if this was the first bit (set to '1') encountered nibble 0 should get value 3. I don't think your code is doing that. 

Anyway I compiled my proposition, and it is actually not too bad, The compiler generated 477 LCs for a Cyclone II device. I still (would/will) have to simulate it. 

--- Quote End ---  

 

 

if bit index == 3 then it equals 3, how then it equals something else? it is physically decided at input declaration.
0 Kudos
Altera_Forum
Honored Contributor II
1,799 Views

 

--- Quote Start ---  

This will not be straightforward to realise in a combinatorial circuit because what you describe is a rather sequential process: traverse the input vector from right to left, if you encounter a bit set to '1' add its position index into the next available output slot. 

This doesn't mean it can't be done combinatorially, but the result will be huge as it has to cater for all combinations (all 65356 of them ...). 

--- Quote End ---  

 

 

Actually it can be done in a combinatorial way but you dont take a case statement approach... i think there is a way... 

Actually i have managed to get a partial output. 

eg if input is 16'b0000_0000_0001_0101 then i have managed to get the output 64'h0000_0000_0004_0200... i dont know how to remove those extra zeros. In this case the expected output is 64'h0000_0000_0000_0420.
0 Kudos
Altera_Forum
Honored Contributor II
1,799 Views

 

--- Quote Start ---  

only as guide, not tested 

process(data_in) begin for i in 0 to 15 loop if data_in(i) = '0' then data_out(i*4:i*4+3) <= "0000"; else data_out(i*4:i*4+3) <= std_logic_vector(to_unsigned(i,4)); end if; end loop; end process;  

--- Quote End ---  

 

 

No for loops because i dont know what circuit will the synthesizer make out of that...
0 Kudos
Altera_Forum
Honored Contributor II
1,799 Views

Thanks for your replies..people.

0 Kudos
Altera_Forum
Honored Contributor II
1,799 Views

 

--- Quote Start ---  

Actually it can be done in a combinatorial way but you dont take a case statement approach... i think there is a way... 

--- Quote End ---  

 

Case statements would only inflate the code. josyb has shown an exact solution of your original problem, taking advantage of the VHDL iteration constructs. If you want a pure combinational solution for some reason, you can simply remove the if rising_edge(clk) condition. 

 

The assignment to the slice lq(j*4+3 downto j*4) isn't accepted at least by my VHDL version, I think a slice must generally needs a constant range. I used a bitwise copy instead. 

 

 

--- Quote Start ---  

i dont know how to remove those extra zeros. 

--- Quote End ---  

 

The suggested solution gives the results specified in your original post.
0 Kudos
Altera_Forum
Honored Contributor II
1,799 Views

 

--- Quote Start ---  

Case statements would only inflate the code. josyb has shown an exact solution of your original problem, taking advantage of the VHDL iteration constructs. If you want a pure combinational solution for some reason, you can simply remove the if rising_edge(clk) condition. 

 

The assignment to the slice lq(j*4+3 downto j*4) isn't accepted at least by my VHDL version, I think a slice must generally needs a constant range. I used a bitwise copy instead. 

 

 

The suggested solution gives the results specified in your original post. 

--- Quote End ---  

 

 

That code will give me the solution but i cannot buy that because i dont know what circuit it will boil down to and so i cannot guaranty that it will work at 266 MHz. Can you tell me what circuit it will amke out of that for loop ?? Thats why i want a gate level circuit..
0 Kudos
Altera_Forum
Honored Contributor II
1,799 Views

It is sad that in many fora across the web simple solutions get drowned in garbage. 

 

The for loop unfolds to comb logic as follows: 

 

if input(0) = '1' then 

.... 

end if; 

 

if input(1) = '1' then 

.... 

end if; 

if input(2) = '1' then 

.... 

end if; 

 

etc. until input(15) 

 

The compiler unrolls it at compile time. It is just for convenience of writing multiple statements.
0 Kudos
Altera_Forum
Honored Contributor II
1,799 Views

 

--- Quote Start ---  

That code will give me the solution but i cannot buy that because i dont know what circuit it will boil down to and so i cannot guaranty that it will work at 266 MHz. Can you tell me what circuit it will amke out of that for loop ?? Thats why i want a gate level circuit.. 

--- Quote End ---  

 

 

How can you expect to work at 266MHz with a purely combinatorial circuit? 

 

plus, for loops do show the circuit if you understand how it works. Plus you could use the RTL viewer to look at the circuit. 

 

A for loops makes the source code manageable.
0 Kudos
Altera_Forum
Honored Contributor II
1,799 Views

I found, that the pure combinational solution (without the registers implemented by josyb) results in a tpd of about 17 ns with Cyclone III. So 266 MHz would surely need pipelining.  

 

I didn't try, if the synthesis tool will "pull-in" registers to the synthesis of the behavioral description. As previously reported in the Forum, it does e.g. in the case of a parallel divider. But most likely, you have to break the data path manually to achieve the speed requirement.
0 Kudos
Altera_Forum
Honored Contributor II
1,799 Views

Wow, a lot of activity while I was away ... 

I only added the input and output registers to measure the speed (in TimeQuest you don't get tPD reports unless you set constraints and I was doing this test in QII 10.1SP1) and second to give the compiler a fair chance to get a high speed. Double registering the outputs may even help a bit more.  

While driving home from the office last night I mused that I could have written it as a function giving a better view: 

library ieee; use ieee.std_logic_1164.all; use IEEE.numeric_std.all; entity bitpos is generic ( WIDTH_D : positive := 16 ) ; port ( Clk : in std_logic ; data_in : in std_logic_vector(WIDTH_D - 1 downto 0) ; data_out : out std_logic_vector(WIDTH_D * 4 - 1 downto 0) ); end bitpos ; architecture a of bitpos is function reportbitpos( v : std_logic_vector) return std_logic_vector is variable j : natural ; variable r : std_logic_vector(v'length * 4 - 1 downto 0) ; begin j := 0 ; r := (others => '0') ; for i in 0 to v'high loop if v(i) = '1' then r(j*4+3 downto j*4) := std_logic_vector(to_unsigned(i,4)); j := j + 1 ; end if; end loop; return r ; end function ; signal ld : std_logic_vector(WIDTH_D - 1 downto 0) ; begin process(clk) begin if rising_edge(Clk ) then ld <= data_in ; data_out <= reportbitpos( ld ) ; end if ; end process; end a ; 

At the same time I added the generic to find out when we cross the 266 MHz border, at width 8 we get about 200 MHz, at width 4 500MHz+ or maximum speed. 

A fully pipelined design will need about 1280 registers, and run at maximum speed. 

 

Using a Cyclone IV E in stead of a Cyclone II ups the speed form 110 to 121 MHz for a 16 bit input vector (with a tCK setting of 8.0 ns) Compiling it for a width of 8 it gets up to 237 MHz. 

 

to FvM: 

 

--- Quote Start ---  

The assignment to the slice lq(j*4+3 downto j*4) isn't accepted at least by my VHDL version, I think a slice must generally needs a constant range. I used a bitwise copy instead. 

--- Quote End ---  

I used the VHDL 1993 setting in QII 9.1SP2 that I use for waveform simulation. Actually 9.1SP2 produces a faster circuit then 10.1SP1.
0 Kudos
Altera_Forum
Honored Contributor II
1,799 Views

 

--- Quote Start ---  

I used the VHDL 1993 setting in QII 9.1SP2 that I use for waveform simulation. Actually 9.1SP2 produces a faster circuit then 10.1SP1. 

--- Quote End ---  

 

 

I didn't yet proceed to Quartus 10. I compiled under V9.0 with VHDL 1993 settings and got the below error with your recent code. I was under the assumption, that constant slice ranges are required, no idea which other setting may be different. But the point isn't very important. 

 

 

--- Quote Start ---  

Error (10394): VHDL error at bitpos.vhd(29): left bound of range must be a constant 

--- Quote End ---  

0 Kudos
Altera_Forum
Honored Contributor II
1,799 Views

It looks like Altera 'progresses' on VHDL support on each subsequent release. E.g. 9.1SP2 supports VHDL 2008 but only a subset, 10.1 supports a bit more, but not yet everything (not that we need everything ...). Maybe this also the case with VHDL-1993 support.

0 Kudos
Altera_Forum
Honored Contributor II
1,799 Views

 

--- Quote Start ---  

I didn't yet proceed to Quartus 10. I compiled under V9.0 with VHDL 1993 settings and got the below error with your recent code. I was under the assumption, that constant slice ranges are required, no idea which other setting may be different. But the point isn't very important. 

--- Quote End ---  

 

 

This is a "bug" thats been around a while, and have mentioned on our internal wiki. Apparently only Quartus has had a problem with it, no concerns with ISE or Modelsim. 

 

Maybe they finally fixed it with Q10!
0 Kudos
Altera_Forum
Honored Contributor II
1,700 Views

 

--- Quote Start ---  

This is a "bug" thats been around a while, and have mentioned on our internal wiki. 

--- Quote End ---  

 

Thanks for clarifying. According to josyb, it has been already fixed in V9.1SP2. I accepted the limitation when working with previous Quartus versions and didn't yet figure out, if a variable slice range is required by the VHDL standard. Do you know the exact paragraph in IEEE 1076?
0 Kudos
Reply