- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I want to design a combinatorial circuit for an fpga which will count the position of ones in a 16 bit vector.
eg1 if vector is 16'b0000_0000_0011_0101 then output should be 64'h0000_0000_0000_5420 eg2 if vector is 16'b0000_1010_0000_1010 then output should be 64'h0000_0000_0000_b931. Please help. Thanks in advance....Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This will not be straightforward to realise in a combinatorial circuit because what you describe is a rather sequential process: traverse the input vector from right to left, if you encounter a bit set to '1' add its position index into the next available output slot.
This doesn't mean it can't be done combinatorially, but the result will be huge as it has to cater for all combinations (all 65356 of them ...).- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
only as guide, not tested
process(data_in)
begin
for i in 0 to 15 loop
if data_in(i) = '0' then
data_out(i*4:i*4+3) <= "0000";
else
data_out(i*4:i*4+3) <= std_logic_vector(to_unsigned(i,4));
end if;
end loop;
end process;
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
--- Quote Start --- only as guide, not tested
process(data_in)
begin
for i in 0 to 15 loop
if data_in(i) = '0' then
data_out(i*4:i*4+3) <= "0000";
else
data_out(i*4:i*4+3) <= std_logic_vector(to_unsigned(i,4));
end if;
end loop;
end process;
--- Quote End --- That doesn't do what ganeshmirajkar put forward, as e.g. bitposition 3 will always fill something into output nibble 3. So we need at least a variable. Steeling your code snippet: process(data_in)
variable j : natural ;
begin
j := 0 ;
data_out <= (others => '0') ;
for i in 0 to 15 loop
if data_in(i) = '1' then
data_out(j*4:j*4+3) <= std_logic_vector(to_unsigned(i,4));
j := j + 1 ;
end if;
end loop;
end process;
I wonder how this compiles.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
the loop index is going up anyway. for bit index 3 then output nibble = 3 if it is '1' else nibble = 0
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
--- Quote Start --- the loop index is going up anyway. for bit index 3 then output nibble = 3 if it is '1' else nibble = 0 --- Quote End --- e.g. for bit 3, i == 3 and you will fill in nibble 3, but if this was the first bit (set to '1') encountered nibble 0 should get value 3. I don't think your code is doing that. Anyway I compiled my proposition, and it is actually not too bad, The compiler generated 477 LCs for a Cyclone II device. I still (would/will) have to simulate it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
And simulated. 466 LC, 80 registers, 105 MHz in a Cyclone II EP2C5F256C6.
library ieee;
use ieee.std_logic_1164.all;
use IEEE.numeric_std.all;
entity bitpos is
port (
Clk : in std_logic ;
data_in : in std_logic_vector(15 downto 0) ;
data_out : out std_logic_vector(63 downto 0)
);
end bitpos ;
architecture a of bitpos is
signal ld : std_logic_vector(15 downto 0) ;
begin
process(clk)
variable j : natural ;
variable lq : std_logic_vector(63 downto 0) ;
begin
if rising_edge(Clk ) then
ld <= data_in ;
j := 0 ;
lq := (others => '0') ;
for i in 0 to 15 loop
if ld(i) = '1' then
lq(j*4+3 downto j*4) := std_logic_vector(to_unsigned(i,4));
j := j + 1 ;
end if;
end loop;
data_out <= lq ;
end if ;
end process;
end a ;
It uses a lot less resources than I initially guessed. Just showing how awesome powerful VHDL is. (Verilog very, very probably too)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
--- Quote Start --- e.g. for bit 3, i == 3 and you will fill in nibble 3, but if this was the first bit (set to '1') encountered nibble 0 should get value 3. I don't think your code is doing that. Anyway I compiled my proposition, and it is actually not too bad, The compiler generated 477 LCs for a Cyclone II device. I still (would/will) have to simulate it. --- Quote End --- if bit index == 3 then it equals 3, how then it equals something else? it is physically decided at input declaration.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
--- Quote Start --- This will not be straightforward to realise in a combinatorial circuit because what you describe is a rather sequential process: traverse the input vector from right to left, if you encounter a bit set to '1' add its position index into the next available output slot. This doesn't mean it can't be done combinatorially, but the result will be huge as it has to cater for all combinations (all 65356 of them ...). --- Quote End --- Actually it can be done in a combinatorial way but you dont take a case statement approach... i think there is a way... Actually i have managed to get a partial output. eg if input is 16'b0000_0000_0001_0101 then i have managed to get the output 64'h0000_0000_0004_0200... i dont know how to remove those extra zeros. In this case the expected output is 64'h0000_0000_0000_0420.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
--- Quote Start --- only as guide, not tested
process(data_in)
begin
for i in 0 to 15 loop
if data_in(i) = '0' then
data_out(i*4:i*4+3) <= "0000";
else
data_out(i*4:i*4+3) <= std_logic_vector(to_unsigned(i,4));
end if;
end loop;
end process;
--- Quote End --- No for loops because i dont know what circuit will the synthesizer make out of that...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your replies..people.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
--- Quote Start --- Actually it can be done in a combinatorial way but you dont take a case statement approach... i think there is a way... --- Quote End --- Case statements would only inflate the code. josyb has shown an exact solution of your original problem, taking advantage of the VHDL iteration constructs. If you want a pure combinational solution for some reason, you can simply remove the if rising_edge(clk) condition. The assignment to the slice lq(j*4+3 downto j*4) isn't accepted at least by my VHDL version, I think a slice must generally needs a constant range. I used a bitwise copy instead. --- Quote Start --- i dont know how to remove those extra zeros. --- Quote End --- The suggested solution gives the results specified in your original post.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
--- Quote Start --- Case statements would only inflate the code. josyb has shown an exact solution of your original problem, taking advantage of the VHDL iteration constructs. If you want a pure combinational solution for some reason, you can simply remove the if rising_edge(clk) condition. The assignment to the slice lq(j*4+3 downto j*4) isn't accepted at least by my VHDL version, I think a slice must generally needs a constant range. I used a bitwise copy instead. The suggested solution gives the results specified in your original post. --- Quote End --- That code will give me the solution but i cannot buy that because i dont know what circuit it will boil down to and so i cannot guaranty that it will work at 266 MHz. Can you tell me what circuit it will amke out of that for loop ?? Thats why i want a gate level circuit..
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It is sad that in many fora across the web simple solutions get drowned in garbage.
The for loop unfolds to comb logic as follows: if input(0) = '1' then .... end if; if input(1) = '1' then .... end if; if input(2) = '1' then .... end if; etc. until input(15) The compiler unrolls it at compile time. It is just for convenience of writing multiple statements.- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
--- Quote Start --- That code will give me the solution but i cannot buy that because i dont know what circuit it will boil down to and so i cannot guaranty that it will work at 266 MHz. Can you tell me what circuit it will amke out of that for loop ?? Thats why i want a gate level circuit.. --- Quote End --- How can you expect to work at 266MHz with a purely combinatorial circuit? plus, for loops do show the circuit if you understand how it works. Plus you could use the RTL viewer to look at the circuit. A for loops makes the source code manageable.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I found, that the pure combinational solution (without the registers implemented by josyb) results in a tpd of about 17 ns with Cyclone III. So 266 MHz would surely need pipelining.
I didn't try, if the synthesis tool will "pull-in" registers to the synthesis of the behavioral description. As previously reported in the Forum, it does e.g. in the case of a parallel divider. But most likely, you have to break the data path manually to achieve the speed requirement.- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Wow, a lot of activity while I was away ...
I only added the input and output registers to measure the speed (in TimeQuest you don't get tPD reports unless you set constraints and I was doing this test in QII 10.1SP1) and second to give the compiler a fair chance to get a high speed. Double registering the outputs may even help a bit more. While driving home from the office last night I mused that I could have written it as a function giving a better view:
library ieee;
use ieee.std_logic_1164.all;
use IEEE.numeric_std.all;
entity bitpos is
generic (
WIDTH_D : positive := 16
) ;
port (
Clk : in std_logic ;
data_in : in std_logic_vector(WIDTH_D - 1 downto 0) ;
data_out : out std_logic_vector(WIDTH_D * 4 - 1 downto 0)
);
end bitpos ;
architecture a of bitpos is
function reportbitpos( v : std_logic_vector)
return std_logic_vector
is
variable j : natural ;
variable r : std_logic_vector(v'length * 4 - 1 downto 0) ;
begin
j := 0 ;
r := (others => '0') ;
for i in 0 to v'high loop
if v(i) = '1' then
r(j*4+3 downto j*4) := std_logic_vector(to_unsigned(i,4));
j := j + 1 ;
end if;
end loop;
return r ;
end function ;
signal ld : std_logic_vector(WIDTH_D - 1 downto 0) ;
begin
process(clk)
begin
if rising_edge(Clk ) then
ld <= data_in ;
data_out <= reportbitpos( ld ) ;
end if ;
end process;
end a ;
At the same time I added the generic to find out when we cross the 266 MHz border, at width 8 we get about 200 MHz, at width 4 500MHz+ or maximum speed. A fully pipelined design will need about 1280 registers, and run at maximum speed. Using a Cyclone IV E in stead of a Cyclone II ups the speed form 110 to 121 MHz for a 16 bit input vector (with a tCK setting of 8.0 ns) Compiling it for a width of 8 it gets up to 237 MHz. to FvM: --- Quote Start --- The assignment to the slice lq(j*4+3 downto j*4) isn't accepted at least by my VHDL version, I think a slice must generally needs a constant range. I used a bitwise copy instead. --- Quote End --- I used the VHDL 1993 setting in QII 9.1SP2 that I use for waveform simulation. Actually 9.1SP2 produces a faster circuit then 10.1SP1.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
--- Quote Start --- I used the VHDL 1993 setting in QII 9.1SP2 that I use for waveform simulation. Actually 9.1SP2 produces a faster circuit then 10.1SP1. --- Quote End --- I didn't yet proceed to Quartus 10. I compiled under V9.0 with VHDL 1993 settings and got the below error with your recent code. I was under the assumption, that constant slice ranges are required, no idea which other setting may be different. But the point isn't very important. --- Quote Start --- Error (10394): VHDL error at bitpos.vhd(29): left bound of range must be a constant --- Quote End ---
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It looks like Altera 'progresses' on VHDL support on each subsequent release. E.g. 9.1SP2 supports VHDL 2008 but only a subset, 10.1 supports a bit more, but not yet everything (not that we need everything ...). Maybe this also the case with VHDL-1993 support.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
--- Quote Start --- I didn't yet proceed to Quartus 10. I compiled under V9.0 with VHDL 1993 settings and got the below error with your recent code. I was under the assumption, that constant slice ranges are required, no idea which other setting may be different. But the point isn't very important. --- Quote End --- This is a "bug" thats been around a while, and have mentioned on our internal wiki. Apparently only Quartus has had a problem with it, no concerns with ISE or Modelsim. Maybe they finally fixed it with Q10!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
--- Quote Start --- This is a "bug" thats been around a while, and have mentioned on our internal wiki. --- Quote End --- Thanks for clarifying. According to josyb, it has been already fixed in V9.1SP2. I accepted the limitation when working with previous Quartus versions and didn't yet figure out, if a variable slice range is required by the VHDL standard. Do you know the exact paragraph in IEEE 1076?

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page