
Asynchronous load

Altera_Forum

This is a shift register question that I'm hoping someone can shed some light on. I have four 16-bit shift registers that I'm using to detect a pattern. I feed bits into the shift registers until they match a pattern, at which point I replace the entire contents of the four shift registers with something else in a parallel load. The problem is that the pattern is detected on one cycle and the parallel load occurs on the next clock cycle, but the normal serial load also occurs on that cycle. So the parallel load and the serial load collide, the parallel load seems to win, and the 4 bits that I wanted shifted in serially are lost. Is it possible to do the comparison and the load on the same cycle so that the first set of 4 bits on the serial input doesn't get lost? Is there a special shift register setting for this?

 

There's a setting for asynchronous load in lpm_shiftreg, but it's greyed out.

 

Thanks
Altera_Forum

Naturally, you can't have two drivers on the same registers in the same clock cycle.

I am not clear what you want to achieve, but in general stream processing, if I want to detect a pattern, I use a pipeline (equivalent to a shift register), then detect and take action.

For example, if I want to detect the pattern "F2FC012A" and replace the data with "AB" at that point, I can do this:

 

--in a clked process:
data_1d <= data_in;
data_2d <= data_1d;
data_3d <= data_2d;

if data_pipe = x"F2FC012A" then
    data_out <= x"AB";
else
    data_out <= data_3d; -- or any other stage
end if;

-- outside clked process:
data_pipe <= data_in & data_1d & data_2d & data_3d;
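For readers who want to simulate the fragment above, here is one way it might be wrapped into a complete entity. This is only a sketch: the entity name pattern_replace, the 8-bit port widths, and the initial values are assumptions that do not come from the post.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper around the fragment above (names and widths assumed):
-- a byte-wide stream, with a 32-bit pattern matched across the incoming
-- byte and its three delayed copies.
entity pattern_replace is
  port (
    clk      : in  std_logic;
    data_in  : in  std_logic_vector(7 downto 0);
    data_out : out std_logic_vector(7 downto 0)
  );
end entity pattern_replace;

architecture rtl of pattern_replace is
  signal data_1d, data_2d, data_3d : std_logic_vector(7 downto 0) := (others => '0');
  signal data_pipe                 : std_logic_vector(31 downto 0);
begin
  -- Outside the clocked process: newest byte on the left, oldest on the right.
  data_pipe <= data_in & data_1d & data_2d & data_3d;

  process (clk)
  begin
    if rising_edge(clk) then
      -- Three-stage pipeline, equivalent to a byte-wide shift register.
      data_1d <= data_in;
      data_2d <= data_1d;
      data_3d <= data_2d;

      -- The comparison sees the current input, so detection and
      -- replacement happen on the same clock edge.
      if data_pipe = x"F2FC012A" then
        data_out <= x"AB";
      else
        data_out <= data_3d;  -- or any other stage
      end if;
    end if;
  end process;
end architecture rtl;
```

Note that with this concatenation order the oldest byte sits in the low bits, so the match fires when the bytes arrived in the order 2A, 01, FC, F2. If the stream delivers F2 first, the operands of the concatenation would need to be reversed.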
Altera_Forum

In my opinion, it's not a good idea to use an asynchronous load. Use a synchronous load in the next cycle instead, with the replacement data shifted by one position and concatenated with the new serial input.
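A minimal sketch of that suggestion, assuming a register that shifts toward the MSB with the serial input entering at bit 0. The entity name shift_sync_load, the generic WIDTH, and the port names are all hypothetical and are not the lpm_shiftreg interface.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical shift register with a synchronous load that does not
-- drop the serial bit arriving in the load cycle.
entity shift_sync_load is
  generic (WIDTH : positive := 16);  -- assumed width; must be at least 2
  port (
    clk       : in  std_logic;
    load      : in  std_logic;
    load_data : in  std_logic_vector(WIDTH - 1 downto 0);
    shiftin   : in  std_logic;
    q         : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity shift_sync_load;

architecture rtl of shift_sync_load is
  signal q_reg : std_logic_vector(WIDTH - 1 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if load = '1' then
        -- Load the replacement data already shifted by one position,
        -- with the new serial bit in the vacated position.
        q_reg <= load_data(WIDTH - 2 downto 0) & shiftin;
      else
        -- Normal serial shift.
        q_reg <= q_reg(WIDTH - 2 downto 0) & shiftin;
      end if;
    end if;
  end process;

  q <= q_reg;
end architecture rtl;
```

In this form the top bit of load_data never enters the register, because it is treated as having been shifted out already. If that bit matters, it has to be consumed elsewhere or the register widened by one bit.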

Altera_Forum

I ended up solving the problem by using a single DFF. Once I do the parallel load, I pass the data through the DFF so it doesn't get stomped on, and then through the shift register

Altera_Forum

What's a good way to do a large comparison and a large parallel load? Basically, I have my SRAM connected to a 4-bit-wide DFF, which is connected to four 24-bit shift registers. When I see the pattern "1011" in the DFF, I need to load data into the four 24-bit shift registers. I also need the ability to check whether the data in the 24-bit registers is equal to some 96-bit value.

I notice that after the first if statement is hit (the "1011" pattern is detected and a parallel load is completed), the second if statement gets triggered on the next cycle even though the data in the shift registers is not equal to the comparison value. I'm sure it has to do with how I'm doing my comparison and parallel loading. Is there a better way to do it than this?

 

Here's some lines from my code: 

--reading
if(readEN = '1' and hold1 = "1011" and expect_sfd) then
    expect_sfd <= FALSE;
    expect_efd <= TRUE;
    load <= '1';
    outload <= "111111111111111100000000000000000000000000000000111111111111111100000000100000000000000000000000";
    validout <= '1';
elsif(readEN = '1' and check = "1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111" and expect_efd) then
    expect_efd <= FALSE;
    expect_sfd <= TRUE;
    load <= '1';
    outload <= "000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000";
    validout <= '0';
    stop <= '1';
end if;
end if;
end process;

--memory
mem: framestore port map(clk,reset,send,wren,readEN and not stop,hold1);

bit3hold: shiftreg1 port map(clk,readEN and not stop,reset,hold1(3),smallsfd(3 downto 3),hold2(3));
bit2hold: shiftreg1 port map(clk,readEN and not stop,reset,hold1(2),smallsfd(2 downto 2),hold2(2));
bit1hold: shiftreg1 port map(clk,readEN and not stop,reset,hold1(1),smallsfd(1 downto 1),hold2(1));
bit0hold: shiftreg1 port map(clk,readEN and not stop,reset,hold1(0),smallsfd(0 downto 0),hold2(0));

bit3out: shiftreg24 port map(clk,outload(95 downto 72),load,reset,hold2(3),check(95 downto 72),output(3));
bit2out: shiftreg24 port map(clk,outload(71 downto 48),load,reset,hold2(2),check(71 downto 48),output(2));
bit1out: shiftreg24 port map(clk,outload(47 downto 24),load,reset,hold2(1),check(47 downto 24),output(1));
bit0out: shiftreg24 port map(clk,outload(23 downto 0),load,reset,hold2(0),check(23 downto 0),output(0));
Altera_Forum

I'm not really motivated to guess about the missing parts of code fragments. There are also many possible misunderstandings about what you regarded as incorrect design behaviour. As I already suggested in another thread: Providing a Quartus archive of an example design and a test waveform would help a lot.

Altera_Forum

I can do that. It's a lot of code to look through though.  

 

Here's the link: 

http://tinyurl.com/ycjlqyx

 

The VHDL file in question is fixer.vhd 

The test waveform to use is newtest1.vwf. 

 

If you compile and run the waveform, you can see I'm basically trying to write some Ethernet frames to memory and then read them out. When I read them out, I'm trying to generate a full preamble to prepend to the start of frame delimiter. Also, when I read an "end of frame delimiter" (in this case 12 bytes of all 1's), I replace the "end of frame delimiter" with all 0's. There are two Ethernet frames that I try to write and then read in this waveform. You can see that when they are being read, I correctly generate the first Ethernet frame's preamble, retrieve the data, and replace the end part, but then for some reason the first few pieces of the next frame don't get read, among other problems.
Altera_Forum

Thanks, I'll look into it tomorrow. Can you tell which signal in the simulation output is correct in the first case and wrong in the second? At which time exactly? 

 

My idea was to use the archive project function to generate a small archive of all necessary files and attach it to the post, but the full archive is still manageable in this case.
Altera_Forum

I'm not sure exactly what you mean, but if you run the simulation you can see that when readEN goes high there are two good places to look: the points where loadout goes high, because that's when I'm doing the big parallel loads. The first one (the first time loadout goes high) is OK, but the second one is not. I think the reason it's not OK is that it takes about 3 cycles to get something read out of memory, so if I write my code without taking this into account, I lose some values somewhere. It's just getting so bloated and complicated that it's hard to figure out what to do.

Altera_Forum

I managed to make some progress. The simulation now completes correctly; however, when I change the device from the Stratix II to some of the Cyclone chips, it no longer works. I understand that the hardware is different, but I don't understand how the same code can behave differently.

 

http://tinyurl.com/y9craab
Altera_Forum

I've traced the problem to this code in the process block: 

if(readEN = '1' and first_read = '1') then
    address <= "000000000000000";
    first_read <= '0';
elsif((wren = '1' or readEN = '1') and stop = '0') then
    address <= address - "11";
    just_now <= '0';
elsif((wren = '1' or readEN = '1') and stop = '0') then
    address <= address + '1';
end if;

 

Slightly changed from what's posted in the above URL, but for some reason, the top if block never gets entered even though I can clearly see the signals
Altera_Forum

 

--- Quote Start ---  

but for some reason, the top if block never gets entered even though I can clearly see the signals 

--- Quote End ---  

If you mean the assignment address <= "000000000000000";, it's executed exactly once, but I don't know if this is the intended behaviour.
Altera_Forum

It should be executed exactly once, on the first read, but for some reason, on some of the devices that I simulate it for in Quartus, it does not execute at all!

 

For example, on the Cyclone II EP2C35F672C8, my address counter stops incrementing entirely. 

 

On the Cyclone II EP2C35U484C8, it behaves as intended. 

 

On the Cyclone II EP2C35F484C7, the assignment to all zeros never occurs for that first read.
Altera_Forum

The problem you describe doesn't occur with your previously posted switch20base version. With the above code inserted, it behaves strangely, but the code doesn't seem meaningful: you're repeating the same if condition twice.

Altera_Forum

That's strange... maybe try with this. This is the exact code I have now, and it's behaving exactly as I said. What behavior are you seeing? What condition is being repeated twice?

 

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

entity fixer is
port (
    clk,reset : in std_logic;
    input: in std_logic_vector(3 downto 0); -- the input ethernet frame
    validin: in std_logic; --the valid signal that we get
    --loadout : out std_logic;
    --holdout : out std_logic_vector(3 downto 0);
    addressout : out std_logic_vector(14 downto 0);
    --countout: out integer range 0 to 25;
    wrenout: out std_logic;
    readEN: in std_logic; --high when we want to read from memory
    validout: out std_logic; --high when we're sending a valid frame out
    output : out std_logic_vector(3 downto 0) --output has value if we're reading
);
end fixer;

architecture structure of fixer is

signal address: std_logic_vector(14 downto 0);
signal wren : std_logic;
signal load: std_logic;
signal stop: std_logic;
signal temp: std_logic_vector (11 downto 0);
signal qin : std_logic_vector (3 downto 0);
signal send: std_logic_vector(3 downto 0);
signal hold1: std_logic_vector (3 downto 0);
signal hold2: std_logic_vector (3 downto 0);
signal sfd: std_logic_vector(7 downto 0);
signal expect_sfd: std_logic;
signal expect_efd: std_logic;
signal first_read: std_logic;
signal just_now: std_logic;
signal sfd_counter: integer range 0 to 25;
signal efd_counter: integer range 0 to 25;
signal efd_counter2: integer range 0 to 25;
signal valid_counter: integer range 0 to 9;
signal outload0: std_logic_vector(23 downto 0);
signal outload1: std_logic_vector(23 downto 0);
signal outload2: std_logic_vector(23 downto 0);
signal outload3: std_logic_vector(23 downto 0);
signal check0: std_logic_vector(23 downto 0);
signal check1: std_logic_vector(23 downto 0);
signal check2: std_logic_vector(23 downto 0);
signal check3: std_logic_vector(23 downto 0);

component framestore is
port (
    clk,reset : in std_logic;
    input: in std_logic_vector(3 downto 0); -- the input ethernet frame
    address: in std_logic_vector(14 downto 0);
    valid: in std_logic; --valid signal that comes with input
    readEN: in std_logic; --high when we want to read from memory
    output : out std_logic_vector(3 downto 0) --output, has value if we're reading
);
end component;

component shiftreg is
PORT (
    clock : IN STD_LOGIC ;
    sclr : IN STD_LOGIC ;
    shiftin : IN STD_LOGIC ;
    q : OUT STD_LOGIC_VECTOR (1 DOWNTO 0);
    shiftout : OUT STD_LOGIC
);
end component;

component shiftreg1 IS
PORT (
    clock : IN STD_LOGIC ;
    enable : IN STD_LOGIC ;
    sclr : IN STD_LOGIC ;
    shiftin : IN STD_LOGIC ;
    shiftout : OUT STD_LOGIC
);
END component;

component shiftreg24 is
PORT (
    clock : IN STD_LOGIC ;
    data : IN STD_LOGIC_VECTOR (23 DOWNTO 0);
    load : IN STD_LOGIC ;
    sclr : IN STD_LOGIC ;
    shiftin : IN STD_LOGIC ;
    q : OUT STD_LOGIC_VECTOR (23 DOWNTO 0);
    shiftout : OUT STD_LOGIC
);
end component;

begin

process(clk,reset,validin,readEN,load,address,wren,first_read)
begin
    if(reset = '1') then
        sfd_counter <= 0;
        efd_counter <= 0;
        efd_counter2 <= 0;
        valid_counter <= 0;
        validout <= '0';
        stop <= '0';
        expect_sfd <= '1';
        expect_efd <= '0';
        first_read <= '1';
        just_now <= '0';
        address <= "000000000000000";
        outload0 <= "000000000000000000000000";
        outload1 <= "000000000000000000000000";
        outload2 <= "000000000000000000000000";
        outload3 <= "000000000000000000000000";
    elsif(clk'event and clk = '1') then
        --writing
        if(validin = '1' and expect_sfd = '1' and sfd(7 downto 0) = "11001110") then
            expect_sfd <= '0';
            wren <= '1';
        elsif(validin = '0' and sfd_counter < 25 and (not expect_sfd = '1') and readEN = '0') then
            wren <= '1';
            expect_efd <= '1';
            sfd_counter <= sfd_counter + 1;
        elsif(sfd_counter = 25 and readEN = '0') then
            wren <= '0';
            sfd_counter <= 0;
            expect_sfd <= '1';
            expect_efd <= '0';
        elsif(validin = '1' and expect_efd = '1' and sfd_counter < 25) then
            sfd_counter <= sfd_counter + 1;
        end if;

        if(readEN = '1' and first_read = '1') then
            first_read <= '0';
            address <= "000000000000000";
        elsif((wren = '1' or readEN = '1') and stop = '0' and just_now = '1') then
            address <= address - "11";
            just_now <= '0';
        elsif(wren = '1' or (readEN = '1' and stop = '0' and just_now = '0')) then
            address <= address + '1';
        end if;

        load <= '0';

        --reading
        if(readEN = '1' and efd_counter = 24 and expect_sfd = '1') then
            stop <= '1';
            just_now <= '1';
            efd_counter <= 0;
            validout <= '0';
        elsif(readEN = '1' and hold1 = "1011" and expect_sfd = '1') then
            expect_sfd <= '0';
            expect_efd <= '1';
            load <= '1';
            outload3 <= "111111111111111100000000";
            outload2 <= "000000000000000000000000";
            outload1 <= "111111111111111100000000";
            outload0 <= "100000000000000000000000";
            valid_counter <= valid_counter + 1;
        elsif(readEN = '1' and expect_sfd = '1' and stop = '1' and efd_counter2 < 23) then
            efd_counter2 <= efd_counter2 + 1;
            validout <= '0';
        elsif(readEN = '1' and expect_sfd = '1' and stop = '1' and efd_counter2 = 23) then
            stop <= '0';
            efd_counter2 <= 0;
        elsif(readEN = '1' and hold1 = "1111" and expect_efd = '1' and efd_counter < 23) then
            efd_counter <= efd_counter + 1;
        elsif(readEN = '1' and efd_counter = 23 and expect_efd = '1') then
            expect_efd <= '0';
            expect_sfd <= '1';
            efd_counter <= efd_counter + 1;
            load <= '1';
            outload0 <= "000000000000000000000000";
            outload1 <= "000000000000000000000000";
            outload2 <= "000000000000000000000000";
            outload3 <= "000000000000000000000000";
        elsif(readEN = '1' and not(hold1 = "1111")) then
            efd_counter <= 0;
        end if;

        if(valid_counter < 9 and valid_counter > 0) then
            valid_counter <= valid_counter + 1;
        elsif(valid_counter = 9) then
            validout <= '1';
            valid_counter <= 0;
        end if;
    end if;
end process;

--input to memory
bit3in: shiftreg port map(clk,reset,qin(3),sfd(7 downto 6),send(3));
bit2in: shiftreg port map(clk,reset,qin(2),sfd(5 downto 4),send(2));
bit1in: shiftreg port map(clk,reset,qin(1),sfd(3 downto 2),send(1));
bit0in: shiftreg port map(clk,reset,qin(0),sfd(1 downto 0),send(0));

--memory
mem: framestore port map(clk,reset,send,address,wren,readEN and not stop,hold1);

bit3hold: shiftreg1 port map(clk,readEN and not stop,reset,hold1(3),hold2(3));
bit2hold: shiftreg1 port map(clk,readEN and not stop,reset,hold1(2),hold2(2));
bit1hold: shiftreg1 port map(clk,readEN and not stop,reset,hold1(1),hold2(1));
bit0hold: shiftreg1 port map(clk,readEN and not stop,reset,hold1(0),hold2(0));

bit3out: shiftreg24 port map(clk,outload3,load,reset,hold2(3),check3,output(3));
bit2out: shiftreg24 port map(clk,outload2,load,reset,hold2(2),check2,output(2));
bit1out: shiftreg24 port map(clk,outload1,load,reset,hold2(1),check1,output(1));
bit0out: shiftreg24 port map(clk,outload0,load,reset,hold2(0),check0,output(0));

--input
qin <= "1111" when validin = '0' and sfd_counter < 25 and (not expect_sfd = '1') else input;

--output
--temp
--loadout <= load;
--holdout <= hold1;
--countout <= efd_counter;
addressout <= address;
--wrenout <= wren;

end structure;
Altera_Forum

O.K., I see now that the problem is apparently caused by an insufficient setup time for read_en. This explains why you get different results with different chip families and speed grades. Changing the read_en assertion to the rising edge of clk makes the address counter operate correctly.

read_en is an unrelated signal in your design, so it's not checked in timing analysis. You should either specify timing constraints for it (which most likely requires TimeQuest), register it in your design, or give it a more suitable signal timing. I guess changing all input signals at the rising edge of clk would be better.
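The thread uses a .vwf waveform for stimulus, but the same idea can be sketched as a VHDL testbench. This is only an illustration under assumptions: the 10 ns clock period, the entity name tb_stimulus, and the small stand-in process (used here instead of the real fixer entity) are all hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench: readEN is changed right after a rising clock
-- edge, so it is stable for a full period before the next sampling edge.
entity tb_stimulus is
end entity tb_stimulus;

architecture sim of tb_stimulus is
  constant CLK_PERIOD : time := 10 ns;  -- assumed clock period
  signal clk     : std_logic := '0';
  signal readEN  : std_logic := '0';
  signal sampled : std_logic := '0';
begin
  -- Free-running clock; stop the simulator manually after a few periods.
  clk <= not clk after CLK_PERIOD / 2;

  stimulus : process
  begin
    wait for 3 * CLK_PERIOD;
    wait until rising_edge(clk);
    readEN <= '1';  -- takes effect just after the edge
    wait;           -- hold the value for the rest of the run
  end process stimulus;

  -- Stand-in for the design under test: samples readEN on the rising edge.
  dut_stand_in : process (clk)
  begin
    if rising_edge(clk) then
      sampled <= readEN;
    end if;
  end process dut_stand_in;
end architecture sim;
```

Driving inputs this way leaves a whole clock period of setup time instead of half a period, which is the margin the reply is pointing at.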
Altera_Forum

I'm sorry, but I don't really understand what you mean. All of my code takes place under the if statement involving clk'event and clock = 1. Isn't that a rising edge? What would I be doing differently?

Altera_Forum

One thing to note: the board we use in the lab has the Cyclone II EP2C35F672C6N chip on it. This chip isn't in the Quartus simulator. The closest I can find is the EP2C35F672C6, on which the design does work in the simulator. It also works on the EP2C35F672C7 but not on the EP2C35F672C8. Do you think that means it will definitely work on the board I have in the lab?

Altera_Forum

It's definitely a really really small timing issue. I ran the simulation with the EP2C35F672C8 device and everything was a cycle too late but when I added in some output signals so I could see where the problem was, everything worked fine. How do you troubleshoot an issue that goes away whenever you try and observe it? Heisenberg Uncertainty Principle of VHDL...

Altera_Forum

Here's my most recent version: 

http://tinyurl.com/yakrnzd 

 

There are two output signals, sfdout1 and sfdout2. If you comment them out, and their respective assignments, this code will produce desired simulation results for simulation newtest1.vwf for Cyclone II EP2C35F672C6 and EP2C35F672C7 but will not work for EP2C35F672C8. If you uncomment the sfdout1 and sfdout2 and their respective assignments, the code will produce the desired simulation results for EP2C35F672C8 but not for EP2C35F672C6 and EP2C35F672C7.
Altera_Forum

The timing analysis doesn't show errors with your code, but this only means that the internal timing is correct. The problem, however, is with the external signals, e.g. read_en. You assert read_en at the falling edge of clk. This 5 ns setup time relative to the clk input is apparently not sufficient under all conditions.

Primarily, this is a simulation problem rather than a real synthesis problem. In a real design, read_en is either a clk-related signal, in which case it has a known timing and can be considered in timing analysis, or it's an unrelated signal, in which case it must be registered before entering the synchronous logic.

P.S.: I'll try to explain, by way of example, why address isn't reset to 0 in this case:

 

--- Quote Start ---  

if(readEN = '1' and first_read = '1') then 

address <= "000000000000000"; 

first_read <= '0'; 

--- Quote End ---  

 

The above expression (together with more code, of course) is translated into combinational logic that feeds several DFFs. This logic is composed of several logic elements with routing resources in between, and it involves a certain delay. Unfortunately, the delay varies between the individual logic terms. You have, for example, one DFF for first_read and 15 for address. If read_en is late relative to the rising edge of the clock, it may happen that its assertion arrives at the first_read DFF before the clock edge and at several or all of the address DFFs after the clock edge. If this happens, address won't be reset to zero; in the worst case, it can be reset partially and take arbitrary values.

If you register (or synchronize) read_en, you ensure that it arrives at all DFFs of the process in the same clock cycle, so the observed problem can't occur.
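Registering the input could look roughly like the following two-flip-flop synchronizer. This is a sketch under assumptions, not code from the posted project: the entity name readen_sync and the output name readEN_reg are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-stage synchronizer for an external enable signal.
entity readen_sync is
  port (
    clk        : in  std_logic;
    reset      : in  std_logic;
    readEN     : in  std_logic;  -- external input, possibly unrelated to clk
    readEN_reg : out std_logic   -- registered copy for the synchronous logic
  );
end entity readen_sync;

architecture rtl of readen_sync is
  signal sync_ff : std_logic_vector(1 downto 0);
begin
  process (clk, reset)
  begin
    if reset = '1' then
      sync_ff <= (others => '0');
    elsif rising_edge(clk) then
      -- Only the first flip-flop ever sees the raw input; the second
      -- one hands a clean, clock-aligned value to the rest of the design.
      sync_ff <= sync_ff(0) & readEN;
    end if;
  end process;

  readEN_reg <= sync_ff(1);
end architecture rtl;
```

The process in fixer would then test readEN_reg wherever it currently tests readEN. The registered copy lags the input by two clock cycles, so any counters that assume a fixed read latency would need to be checked against that extra delay.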