Intel® Quartus® Prime Software

mod operation synthesis (Avalon-MM master writing to DRAM problem)

Altera_Forum

Hi. I'm trying to write a simple Avalon-MM master component that writes an image to DRAM. I use this component in a Qsys system with a Nios II/e processor, an SDRAM controller, and the University Program (UP) video cores. I'm running this on a DE2-115 board (Cyclone IV EP4CE115F29C7, 50 MHz clock) connected to a monitor via VGA. The SDRAM, its controller, and my component are driven by a 167 MHz clock generated by a PLL. The display part consists of UP video cores: a DMA that reads the 800x600 8-bit grayscale image, and a VGA controller.

 

My component source is here: https://gist.github.com/woky/a9a02ac03e5ccd23b821262d0c607255. (It's also below, but the gist has line numbers.) The component is either waiting for an address to arrive on the ctl interface or writing an image to the address it received there. The image is just a black top half and a white bottom half. In main() in my Nios program I just allocate memory via malloc() and write its address to the UP video DMA and to my component. Please ignore the debug_* signals; they're just for debugging (displaying state on the 7-segment displays and LEDs).

 

I originally used the mod operation on pixel_counter (commented out in the code), but the results were varying and wrong. Sometimes it looked as if the image wasn't written at all, even though the writing branch was entered (LED on debug_out(1)). Sometimes main() froze on something. Sometimes it wrote just 256 or 512 or 4096 pixels (observed only via pixel_counter on the 7-segment displays, not on the screen). It's enough to uncomment line 66 and comment out line 67 in the gist to unleash the madness.

 

What could be the reason for this strange and unpredictable behaviour? 

 

Thank you. 

 

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity frame_writer is
    port (
        clk            : in  std_logic := '0';                                  -- clk.clk
        reset          : in  std_logic := '0';                                  -- reset.reset
        ctl_write      : in  std_logic := '0';                                  -- ctl.write
        ctl_writedata  : in  std_logic_vector(31 downto 0) := (others => '0');  -- .writedata
        wr_address     : out std_logic_vector(31 downto 0);                     -- wr.address
        wr_burstcount  : out std_logic_vector(10 downto 0);                     -- .burstcount
        wr_waitrequest : in  std_logic := '0';                                  -- .waitrequest
        wr_writedata   : out std_logic_vector(31 downto 0);                     -- .writedata
        wr_write       : out std_logic;                                         -- .write
        debug_out      : out std_logic_vector(127 downto 0);                    -- debug.debug_out
        debug_in       : in  std_logic_vector(127 downto 0) := (others => '0')  -- .debug_in
    );
end entity frame_writer;

architecture rtl of frame_writer is
    constant FRAME_SIZE: natural := 800 * 600;
    signal pixel_counter: natural;
    signal start_write: std_logic;
    signal writeaddr: std_logic_vector(31 downto 0);
begin
    wr_burstcount <= "00000000001";
    debug_out(38 downto 20) <= std_logic_vector(to_unsigned(pixel_counter, 19));

    process (clk, reset)
    begin
        if reset = '1' then
            start_write <= '0';
            pixel_counter <= 0;
            debug_out(1 downto 0) <= (others => '0');
        elsif rising_edge(clk) then
            --if start_write = '0' and pixel_counter = 0 then
            if start_write = '0' and (pixel_counter = 0 or pixel_counter >= FRAME_SIZE) then
                wr_write <= '0';
                pixel_counter <= 0;
                wr_address <= (others => '0');
                wr_writedata <= (others => '0');
                if ctl_write = '1' then
                    start_write <= '1';
                    writeaddr <= ctl_writedata;
                end if;
                debug_out(0) <= '0';
            else
                wr_write <= '1';
                wr_address <= std_logic_vector(unsigned(writeaddr) + to_unsigned(pixel_counter, wr_address'length));
                if pixel_counter < FRAME_SIZE/2 then
                    wr_writedata <= x"00000000";
                else
                    wr_writedata <= x"ffffffff";
                end if;
                if wr_waitrequest = '0' then
                    start_write <= '0';
                    --pixel_counter <= (pixel_counter + 4) mod FRAME_SIZE;
                    pixel_counter <= pixel_counter + 4;
                end if;
                debug_out(0) <= '1';
                debug_out(1) <= '1';
            end if;
        end if;
    end process;
end architecture rtl;

Altera_Forum

The problem with the mod and rem operators, when the right-hand operand is not a power of two (2**N), is that they are implemented as a divider. A divider has terrible timing performance when it must complete in a single clock cycle (about 20 MHz if you're lucky). So with your 167 MHz clock, it was probably producing essentially random values. Do you have timing constraints for the design? Did you look at the timing report and see the failures?
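
To illustrate why the power-of-two case is the exception: when the modulus is 2**N, the result is just the low N bits of the operand, so no divider is needed. A minimal sketch, assuming a hypothetical entity `mod_pow2` with a 16-bit input (neither is from the original design):

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical illustration, not part of the poster's component:
-- for a modulus of 2**N, "mod" reduces to keeping the low N bits.
entity mod_pow2 is
    generic (
        N : positive := 4                 -- modulus is 2**N (16 by default)
    );
    port (
        x : in  unsigned(15 downto 0);
        y : out unsigned(N - 1 downto 0)
    );
end entity mod_pow2;

architecture rtl of mod_pow2 is
begin
    -- Same value as x mod 2**N, but it is only wiring: no divider logic.
    y <= x(N - 1 downto 0);
end architecture rtl;
```

FRAME_SIZE here is 800 * 600 = 480000, which is not a power of two, so this shortcut does not apply to the poster's counter.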

Altera_Forum


Tricky, thank you. I guess you're right. I haven't learned the timing analysis part of the design yet. I added the following *.sdc file to my project: 

create_clock -name clock_50 -period 20
derive_pll_clocks
derive_clock_uncertainty

 

And here's the "red" TimeQuest report I get with the mod operation: https://docs.google.com/spreadsheets/d/1pumvhheg8nyqjznpagodadgxgcndvcolb7laxp3bmym/pubhtml 

 

I don't know how to interpret this yet, but I guess that's what you're talking about. Would you please briefly explain what this report says?

Altera_Forum

You asked for a 20 ns clock period, but a worst-case slack of -24.5 ns means the data can arrive 24.5 ns late (i.e. more than an entire clock period). This analysis is the worst case (actual delays vary with temperature), but basically, it's very bad. It means that the shortest clock period that guarantees data arrival before the clock edge is 20 ns + 24.5 ns = 44.5 ns, i.e. an Fmax of about 22 MHz.
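
Written out as a worked calculation, using only the two figures quoted in this reply (the 20 ns requested period and the -24.5 ns worst-case slack), and assuming both refer to the same clock domain:

```latex
\begin{aligned}
T_{\min} &= T_{\text{clk}} - \text{slack}
          = 20\,\text{ns} - (-24.5\,\text{ns})
          = 44.5\,\text{ns} \\
F_{\max} &= \frac{1}{T_{\min}}
          = \frac{1}{44.5\,\text{ns}}
          \approx 22.5\,\text{MHz}
\end{aligned}
```

A negative slack therefore adds directly to the period the failing path needs, which is why the usable frequency falls so far below the requested one.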

 

Don't use a mod operation with a divisor that is not a power of two (2**N).
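
One common way to get the same wrap-around without a divider is to compare against the limit and reset the counter. The following is a minimal sketch under stated assumptions, not the original poster's fix: the entity name `wrap_counter`, the `enable` port, and the `STEP` generic are hypothetical, and it assumes STEP divides FRAME_SIZE evenly and the counter starts at 0.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical stand-alone counter (names are assumptions, not from the
-- original design). It advances by STEP and wraps to 0 at FRAME_SIZE
-- using a comparison instead of the "mod" operator.
entity wrap_counter is
    generic (
        FRAME_SIZE : natural := 800 * 600;
        STEP       : natural := 4        -- assumed to divide FRAME_SIZE evenly
    );
    port (
        clk    : in  std_logic;
        reset  : in  std_logic;
        enable : in  std_logic;          -- e.g. a write was accepted
        count  : out natural range 0 to FRAME_SIZE - 1
    );
end entity wrap_counter;

architecture rtl of wrap_counter is
    signal cnt : natural range 0 to FRAME_SIZE - 1 := 0;
begin
    count <= cnt;

    process (clk, reset)
    begin
        if reset = '1' then
            cnt <= 0;
        elsif rising_edge(clk) then
            if enable = '1' then
                if cnt >= FRAME_SIZE - STEP then
                    cnt <= 0;            -- wrap: one comparator, no divider
                else
                    cnt <= cnt + STEP;
                end if;
            end if;
        end if;
    end process;
end architecture rtl;
```

Under those assumptions this produces the same sequence as (cnt + STEP) mod FRAME_SIZE, but the logic is an adder and a comparator against a constant, which is far easier to fit into a single clock cycle than a divider. Constraining the signal range also lets synthesis size the register (19 bits here) instead of defaulting to a full 32-bit natural.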