mod operation synthesis (Avalon-MM master writing to DRAM problem)

Altera_Forum · 04-09-2016

Hi. I'm trying to write a simple Avalon-MM master component that writes an image to DRAM. I use this component in a Qsys system with a Nios II/e processor, an SDRAM controller and University Program (UP) video cores. I'm running this on a DE2-115 board (Cyclone IV EP4CE115F29C7, 50 MHz clock) connected to a monitor via VGA. The SDRAM, the SDRAM controller and my component are all driven by a 167 MHz clock generated by a PLL. The display part consists of UP video cores: a DMA that reads the 800x600 8-bit grayscale image, and a VGA controller.

My component source is here: https://gist.github.com/woky/a9a02ac03e5ccd23b821262d0c607255 (it's also below, but the gist has line numbers). The component either waits for an address to arrive on the ctl interface or writes an image to the address it received on the ctl interface. The image is just a black top half and a white bottom half. In main() in my Nios program I just allocate memory via malloc() and write its address into the UP video DMA and into my component. Please ignore the debug_* signals; they're only for debugging (displaying state on the 7-segment displays and LEDs).

I originally used the mod operation on pixel_counter (commented out in the code), but the results were inconsistent and wrong. Sometimes it looked like the image wasn't written at all, even though the writing branch was entered (the LED on debug_out(1) was lit). Sometimes main() froze somewhere. Sometimes it wrote just 256, 512 or 4096 pixels (observed only via pixel_counter on the 7-segment displays, not on the screen). It's enough to uncomment line 66 and comment out line 67 (line numbers as in the gist) to unleash the madness.

What could be the reason for this strange and unpredictable behaviour?

Thank you.


library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity frame_writer is
    port (
        clk            : in  std_logic                     := '0';             --   clk.clk
        reset          : in  std_logic                     := '0';             -- reset.reset
        ctl_write      : in  std_logic                     := '0';             --   ctl.write
        ctl_writedata  : in  std_logic_vector(31 downto 0) := (others => '0'); --      .writedata
        wr_address     : out std_logic_vector(31 downto 0);                    --    wr.address
        wr_burstcount  : out std_logic_vector(10 downto 0);                    --      .burstcount
        wr_waitrequest : in  std_logic                     := '0';             --      .waitrequest
        wr_writedata   : out std_logic_vector(31 downto 0);                    --      .writedata
        wr_write       : out std_logic;                                        --      .write
        debug_out      : out std_logic_vector(127 downto 0);                    -- debug.debug_out
        debug_in       : in  std_logic_vector(127 downto 0) := (others => '0')  --      .debug_in
    );
end entity frame_writer;
architecture rtl of frame_writer is
    constant FRAME_SIZE: natural := 800 * 600;
    signal pixel_counter: natural;
    signal start_write: std_logic;
    signal writeaddr: std_logic_vector(31 downto 0);
begin
    wr_burstcount <= "00000000001";
    debug_out(38 downto 20) <= std_logic_vector(to_unsigned(pixel_counter, 19));
    process (clk, reset)
    begin
        if reset = '1' then
            start_write <= '0';
            pixel_counter <= 0;
            debug_out(1 downto 0) <= (others => '0');
        elsif rising_edge(clk) then
            --if start_write = '0' and pixel_counter = 0 then
            if start_write = '0' and (pixel_counter = 0 or pixel_counter >= FRAME_SIZE) then
                wr_write <= '0';
                pixel_counter <= 0;
                wr_address <= (others => '0');
                wr_writedata <= (others => '0');
                if ctl_write = '1' then
                    start_write <= '1';
                    writeaddr <= ctl_writedata;
                end if;
                debug_out(0) <= '0';
            else
                wr_write <= '1';
                wr_address <= std_logic_vector(unsigned(writeaddr) +
                        to_unsigned(pixel_counter, wr_address'length));
                if pixel_counter < FRAME_SIZE/2 then
                    wr_writedata <= x"00000000";
                else
                    wr_writedata <= x"ffffffff";
                end if;
                if wr_waitrequest = '0' then
                    start_write <= '0';
                    --pixel_counter <= (pixel_counter + 4) mod FRAME_SIZE;
                    pixel_counter <= pixel_counter + 4;
                end if;
                debug_out(0) <= '1';
                debug_out(1) <= '1';
            end if;
        end if;
    end process;
end architecture rtl;

Altera_Forum · 04-10-2016

The problem with the mod or rem operators when the divisor is not a power of two (2**N) is that they implement a divider. Dividers have terrible timing performance in a single clock cycle (about 20 MHz if you're lucky). So with a 167 MHz clock it was probably producing essentially random values. Do you have timing constraints for the design? Did you look at the timing report and see the failures?
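Since pixel_counter starts at 0, only ever advances by 4, and 800 * 600 is a multiple of 4, the wrap that the mod was meant to provide can be done with a comparator instead of a divider. The following is a minimal sketch under those assumptions, not the poster's code: the entity wrap_counter, its generics and the advance port (standing in for the wr_waitrequest = '0' condition) are hypothetical names.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical stand-alone counter illustrating a divider-free wrap.
entity wrap_counter is
    generic (
        FRAME_SIZE : natural := 800 * 600;  -- bytes per frame (8-bit pixels)
        STEP       : natural := 4           -- bytes per 32-bit write
    );
    port (
        clk     : in  std_logic;
        reset   : in  std_logic;
        advance : in  std_logic;            -- e.g. write accepted (waitrequest = '0')
        count   : out natural range 0 to FRAME_SIZE - 1
    );
end entity wrap_counter;

architecture rtl of wrap_counter is
    signal counter : natural range 0 to FRAME_SIZE - 1 := 0;
begin
    count <= counter;

    process (clk, reset)
    begin
        if reset = '1' then
            counter <= 0;
        elsif rising_edge(clk) then
            if advance = '1' then
                -- Same result as (counter + STEP) mod FRAME_SIZE while counter
                -- is a multiple of STEP and FRAME_SIZE is a multiple of STEP,
                -- but synthesizes to a comparator and an adder, not a divider.
                if counter >= FRAME_SIZE - STEP then
                    counter <= 0;
                else
                    counter <= counter + STEP;
                end if;
            end if;
        end if;
    end process;
end architecture rtl;
```

In frame_writer itself, the same idea amounts to replacing the commented-out mod line with an if/else that resets pixel_counter to 0 once it reaches FRAME_SIZE - 4.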

Altera_Forum · 04-10-2016

Tricky, thank you. I guess you're right. I haven't learned timing analysis yet. I added the following .sdc file to my project:


create_clock -name clock_50 -period 20 
derive_pll_clocks
derive_clock_uncertainty

And here's the "red" TimeQuest report I get with the mod operation: https://docs.google.com/spreadsheets/d/1pumvhheg8nyqjznpagodadgxgcndvcolb7laxp3bmym/pubhtml

I don't know how to interpret this yet, but I guess that's what you're talking about. Could you please briefly explain what this report says?

Altera_Forum · 04-10-2016

You asked for a 20 ns clock period, but a worst-case slack of -24.5 ns means the data can arrive 24.5 ns late (i.e. more than an entire clock period late). This analysis is for the worst case (the design's timing is affected by temperature), but basically, it's very bad. It means the minimum clock period that guarantees the data arrives before the clock edge is 20 + 24.5 = 44.5 ns, i.e. an FMax of about 22 MHz.
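The same estimate written out, using the numbers quoted above (20 ns requested period, -24.5 ns worst-case slack):

```latex
T_{\min} = T_{\mathrm{clk}} - \mathrm{slack} = 20\,\mathrm{ns} - (-24.5\,\mathrm{ns}) = 44.5\,\mathrm{ns}

f_{\max} = \frac{1}{T_{\min}} = \frac{1}{44.5\,\mathrm{ns}} \approx 22.5\,\mathrm{MHz}
```

For comparison, the 167 MHz PLL clock that drives frame_writer has a period of only 1/167 MHz ≈ 6.0 ns.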

Don't use a mod operation for non-2**N values.
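For contrast, a modulus that is a power of two needs no divider at all: with an N-bit unsigned counter, the wrap is just the adder discarding its carry. A minimal sketch; the entity pow2_wrap_counter and its generics are hypothetical, and WIDTH = 19 is chosen only because 2**19 = 524288 is the smallest power of two above 800 * 600 = 480000.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical counter whose modulus is a power of two (2**WIDTH).
entity pow2_wrap_counter is
    generic (
        WIDTH : positive := 19;   -- wraps at 2**19 = 524288
        STEP  : positive := 4     -- bytes per 32-bit write
    );
    port (
        clk     : in  std_logic;
        reset   : in  std_logic;
        advance : in  std_logic;
        count   : out unsigned(WIDTH - 1 downto 0)
    );
end entity pow2_wrap_counter;

architecture rtl of pow2_wrap_counter is
    signal counter : unsigned(WIDTH - 1 downto 0) := (others => '0');
begin
    count <= counter;

    process (clk, reset)
    begin
        if reset = '1' then
            counter <= (others => '0');
        elsif rising_edge(clk) then
            if advance = '1' then
                -- numeric_std "+" returns a WIDTH-bit result and drops the carry,
                -- so this already equals (counter + STEP) mod 2**WIDTH:
                -- no divider, only the adder that was needed anyway.
                counter <= counter + STEP;
            end if;
        end if;
    end process;
end architecture rtl;
```

This would not match the exact 480000-byte frame the DMA reads, so for a frame size of 800 * 600 an explicit comparison against FRAME_SIZE is still needed.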