Re: SDC constraints for async static RAM

Vic3Dexe · 05-23-2022

I have an async static RAM with a 10 ns access time.

I read from it like this: the address is formed in one register, then the data is latched into another register, and both registers use the same 50 MHz (20 ns) clock. nCE is tied to 0 (the chip is always enabled) and nRD is also always 0. Writes are rare and they work fine.

The problem is that reads from the SRAM fail from time to time, so part of the data is corrupted.

The actual problem is the delay inside the FPGA (Cyclone III) from the data pins to the data register. TimeQuest reports about 6 ns (!!!), so (access time + address delay from register to port + data delay from port to register) > 20 ns, and the data is not always latched in time. The trace delay on the board is negligible.
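Written out, the single-cycle read budget described above is roughly the inequality below. The symbols are introduced here only for illustration:
- $t_{co}$ is the delay from the address register's clock edge to the address pin.
- $t_{AA}$ is the SRAM address access time.
- $t_{in}$ is the data pin-to-register delay.
- $t_{su}$ is the data register's setup time.

Board trace delays are taken as zero (as stated) and clock skew is ignored.

```latex
T_{clk} \ge t_{co} + t_{AA} + t_{in} + t_{su}
\quad\Rightarrow\quad
20\,\text{ns} \ge t_{co} + 10\,\text{ns} + 6\,\text{ns} + t_{su}
\quad\Rightarrow\quad
t_{co} + t_{su} \le 4\,\text{ns}
```

So with the reported 6 ns input path, only about 4 ns remain for the address output path and the data setup time together.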

I assume I need to write some constraints, but I'm really confused about how to do this, and I can't find any examples for this case. So the actual question is: how do I write these **bleep** constraints?

ak6dn · 05-23-2022

Here is my setup on a Terasic DE1 board, which uses a Cyclone II FPGA and an attached 10 ns async SRAM device.
There is a 50 MHz (20 ns) clock input that is converted through a PLL to an 80 MHz (12.5 ns) clock for the logic.
I can run memory tests and read and write this SRAM continuously (for days on end...) with no errors occurring.

FYI, the SRAM is used as the main memory for an FPGA PDP-8 implementation.


# Input 50MHz reference clock

create_clock -period 20.0 -name CLOCK_50 [get_ports {CLOCK_50}]

# Created clocks based on PLLs (CPUCLK = 80MHz)

create_generated_clock -source {pll|altpll_component|pll|inclk[0]} -divide_by 5 -multiply_by 8 -duty_cycle 50 -name CPUCLK {pll|altpll_component|pll|clk[0]}

### external async SRAM timing ###

# address/control outputs

set_output_delay -clock CPUCLK -clock_fall -max 4.0 [get_ports {SRAM_*_L SRAM_A[*]}]
set_output_delay -clock CPUCLK -clock_fall -min 0.5 [get_ports {SRAM_*_L SRAM_A[*]}]

# write data outputs

set_output_delay -clock CPUCLK -max 3.0 [get_ports {SRAM_DQ[*]}]
set_output_delay -clock CPUCLK -min 0.5 [get_ports {SRAM_DQ[*]}]
set_multicycle_path -rise_from CPUCLK -to [get_ports {SRAM_DQ[*]}] -setup 2
set_multicycle_path -rise_from CPUCLK -to [get_ports {SRAM_DQ[*]}] -hold 2

# read data inputs

set_input_delay -clock CPUCLK -max 10.0 [get_ports {SRAM_DQ[*]}]
set_input_delay -clock CPUCLK -min  3.0 [get_ports {SRAM_DQ[*]}]
set_multicycle_path -from [get_ports {SRAM_DQ[*]}] -rise_to CPUCLK -setup 2
set_multicycle_path -from [get_ports {SRAM_DQ[*]}] -rise_to CPUCLK -hold 2

And for reference, here is the Verilog implementation these constraints refer to:

module mm8e_memory
    #(
      // external parameters

      parameter		TPD = 0,		// simulation delay
      parameter		INTBANKS = 2,		// memory size, 4K banks (internal memory)
      parameter		EXTBANKS = 6		// memory size, 4K banks (external memory)

      )
    (
     // port definitions

     input wire		clk,			// system clock
     input wire 	reset,			// system reset

     input wire 	init,			// bus init

     input wire 	mr,			// memory read
     input wire 	mw,			// memory write
     input wire [0:2] 	ema,			// extended memory address
     input wire [0:11] 	ma,			// memory address

     inout wire [0:11] 	md,			// memory data in/out

     output reg [14:0] 	ext_addr,		// external memory address
     output reg 	ext_we_l,		// external memory write enable
     output reg 	ext_ce_l,		// external memory select
     output reg 	ext_oe_l,		// external memory read enable

     inout wire [11:0] 	ext_dq			// external memory data in/out

     );

    // internal parameters

    localparam
	MEMSIZE = 4096*INTBANKS;		// internal memory size

    // local signals

    wire [14:0] 	addr = {ema[0:2],ma[0:11]}; // full memory address

    reg [0:11] 		memory [0:MEMSIZE-1] /* synthesis ramstyle = "no_rw_check" */;
    reg [0:11] 		mdo;
    wire [0:11] 	mdi = md;
    reg 		mrd;

    wire 		enb_int = (INTBANKS > 0) && (ema <= INTBANKS-1);
    wire 		enb_ext = (EXTBANKS > 0) && (ema >= INTBANKS) && (ema <= INTBANKS+EXTBANKS-1);

    // internal memory

    initial
	$readmemb("meminit.txt", memory, 0, MEMSIZE-1);

    always @(posedge clk) mrd <= #TPD mr & enb_int;

    wire 		mwr = mw & enb_int;

    always @(posedge clk)
	begin
        if (mwr) memory[addr] <= #TPD mdi;
        mdo <= #TPD memory[addr];
	end

    assign 		md = mr & mrd ? mdo : {12{1'bz}};

    // external memory

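    always @(negedge clk)
	begin
        ext_addr <= #TPD addr;                  // address changes on the falling edge
        ext_ce_l <= #TPD 1'b0;                  // chip always selected
        // WE# (active low) is asserted during a write, but only for one clock
        // period at a time: the "| ~ext_we_l" term forces it high on the next falling edge
        ext_we_l <= #TPD ~( mw & ~mr) | ~ext_we_l;
        ext_oe_l <= #TPD ~(~mw &  mr);          // OE# low only while reading
	end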
    
    assign 		ext_dq = mw & ~mr ? mdi : {12{1'bz}};
    assign 		md = mr & ~mw & enb_ext ? ext_dq : {12{1'bz}};

endmodule // mm8e_memory

Vic3Dexe · 05-23-2022

In your example you have md assigned twice. Is that OK?

    assign 		md = mr & mrd ? mdo : {12{1'bz}};
...
    assign 		md = mr & ~mw & enb_ext ? ext_dq : {12{1'bz}};

And mdo is actually read from the internal memory, not the external one:

   always @(posedge clk)
	begin
        if (mwr) memory[addr] <= #TPD mdi;
        mdo <= #TPD memory[addr];
	end

while the data from the external memory (as far as I understand) is not registered; the md lines are just outputs of the module:

   assign 		ext_dq = mw & ~mr ? mdi : {12{1'bz}};
    assign 		md = mr & ~mw & enb_ext ? ext_dq : {12{1'bz}};

So that is not my case; my problem starts when I try to register the md lines.

And in these lines:

set_input_delay -clock CPUCLK -max 10.0 [get_ports {SRAM_DQ[*]}]
set_input_delay -clock CPUCLK -min  3.0 [get_ports {SRAM_DQ[*]}]
set_multicycle_path -from [get_ports {SRAM_DQ[*]}] -rise_to CPUCLK -setup 2
set_multicycle_path -from [get_ports {SRAM_DQ[*]}] -rise_to CPUCLK -hold 2

What are 10 and 3? OK, let's assume 10 is the 10 ns access time. So what is 3? A hold time? Why 3? Shouldn't it be negative?

Why is the multicycle 2? In your code you set the address on the negedge and latch the data on the posedge (let's assume you read the external RAM). Doesn't that all count as only one cycle?

Same question for these lines:

set_output_delay -clock CPUCLK -clock_fall -max 4.0 [get_ports {SRAM_*_L SRAM_A[*]}]
set_output_delay -clock CPUCLK -clock_fall -min 0.5 [get_ports {SRAM_*_L SRAM_A[*]}]

Where do 4 and 0.5 come from?

I appreciate the help, but I don't want to just copy and paste; I want to understand what I'm copying )

ak6dn · 05-23-2022

I was not intending to provide THE solution to your problem, only HOW I implemented my solution, to show how to apply SDC constraints. I did not intend it to be a cut and paste solution for you.

In my particular case, there are two memories on the same bus: an internal memory implemented with block RAMs, and an external memory implemented in the 256KB async SRAM device attached to the FPGA. There was not enough internal block RAM to build the entire memory (32K x 12 bit), so I split it: the low 8K is internal and the upper 24K is external.

Yes, the md lines are assigned twice, as a tri-state bus with mutually exclusive enable signals.

Yes, mdo is a register that only clocks the output data of the internal block ram.

md lines are registered at the next higher level to this module (at the posedge of clk).

The timing setup/hold numbers were based on the data sheet specs of the SRAM device on the board.
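For reference, below are the generic relations commonly used to turn data-sheet numbers into set_input_delay / set_output_delay values for a clocked external device. This is not necessarily how the numbers above were derived. $t_{CO}$, $t_{SU}$ and $t_{H}$ are the external device's clock-to-output, setup and hold times, and $t_{bd}$ is the board trace delay.

```latex
\begin{aligned}
\texttt{set\_input\_delay -max}  &= t_{CO,\max} + t_{bd,\max} \\
\texttt{set\_input\_delay -min}  &= t_{CO,\min} + t_{bd,\min} \\
\texttt{set\_output\_delay -max} &= t_{SU} + t_{bd,\max} \\
\texttt{set\_output\_delay -min} &= t_{bd,\min} - t_{H}
\end{aligned}
```

An async SRAM has no clock, so the "clock-to-output" role is played by the time from an address change to the data output, counted from whatever clock edge launched the address:
- The access time ($t_{AA}$) bounds the latest data arrival.
- The output hold time after an address change (often named $t_{OHA}$ in data sheets) bounds the earliest data change.

That is why an input -min value is normally positive. An output -min value, by contrast, is often negative because it carries the external hold requirement with a minus sign.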

Multicycle is 2 since it is not realistic to drive a 10ns access SRAM device using a 12.5ns clock period.

If you believe you can meet timing using a 10ns device on a 20ns clock period, then you only need set_input_delay and set_output_delay. No multicycle_path statement needed.
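Below is a minimal single-cycle sketch for a 20 ns clock. It assumes a budget-split approach:
1. Pick how much of the period the FPGA may spend driving the address (A, here an assumed 5 ns).
2. Constrain the address outputs so TimeQuest enforces that allowance.
3. Reference the data inputs to the same edge, so the pin-to-register path gets whatever time remains.

The clock name, port names and the 5 ns split are hypothetical. The 10 ns access time and zero board delay come from this thread. The -min values are placeholders to check against the actual data sheet.

```tcl
# Hypothetical names: 50 MHz clock on port clk_50, SRAM ports sram_addr[*] and sram_dq[*]
create_clock -name CLK50 -period 20.0 [get_ports {clk_50}]
derive_clock_uncertainty

# Budget (ns): T = 20, t_AA = 10 (data sheet), board delay assumed 0
# A = 5 is an assumed allowance for address register -> address pin.
# Address outputs: an "external setup" of 20 - 5 = 15 ns forces the
# address to reach the pins within 5 ns of the launching edge.
set_output_delay -clock CLK50 -max 15.0 [get_ports {sram_addr[*]}]
# An async SRAM has no clocked hold requirement; 0 is a placeholder
set_output_delay -clock CLK50 -min 0.0 [get_ports {sram_addr[*]}]

# Data inputs, referenced to the same edge: latest valid = A + t_AA = 15 ns,
# leaving 20 - 15 = 5 ns for pin -> register delay plus setup
set_input_delay -clock CLK50 -max 15.0 [get_ports {sram_dq[*]}]
# Earliest change: assumed SRAM output hold after address change (t_OHA), 3 ns here
set_input_delay -clock CLK50 -min 3.0 [get_ports {sram_dq[*]}]
```

If either side still fails setup, the split A can be moved. The two -max values must change together (output = T - A, input = A + t_AA), since they describe the same 20 ns round trip.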

Vic3Dexe · 05-23-2022

@ak6dn wrote:

I was not intending to provide THE solution to your problem, only HOW I implemented my solution, to show how to apply SDC constraints. I did not intend it to be a cut and paste solution for you.

Oh, man, I'm sorry, that's my bad English. I didn't mean that you should provide a copy-paste solution for me.

I meant that I don't understand how your solution works, so I can't use it to build my own )

@ak6dn wrote:

md lines are registered at the next higher level to this module (at the posedge of clk).

Multicycle is 2 since it is not realistic to drive a 10ns access SRAM device using a 12.5ns clock period.

If you believe you can meet timing using a 10ns device on a 20ns clock period, then you only need set_input_delay and set_output_delay. No multicycle_path statement needed.

That sounds more reasonable to me. So you keep registering the data on the posedge, but wait one extra clk period before actually using it?
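Roughly, yes. A setup multicycle of 2 tells TimeQuest that the capture register only needs valid data on the second rising edge after launch. The design itself must then ignore whatever is captured on the first edge.

Below is a sketch for the input side with hypothetical clock and port names. In Quartus the common pairing is -setup N with -hold N-1, which keeps the hold check at the original edge instead of letting it follow the setup edge. The constraints quoted above use -hold 2; the more usual value for -setup 2 is -hold 1.

```tcl
# Hypothetical names: clock CLK (e.g. an 80 MHz PLL output), SRAM data port sram_dq[*]
# Setup: data only has to be valid at the 2nd rising edge after launch
set_multicycle_path -from [get_ports {sram_dq[*]}] -to [get_clocks {CLK}] -setup -end 2
# Hold: with -setup 2 the default hold check sits one period after launch;
# -hold 1 (= N-1) moves it back to the launch edge
set_multicycle_path -from [get_ports {sram_dq[*]}] -to [get_clocks {CLK}] -hold -end 1
```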

Well, I can lower the frequency too, I just don't want to do this.

@ak6dn wrote:

The timing setup/hold numbers were based on the data sheet specs of the SRAM device on the board.

How exactly are they based on it? I tried some formulas I found on Google, but got nonsense.

Nurina · 05-23-2022

Hi,

Could you try putting your data_reg at the I/O? This would reduce the delay.

Regards,

Nurina

Vic3Dexe · 05-23-2022

@Nurina wrote:

Could you try putting your data_reg at the I/O? This would reduce the delay.

I will try, thx.

Vic3Dexe · 05-24-2022

And I forgot to mention:

Yesterday I added

set_max_delay -from [get_ports {data_pin[*]}] -to [get_registers {data_reg[*]}] 3.0

Without this, all the data was corrupted. With this line, only a little of it is.

Then I assigned fast_input_register to data_pin (not to data_reg; the Fitter kept ignoring that). Nothing changed. How can I check whether it is actually fast now?

Then I changed 3.0 to 2.0, and it works fine.

So, as I see it, I've manually fixed the delay between the pin and the register. Chip Planner confirms this: the register is now much closer to the pin.

But this is definitely a bad approach. I think Quartus should do this automatically given the correct constraints.

I tried adding

set_input_delay -clock [get_clocks {50M}] -max 8.5 [get_ports data_pin*]
set_input_delay -clock [get_clocks {50M}] -min 0 [get_ports data_pin*]

and got a lot of negative slack (setup, 50M); almost all the data is corrupted. For example, one of them:

Obviously, those constraints are wrong, because they do the opposite of what I want.

So, again, the question: how do I tell Quartus to automatically reduce the delay on the data path? What constraints should be there?
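For reference, set_input_delay is measured from the clock edge at the FPGA. For this interface, the -max value therefore has to cover everything outside the FPGA's input path: the address launch delay plus the SRAM access time.

Ignoring clock skew and uncertainty, the relations are given below. Here $d_{in,\max}$ is the -max value, $t_{in}$ the pin-to-register delay, $t_{su}$ the setup time and $t_{co}$ the address register-to-pin delay (all symbols introduced here). The first line is the setup check TimeQuest performs; the second is the lower bound on a realistic -max value.

```latex
\begin{aligned}
d_{in,\max} + t_{in} + t_{su} &\le T_{clk} = 20\,\text{ns} \\
d_{in,\max} &\ge t_{co} + t_{AA} \ge t_{AA} = 10\,\text{ns}
\end{aligned}
```

So a -max of 8.5 ns is smaller than the 10 ns access time alone: it describes the data as arriving earlier than the SRAM can deliver it.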

UPD: and I'm stuck again.

After some changes the data was slightly corrupted, so I changed 2.0 to 1.0. Everything works fine (again), but in Chip Planner data_reg is veeery far away from data_pin, and TimeQuest reports a data path of 3.9 ns (it was about 2.2 ns before)...

I don't understand why it works, what I'm doing, or what to do next.

Nurina · 05-31-2022

You could try LogicLock to make sure data_reg stays close to data_pin.

https://www.youtube.com/watch?v=Bt-yDRReKZw&t=1186s

Nurina · 06-06-2022

Hi,

May I know if your problem has been resolved?

Regards,

Nurina

Nurina · 06-09-2022

Hello,

We did not receive any response to the previous reply provided, thus I will put this case to close pending. Please post a response in the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support. The community users will be able to help you with your follow-up questions.

Regards,

Nurina


Vic3Dexe · 06-12-2022

@Nurina

I really appreciate the help.

1. LogicLock is overkill. It should work without such deep intervention. It's a simple RAM, come on!

2. I found some logic between the data/addr registers and the FPGA ports; I had totally forgotten about it. So I removed this logic (strictly speaking, I moved it behind the registers), and now reading is perfect. I even removed the set_max_delay lines from the SDC. This is good, because I didn't understand what they did anyway )

3. But then a new problem appeared: in some cases writes fail, and I don't understand why. Maybe it is again the logic on the nWR line, idk. I can't remove this logic right now; I'll try some workaround.

4. About this topic... I asked about constraints for static RAM; the delays are a secondary problem that should be resolved by a correct SDC. I'm now reading this, which is much closer to what I need. I hope it will help. If you want to close this topic, that's OK.

PS: One more question: are there any major differences between Cyclone III and Cyclone IV?

I mean in the I/O buffers or something, because I did something similar a couple of years ago on a Cyclone IV and had no problems at all without any constraints, fast registers, etc.

Vic3Dexe · 06-13-2022

Well, it seems to me I've forced it to work.

I assigned fast output registers to the ADDR pins, and fast input/output registers to the DATA pins.
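In Quartus these are per-pin assignments in the project's .qsf file (or the Assignment Editor). Below is a sketch assuming hypothetical top-level port names sram_addr and sram_dq. The assignment names are Quartus ones, but the exact wildcard syntax accepted in the -to target may need adjusting for a given project.

```tcl
# Hypothetical port names; place in the project's .qsf file
# Address outputs: use the output register in the I/O element
set_instance_assignment -name FAST_OUTPUT_REGISTER ON -to sram_addr[*]
# Bidirectional data: input, output and output-enable registers in the I/O element
set_instance_assignment -name FAST_INPUT_REGISTER ON -to sram_dq[*]
set_instance_assignment -name FAST_OUTPUT_REGISTER ON -to sram_dq[*]
set_instance_assignment -name FAST_OUTPUT_ENABLE_REGISTER ON -to sram_dq[*]
```

These registers can only be packed into the I/O element when the register connects to the pin directly, with no logic in between. That fits with reads becoming reliable only after the logic between the registers and the ports was moved.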

Then I created a new clock with the PLL: the same 50 MHz, but with a 90-degree phase shift. With this clock I moved the nWR pulse a little forward.
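On the timing side, TimeQuest also needs to know about the 90-degree clock. Otherwise, paths launched by it may be analyzed incorrectly or reported as unconstrained. One low-effort way in Quartus SDC is to let the tool derive all PLL output clocks, including phase-shifted ones, from the PLL settings. The port name below is hypothetical. At 50 MHz, 90 degrees corresponds to a 5 ns shift.

```tcl
# Hypothetical input clock port clk_50 (50 MHz, 20 ns period)
create_clock -name CLK50 -period 20.0 [get_ports {clk_50}]
# Create generated clocks for every PLL output, including the
# 90-degree (5 ns) shifted 50 MHz output, from the PLL's own settings
derive_pll_clocks
# Apply Quartus-calculated clock uncertainty (e.g. PLL jitter) to the analysis
derive_clock_uncertainty
```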

Idk what actually helped, the fast registers or the shifted clock, but for now I can't reproduce any errors.

One thing I can't figure out: why doesn't SignalTap show me the phase shift?

Here is the screenshot, captured at 200 MHz from the PLL, triggered on the falling edge of nWR.
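For scale, the expected shift equals one sample period of the 200 MHz SignalTap clock. In such a capture, the shift would show up as roughly a one-sample offset at best:

```latex
\Delta t = \frac{90^\circ}{360^\circ}\cdot\frac{1}{50\,\text{MHz}} = \frac{20\,\text{ns}}{4} = 5\,\text{ns} = \frac{1}{200\,\text{MHz}}
```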

Addendum: it seems to be the shifted clock. I tried switching nWR back to the 0-degree clock and got errors.