I have an async static RAM with a 10 ns access time.
I read data from it like this: the address is formed in one register and the data is latched into another, both on the same 50 MHz (20 ns) clock. nCE is always 0, i.e. the chip is always enabled, and nRD is also always 0. Writes are rare and work fine.
The problem is that the SRAM is read correctly only some of the time, so part of the data is corrupted.
The actual problem is the delay inside the FPGA (Cyclone III) from the data pins to the data register. TimeQuest says it is about 6 ns (!!!), so (access time + address delay from register to port + data delay from port to register) > 20 ns, and the data is not always latched in time. Trace delay on the board is negligible.
I assume I need to write some constraints, but I'm really confused about how to do this, and I can't find any examples for this case. So the actual question: how do I write these **bleep** constraints?
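Editor's note: as a rough sketch of the budget described in this post, the three delays can be added up in plain Tcl, the language SDC files are written in. The 10 ns and 6 ns figures come from the post; the address clock-to-pin value `t_co_addr` is an assumed placeholder that would have to be read from TimeQuest, and register setup time and clock skew are ignored.

```tcl
# Read-path budget for one 20 ns clock period.
set period     20.0  ;# 50 MHz clock (from the post)
set t_aa       10.0  ;# SRAM address access time (from the post)
set t_pin2reg   6.0  ;# data pin to data register, as reported by TimeQuest (from the post)
set t_co_addr   5.0  ;# ASSUMED address register clock-to-pin delay, not from the post

# Slack left after the address goes out, the SRAM responds, and the data
# travels from the pin to the capture register.
set slack [expr {$period - ($t_co_addr + $t_aa + $t_pin2reg)}]
puts [format "read slack: %.1f ns" $slack]
```

With any address delay above 4 ns the slack goes negative, which would be consistent with the intermittent corruption described.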
Here is my setup on a Terasic DE1 board, which uses a Cyclone II FPGA and an attached 10 ns async SRAM device.
There is a 50 MHz (20 ns) clock input that a PLL converts to an 80 MHz (12.5 ns) clock for the logic.
I can run memory tests that read and write this SRAM continuously (for days on end...) with no errors.
FYI, the SRAM is used as the main memory for an FPGA PDP-8 implementation.
# Input 50MHz reference clock
create_clock -period 20.0 -name CLOCK_50 [get_ports {CLOCK_50}]
# Created clocks based on PLLs (CPUCLK = 80MHz)
create_generated_clock -source {pll|altpll_component|pll|inclk[0]} -divide_by 5 -multiply_by 8 -duty_cycle 50 -name CPUCLK {pll|altpll_component|pll|clk[0]}
### external async SRAM timing ###
# address/control outputs
set_output_delay -clock CPUCLK -clock_fall -max 4.0 [get_ports {SRAM_*_L SRAM_A[*]}]
set_output_delay -clock CPUCLK -clock_fall -min 0.5 [get_ports {SRAM_*_L SRAM_A[*]}]
# write data outputs
set_output_delay -clock CPUCLK -max 3.0 [get_ports {SRAM_DQ[*]}]
set_output_delay -clock CPUCLK -min 0.5 [get_ports {SRAM_DQ[*]}]
set_multicycle_path -rise_from CPUCLK -to [get_ports {SRAM_DQ[*]}] -setup 2
set_multicycle_path -rise_from CPUCLK -to [get_ports {SRAM_DQ[*]}] -hold 2
# read data inputs
set_input_delay -clock CPUCLK -max 10.0 [get_ports {SRAM_DQ[*]}]
set_input_delay -clock CPUCLK -min 3.0 [get_ports {SRAM_DQ[*]}]
set_multicycle_path -from [get_ports {SRAM_DQ[*]}] -rise_to CPUCLK -setup 2
set_multicycle_path -from [get_ports {SRAM_DQ[*]}] -rise_to CPUCLK -hold 2
And for reference, here is the Verilog implementation that these constraints refer to...
module mm8e_memory
    #(
    // external parameters
    parameter TPD = 0,                      // simulation delay
    parameter INTBANKS = 2,                 // memory size, 4K banks (internal memory)
    parameter EXTBANKS = 6                  // memory size, 4K banks (external memory)
    )
    (
    // port definitions
    input wire clk,                         // system clock
    input wire reset,                       // system reset
    input wire init,                        // bus init
    input wire mr,                          // memory read
    input wire mw,                          // memory write
    input wire [0:2] ema,                   // extended memory address
    input wire [0:11] ma,                   // memory address
    inout wire [0:11] md,                   // memory data in/out
    output reg [14:0] ext_addr,             // external memory address
    output reg ext_we_l,                    // external memory write enable
    output reg ext_ce_l,                    // external memory select
    output reg ext_oe_l,                    // external memory read enable
    inout wire [11:0] ext_dq                // external memory data in/out
    );

    // internal parameters
    localparam
        MEMSIZE = 4096*INTBANKS;            // internal memory size

    // local signals
    wire [14:0] addr = {ema[0:2],ma[0:11]}; // full memory address
    reg [0:11] memory [0:MEMSIZE-1] /* synthesis ramstyle = "no_rw_check" */;
    reg [0:11] mdo;
    wire [0:11] mdi = md;
    reg mrd;
    wire enb_int = (INTBANKS > 0) && (ema <= INTBANKS-1);
    wire enb_ext = (EXTBANKS > 0) && (ema >= INTBANKS) && (ema <= INTBANKS+EXTBANKS-1);

    // internal memory
    initial
        $readmemb("meminit.txt", memory, 0, MEMSIZE-1);

    always @(posedge clk) mrd <= #TPD mr & enb_int;
    wire mwr = mw & enb_int;

    always @(posedge clk)
        begin
            if (mwr) memory[addr] <= #TPD mdi;
            mdo <= #TPD memory[addr];
        end

    assign md = mr & mrd ? mdo : {12{1'bz}};

    // external memory
    always @(negedge clk)
        begin
            ext_addr <= #TPD addr;
            ext_ce_l <= #TPD 1'b0;
            ext_we_l <= #TPD ~( mw & ~mr) | ~ext_we_l;
            ext_oe_l <= #TPD ~(~mw & mr);
        end

    assign ext_dq = mw & ~mr ? mdi : {12{1'bz}};
    assign md = mr & ~mw & enb_ext ? ext_dq : {12{1'bz}};

endmodule // mm8e_memory
In your example md is assigned twice. Is that OK?
assign md = mr & mrd ? mdo : {12{1'bz}};
...
assign md = mr & ~mw & enb_ext ? ext_dq : {12{1'bz}};
And mdo is actually read from the internal memory, not the external one:
always @(posedge clk)
begin
if (mwr) memory[addr] <= #TPD mdi;
mdo <= #TPD memory[addr];
end
while the data from the external memory (as far as I understand) is not registered; the md lines are just an output of the module:
assign ext_dq = mw & ~mr ? mdi : {12{1'bz}};
assign md = mr & ~mw & enb_ext ? ext_dq : {12{1'bz}};
So it's not my case; my problem starts when I try to register the md lines.
And in these lines
set_input_delay -clock CPUCLK -max 10.0 [get_ports {SRAM_DQ[*]}]
set_input_delay -clock CPUCLK -min 3.0 [get_ports {SRAM_DQ[*]}]
set_multicycle_path -from [get_ports {SRAM_DQ[*]}] -rise_to CPUCLK -setup 2
set_multicycle_path -from [get_ports {SRAM_DQ[*]}] -rise_to CPUCLK -hold 2
what are 10 and 3? OK, let's assume 10 is the 10 ns access time. So what is 3? Hold time? Why 3? Shouldn't it be negative?
Multicycle is 2... why? In your code you set the address at negedge and latch the data at posedge (let's assume you read the external RAM). Doesn't that all count as only 1 cycle?
The same goes for these lines
set_output_delay -clock CPUCLK -clock_fall -max 4.0 [get_ports {SRAM_*_L SRAM_A[*]}]
set_output_delay -clock CPUCLK -clock_fall -min 0.5 [get_ports {SRAM_*_L SRAM_A[*]}]
where do 4 and 0.5 come from?
I appreciate the help, but I don't want to just copy-paste; I want to understand what I'm copy-pasting )
I was not intending to provide THE solution to your problem, only HOW I implemented my solution, to show how to apply SDC constraints. I did not intend it to be a cut and paste solution for you.
In my particular case, there are two memories on the same bus, an internal memory implemented via block rams, and an external memory implemented in the 256KB async SRAM device attached to the FPGA. There was not enough internal block ram available to build the entire memory (32K x 12 bit) using internal block ram, so I split it and have the low 8K internal, the upper 24K external.
Yes, the md lines are assigned twice, as a tri-state bus with mutually exclusive enable signals.
Yes, mdo is a register that only clocks the output data of the internal block ram.
md lines are registered at the next higher level to this module (at the posedge of clk).
The timing setup/hold numbers were based on the data sheet specs of the SRAM device on the board.
Multicycle is 2 since it is not realistic to drive a 10ns access SRAM device using a 12.5ns clock period.
If you believe you can meet timing using a 10ns device on a 20ns clock period, then you only need set_input_delay and set_output_delay. No multicycle_path statement needed.
@ak6dn wrote:
I was not intending to provide THE solution to your problem, only HOW I implemented my solution, to show how to apply SDC constraints. I did not intend it to be a cut and paste solution for you.
Oh, man, I'm sorry. That's my bad English. I didn't mean you should provide a copy-paste solution for me.
I mean I don't understand how your solution works, so I can't use it to produce my solution )
@ak6dn wrote:
md lines are registered at the next higher level to this module (at the posedge of clk).
Multicycle is 2 since it is not realistic to drive a 10ns access SRAM device using a 12.5ns clock period.
If you believe you can meet timing using a 10ns device on a 20ns clock period, then you only need set_input_delay and set_output_delay. No multicycle_path statement needed.
That sounds more reasonable to me. So you keep registering the data at posedge, but wait 1 extra clk period before actually using it?
Well, I could lower the frequency too; I just don't want to.
@ak6dn wrote:
The timing setup/hold numbers were based on the data sheet specs of the SRAM device on the board.
How exactly are they derived? I tried some formulas from Google, but got nonsense.
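Editor's note: the thread never spells the derivation out, so the following is only a minimal SDC sketch of one common way to model an async SRAM read in a single 20 ns cycle, not ak6dn's method. The idea is that read data becomes valid at the FPGA pins one FPGA clock-to-pin delay plus one SRAM access time after the clock edge that launched the address. The clock name `50M` and port `data_pin` appear elsewhere in the thread; the port names `clk50` and `addr_pin`, and every number except the 10 ns access time, are assumptions to be replaced with data sheet and TimeQuest values.

```tcl
# All values in ns. Only t_aa_max comes from the thread.
set t_aa_max  10.0  ;# SRAM address access time
set t_oh_min   3.0  ;# ASSUMED SRAM output hold after address change (check the data sheet)
set t_co_max   5.0  ;# ASSUMED slowest FPGA clock-to-address-pin delay
set t_co_min   2.0  ;# ASSUMED fastest FPGA clock-to-address-pin delay

create_clock -period 20.0 -name 50M [get_ports {clk50}]

# Hold the address outputs to the clock-to-pin window assumed above:
# latest arrival = period - max delay, earliest arrival = -(min delay).
set_output_delay -clock 50M -max [expr {20.0 - $t_co_max}] [get_ports {addr_pin[*]}]
set_output_delay -clock 50M -min [expr {-$t_co_min}]       [get_ports {addr_pin[*]}]

# Read data is valid at the pins t_co + t_aa after the launching edge,
# and starts changing no earlier than t_co + t_oh after it.
set_input_delay -clock 50M -max [expr {$t_co_max + $t_aa_max}] [get_ports {data_pin[*]}]
set_input_delay -clock 50M -min [expr {$t_co_min + $t_oh_min}] [get_ports {data_pin[*]}]
```

With these placeholder numbers the fitter is left roughly 5 ns, minus setup time, to get from the data pin to the capture register. Board trace delay is left out because the original post calls it negligible.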
Hi,
Could you try putting your data_reg at the I/O? This would reduce the delay.
Regards,
Nurina
@Nurina wrote:
Could you try putting your data_reg at the I/O? This would reduce the delay.
I will try, thx.
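Editor's note: placing a register "at the I/O" is usually requested with a fast-register assignment in the project's .qsf file. A minimal sketch follows; `data_pin` is the port name used later in the thread, while `addr_pin` is a hypothetical name for the address port.

```tcl
# .qsf fragment (Quartus settings file), not SDC.
# Ask the fitter to pack the capture register into the I/O cell of each data pin.
set_instance_assignment -name FAST_INPUT_REGISTER ON -to data_pin[*]
# Same idea for the registers that drive the address pins (hypothetical port name).
set_instance_assignment -name FAST_OUTPUT_REGISTER ON -to addr_pin[*]
```

The fitter can generally honor these only when the register connects directly to the pin, with no logic in between; otherwise the assignment tends to be ignored.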
And I forgot to mention: yesterday I added
set_max_delay -from [get_ports {data_pin[*]}] -to [get_registers {data_reg[*]}] 3.0
Without this, all the data was corrupted. With this line, only some of it.
Then I assigned fast_input_register to data_pin (not data_reg; the fitter keeps ignoring that). Nothing changed. How do I check whether it is actually fast now?
Then I changed 3.0 to 2.0, and it works fine.
So as I see it, I've manually fixed the delay between the pin and the register. Chip Planner confirms this: the register is now much closer to the pin.
But this is definitely a bad approach. I think Quartus should do this automatically, given correct constraints.
I've tried to put
set_input_delay -clock [get_clocks {50M}] -max 8.5 [get_ports data_pin*]
set_input_delay -clock [get_clocks {50M}] -min 0 [get_ports data_pin*]
and got a lot of failing setup slacks (50M); almost all the data is corrupted. For example, one of them
Obviously those constraints are wrong, because they do the reverse of what I want.
So, again, the question: how do I tell Quartus to reduce the delay on the data path automatically? What constraints should be there?
UPD: and I'm stuck again.
The data was slightly corrupted after some changes, so I changed 2.0 to 1.0. Everything works fine (again), but in Chip Planner data_reg is veeery far away from data_pin. And TimeQuest reports a data path of 3.9 ns (it was about 2.2 before)...
I don't understand why it is working, what I am doing, or what to do next.
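Editor's note: one way to see what the fitter actually did with the pin-to-register path is to ask TimeQuest for a detailed report on exactly that path. This is a sketch using the port and register names from the post; the panel name is arbitrary.

```tcl
# Run in the TimeQuest Tcl console once the timing netlist has been created.
# Lists the 10 worst setup paths from the data pins to the capture registers,
# with every cell and routing element on the way.
report_timing -setup -npaths 10 -detail full_path \
    -from [get_ports {data_pin[*]}] \
    -to   [get_registers {data_reg[*]}] \
    -panel_name {SRAM read: pin to register}
```

The per-element delays and locations in the report should show whether the capture register sits next to the pin or somewhere in the core fabric, which may help explain why the reported delay moved from about 2.2 ns to 3.9 ns.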
You could try logic lock to make sure data_reg stays close to data_pin.
https://www.youtube.com/watch?v=Bt-yDRReKZw&t=1186s
Hi,
May I know if your problem has been resolved?
Regards,
Nurina
Hello,
We did not receive any response to the previous reply provided, thus I will put this case to close pending. Please post a response in the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support. The community users will be able to help you with your follow-up questions.
Regards,
Nurina
@Nurina
I really appreciate the help.
1. LogicLock is overkill. It should work without such a deep intervention. It's a simple RAM, come on!
2. I found some logic between the data/addr registers and the FPGA ports. I had totally forgotten about it. So I removed this logic (strictly speaking, I moved it behind the registers), and now reading is perfect. I've even removed the set_max_delay lines from the SDC. This is good, because I didn't understand what they did anyway )
3. But then a new problem arrived: in some cases writes fail, and I don't understand why. Maybe it is again logic on the nWR line, idk. I can't remove this logic right now; I will try some workaround.
4. About this topic... I asked about constraints for static RAM. The delays are a secondary problem that should be resolved by a correct SDC. I'm now reading this, and it is much closer to what I need. I hope it will help. If you want to close this topic, that's OK.
PS. One more question: are there any major differences between Cyclone III and IV?
I mean in the I/O buffers or something. I did something similar a couple of years ago on a Cyclone IV and had no problems at all, without any constraints, fast registers, etc.
Well, it seems I've forced it to work.
I've assigned the ADDR pins to fast output registers, and the DATA pins to fast input/output registers.
Then I created a new clock with the PLL. It's the same 50 MHz, but with a 90 degree shift. With this clock I've moved the nWR pulse a little forward.
Idk what actually helped, the fast registers or the shifted clock, but for now I can't reproduce any errors.
One thing I can't figure out: why doesn't SignalTap show me the phase shift?
Here is the screenshot, captured at 200 MHz from the PLL, trigger on the falling edge of nWR.
Addendum: it seems to be the shifted clock. I tried switching nWR back to the 0 degree clock and got errors.
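Editor's note: when a phase-shifted PLL output drives nWR, TimeQuest needs to know about that clock too, or the write path is analyzed against the wrong edge. A minimal SDC sketch, assuming the 50 MHz reference enters on a port named `clk50` (a hypothetical name; the clock name `50M` is the one used earlier in the thread):

```tcl
# Base clock on the input pin.
create_clock -period 20.0 -name 50M [get_ports {clk50}]

# Let TimeQuest create a generated clock for every PLL output, using the
# multiply, divide and phase settings stored in the PLL itself.
derive_pll_clocks

# Apply the device's default clock-to-clock uncertainties.
derive_clock_uncertainty
```

Running `report_clocks` afterwards lists the generated clock names, which can then be used in `set_output_delay` constraints for nWR.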