Nios® V/II Embedded Design Suite (EDS)

Synchronous SRAM - Instruction and Data memory

Altera_Forum
Honored Contributor II
2,071 Views

Hello, 

 

Has anyone used synchronous SRAM as the Nios instruction and data store (i.e., executing out of synchronous SRAM)? I'm sure it can be done, but I'm wondering about the performance hit for an opening access and for non-sequential execution of instructions. The cache could also affect the actual performance. 

 

The Avalon bus supports a feedback pin to indicate the timing of a transfer, which is meant for transfers that are not always the same length. For example, the synchronous SRAM might take three cycles for an opening access and then one cycle thereafter (as long as the addresses are sequential). 

 

I am thinking that you will lose cycles in the "glue" that stitches the synchronous SRAM to the Avalon bus, making the performance worse than that of the asynchronous SRAM used on the Altera demo boards.  

 

Has anyone looked in detail at this? 

 

Thanks for any info. 

 

Eric Tauch
11 Replies
Altera_Forum
Honored Contributor II
727 Views

I haven't done it yet, but I've got a synchronous SRAM spec'd into the next spin of my board. After a few days of absorbing details, I decided on a synchronous SRAM with No Bus Latency (NoBL, also called Zero Bus Turnaround or ZBT) and flow-through output (as opposed to pipelined output). 

 

The ZBT is important if you're going to be switching between reading and writing often. (like moving memory within the same module) 

 

Flow-through is important because it presents data on every clock, as opposed to pipelined output, which takes two clocks. In reality it's almost a wash, because you can clock the pipelined chips almost twice as fast as the flow-through ones.  

 

You can of course make your controller latency-aware and even take advantage of the SSRAM's burst modes, but in my case I'm going to be using the RAM for table lookups. This means the addresses will be presented in random order, so burst and, I think, pipelining won't do me any good. (Plus, I'd rather have a 100-133 MHz bus than a 250 MHz bus for the same performance.) 

 

FYI, I chose the Cypress CY7C1355B (256K x 36). It can clock up to 133 MHz and is available from several sources for $15. 

 

I'm thinking I'll be able to set up fairly aggressive timing using the IUL. 

 

Ken
Altera_Forum
Honored Contributor II
727 Views

Thanks for your comments. 

 

So why do you want to use the synch ram for an implementation that is non-sequential? 

 

I guess cost could be a factor. 

 

An asynch SRAM at 10 ns could give you a 66 MHz no-wait-state bus (rough numbers), whereas a 133 MHz synch SRAM running non-sequentially would take 3 clocks per access, for about 44 MHz overall throughput. 
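
The rough numbers in this comparison can be checked with a few lines of arithmetic. This is only a sketch: the function name is made up for illustration, and the clock rates and the 3-clocks-per-access figure are the poster's estimates, not datasheet values.

```python
def effective_rate_mhz(clock_mhz: float, clocks_per_access: int) -> float:
    """Millions of accesses per second for a bus that needs a fixed
    number of clocks for every access (no bursting, no caching)."""
    return clock_mhz / clocks_per_access


# Figures taken from the post above (rough estimates only).
async_rate = effective_rate_mhz(66.0, 1)   # async SRAM, zero wait states
sync_rate = effective_rate_mhz(133.0, 3)   # sync SRAM, 3 clocks per random access

print(f"async SRAM: {async_rate:.1f} M accesses/s")
print(f"sync SRAM : {sync_rate:.1f} M accesses/s")
```

Under these assumptions the asynchronous part wins for purely random access; the picture changes if the synchronous part needs fewer than 3 clocks per access.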

 

Just wondering what your reasons for the synch ram are. 

 

I was considering the latency-aware implementation for instruction execution. I am thinking that the IUL glue will still eat up a clock (assuming a state-machine control circuit). Perhaps some sort of simple logic circuit would avoid losing clocks. I will have to draw up some bus cycles to see. 

 

Thanks, 

Eric
Altera_Forum
Honored Contributor II
727 Views

Eric, 

 

I'm glad you brought this up. There is still time to change if I'm reading things wrong. 

 

The timing diagrams show that the contents at the address presented on one rising edge are available by the next rising edge. (6.5ns later) 

 

From the perspective of the Nios running off the same 100-133MHz clock how could I do better? 

 

The thing I'm not sure of, and this may be the source of one of your 3 clocks, is whether the Nios/Avalon can present the address in 0 clocks, or whether you essentially spend one clock getting the address set up.  

 

That is, if I have 0 setup time in the IUL and an LDW is executed, how many clocks pass until RD# and ADDR are both valid on a rising clock edge? 

If it's the *next* one for synch vs. *this* one for asynch, then that's one clock in asynch's favor.  

 

With Flowthrough I don't see how asynch has an advantage once the address is read in.  

 

Can you explain the source of the three clocks? 

 

Sorry if this is too basic; I'm still learning the basics here. :)  

 

Thanks, 

Ken
Altera_Forum
Honored Contributor II
727 Views

Hello, 

 

I think Avalon will present the address/control in one clock. The thing to look at is the timing diagram for the synch RAM for an opening access (i.e., a new address). The last parts I used required 3 clocks for an opening access or an address change. After that, an internal address counter takes over, and the device spits out data on every clock. If you move to another non-sequential location, there is another 3-clock delay. 

 

Your parts may not work this way. 
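
The behavior described above (3 clocks for an opening access or address change, then 1 clock per sequential word) can be modeled as a simple cycle count. This is a minimal sketch under those stated numbers; the function name and the example address lists are made up for illustration, and a real part or controller may behave differently.

```python
def total_clocks(addresses, opening_clocks=3, burst_clocks=1):
    """Count clocks for a series of word accesses, assuming an access to
    the address right after the previous one costs `burst_clocks` and
    anything else costs `opening_clocks`."""
    total = 0
    previous = None
    for addr in addresses:
        if previous is not None and addr == previous + 1:
            total += burst_clocks      # internal address counter takes over
        else:
            total += opening_clocks    # new opening access
        previous = addr
    return total


sequential = list(range(8))                  # e.g. straight-line code
scattered = [0, 40, 7, 93, 12, 5, 61, 30]    # e.g. random table lookups

print("sequential:", total_clocks(sequential), "clocks")
print("scattered :", total_clocks(scattered), "clocks")
```

With these assumptions, eight sequential words cost 10 clocks while eight scattered words cost 24, which is why the access pattern matters so much for this kind of part.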

 

Eric
Altera_Forum
Honored Contributor II
727 Views

Eric, 

 

They don't. It's the No Bus Latency or Zero Bus Turnaround feature that eliminates that problem when changing addresses or read/write direction. The flow-through output basically makes it asynchronous on the back side. 

 

So unless someone else knows more, ZBT/FT SSRAM is good stuff. Here is my summary: 

 

Fmax < 100 MHz: Use Asynch SRAM (10ns) 

Fmax < 133 MHz: Use ZBT,FT SSRAM (6.5ns) 

Fmax > 133 MHz: Use ZBT, pipelined SSRAM (3ns) 
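
One way to see where these thresholds come from is to compare each access time with the clock period. The sketch below uses only the access times quoted in the summary; it ignores setup time, clock-to-output delay inside the FPGA, and board delays, so it is a first-order check rather than a timing analysis. The names are made up for illustration.

```python
# Access times (ns) as quoted in the summary above.
ACCESS_TIME_NS = {
    "async SRAM": 10.0,
    "ZBT flow-through SSRAM": 6.5,
    "ZBT pipelined SSRAM": 3.0,
}


def period_ns(clock_mhz: float) -> float:
    """Clock period in nanoseconds."""
    return 1000.0 / clock_mhz


def fits_in_one_clock(clock_mhz: float) -> list:
    """Memory types whose quoted access time is no longer than one period."""
    period = period_ns(clock_mhz)
    return [name for name, t in ACCESS_TIME_NS.items() if t <= period]


for mhz in (100, 133, 200):
    print(f"{mhz} MHz ({period_ns(mhz):.2f} ns):", fits_in_one_clock(mhz))
```

At 100 MHz the 10 ns asynchronous part only just fits, at 133 MHz the period is about 7.5 ns so only the two SSRAM types fit, and at 200 MHz only the pipelined part does.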

 

SSRAM will additionally perform much better for sequential access if a smart controller that supports Burst mode is used. Then you can get data in or out every 6.5ns (FT) or <3.5ns (pipelined) 

 

Actually, if I read the datasheet correctly, I could read random values every 6.5 ns if the controller presented the next address to the chip while it is reading the data from the last address (one clock before). I don't know if the IUL or any other SOPC IP will do this, however. 

 

Here's a link I found useful for explaining the basics. 

 

http://www.cypress.com/cfuploads/support/app_notes/introduction.pdf

 

I think $15 for a 1 MB 32bit memory chip I can read in 6.5ns is pretty good!  

 

 

Ken
Altera_Forum
Honored Contributor II
727 Views

http://www.cypress.com/cfuploads/img/products/cy7c1355b.pdf

 

Page 26 has the read/write timing. 

 

This part indeed does not require the 3-clock latency. You are still on a pipelined access, though: the write strobe (or its inverse for a read) gets clocked into the device, and then data is presented or clocked out on the next edge. 

 

The standard asynch Nios cycle that I have seen would apply the control signals following an edge and then capture the data on the next edge (i.e., an access to an asynch SRAM).  

 

I have not run through all the IUL permutations/possibilities, but this is the way I think the standard Nios cycle will work (unless there is some sort of synch mode I haven't seen). 

 

What is interesting about this mode is that you could run the Nios with 0 wait states (no handshaking to determine latency) with the SRAM running on a phase-delayed or perhaps even inverted (180-degree shifted) clock. Working out the implementation details will take some timing analysis. 
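
As a starting point for that timing analysis, the read path can be budgeted roughly as follows. This is a simplified sketch, not a substitute for a real timing report: it assumes the Nios launches the address on its own clock edge, the SRAM samples it on its phase-shifted edge, flow-through data appears a fixed time later, and the Nios captures it on its next edge. The 6.5 ns figure is the access time quoted in this thread; the 2 ns FPGA input setup time and the function names are hypothetical, and board delays and the SRAM's own address setup time are ignored.

```python
def phase_shift_ns(clock_mhz: float, phase_deg: float) -> float:
    """Delay of the SRAM clock edge relative to the Nios clock edge."""
    return (phase_deg / 360.0) * (1000.0 / clock_mhz)


def read_slack_ns(clock_mhz, phase_deg, sram_clk_to_data_ns, fpga_setup_ns):
    """Time left over before the next Nios clock edge once read data is
    valid. Negative slack means a zero-wait-state read does not fit."""
    period = 1000.0 / clock_mhz
    data_valid = phase_shift_ns(clock_mhz, phase_deg) + sram_clk_to_data_ns
    return period - fpga_setup_ns - data_valid


# Inverted SRAM clock (180 degrees), 6.5 ns clock-to-data,
# hypothetical 2 ns input setup time in the FPGA.
for mhz in (50, 100):
    slack = read_slack_ns(mhz, 180, 6.5, 2.0)
    print(f"{mhz} MHz: shift {phase_shift_ns(mhz, 180):.1f} ns, slack {slack:+.1f} ns")
```

Under these assumptions an inverted clock leaves positive slack at 50 MHz but not at 100 MHz, which suggests a smaller phase shift would be needed at higher clock rates.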

 

 

Thanks for the info, 

Eric
Altera_Forum
Honored Contributor II
727 Views

PS:  

 

A couple other items I thought about. 

 

It's interesting that all the Altera demo boards use a bank of external asynch SRAM as the Nios store. You can probably get some extra performance from synch RAM, but the million-dollar question is -how much?- 

 

Then, if you think about it, the SRAMs that are implemented on-chip in the FPGA are most likely all synchronous (I may need to read up a bit here). So the bus interface must work well with synch RAMs also.  

 

Most of my apps have had plenty of real-time headroom using asynch RAM at 25 MHz, so I have not really looked at any benchmarks. 

 

I'll go take a look at what's on the Altera website. 

 

Eric
Altera_Forum
Honored Contributor II
727 Views

Eric, 

 

Thanks for looking at the part. I think we are in agreement - it may take one or two clocks to retrieve data. :)  

 

It just depends on the capabilities of the IUL. I've looked at the timing in detail on the logic analyzer many times, and the waveforms look just like the picture in the IUL wizard, but what I don't know is exactly which clock the LDW was started on. If there is a one-clock delay until the address is actually put on the bus and WE_ activates, etc., then we're at two clocks minimum. (Of course, this would apply to any memory, not just SSRAM.) 

 

Thanks, 

Ken
Altera_Forum
Honored Contributor II
727 Views

Hello, I'm a beginner! 

I want to run the Nios on a DSP development kit instead of a Nios demo board.  

But I have run into the same problem as you: the SRAMs on the DSP development kit are synch SRAMs. 

So I want to write an SSRAM controller myself in HDL, but I don't know how to glue the Avalon bus and my logic together, or how to tell the Avalon bus my timing information. Can you tell me about your experience? 

 

Dazhi
Altera_Forum
Honored Contributor II
727 Views

I don't think you need to create an SSRAM controller yourself. You can just create an interface to user logic (IUL), in which you choose the corresponding Avalon bus port types and then set the correct setup/wait/hold/latency requirements. Finally, assign your IUL signals to the SSRAM pins.

Altera_Forum
Honored Contributor II
727 Views

Hi, 

 

This "class.ptf" file lets you use the onboard SSRAM as instruction and data memory on the DSP Development Board: 

CLASS dspboard_ssram
{
   ASSOCIATED_FILES
   {
      Add_Program = "";
      Edit_Program = "";
      Generator_Program = "--none--";
   }
   USER_INTERFACE
   {
      USER_LABELS
      {
         name="SSRAM on 1s25 DSP Board";
         technology = "Memory";
      }
   }
   MODULE_DEFAULTS
   {
      class = "dspboard_ssram";
      class_version = "2.0";
      HDL_INFO
      {
         # An interface to this memory requires no additional files
         # in the target project directory.
      }
      SLAVE s1
      {
         PORT_WIRING
         {
            PORT A
            {
               direction = "input";
               is_shared = "1";
               type = "address";
               width = "18";
            }
            PORT D
            {
               direction = "inout";
               is_shared = "1";
               type = "data";
               width = "32";
            }
            PORT adsc_n
            {
               direction = "input";
               is_shared = "1";
               type = "always0";
               width = "1";
            }
            PORT adsp_n
            {
               direction = "input";
               is_shared = "1";
               type = "outputenable_n";
               width = "1";
            }
            PORT adv_n
            {
               direction = "input";
               is_shared = "1";
               type = "always1";
               width = "1";
            }
            PORT bw_n
            {
               direction = "input";
               is_shared = "1";
               type = "writebyteenable_n";
               width = "4";
            }
            PORT bwe_n
            {
               direction = "input";
               is_shared = "1";
               type = "write_n";
               width = "1";
            }
            PORT clk
            {
               direction = "input";
               is_shared = "1";
               type = "clk";
               width = "1";
            }
            PORT chipselect_n
            {
               direction = "input";
               type = "chipselect_n";
               width = "1";
            }
            PORT mode
            {
               direction = "input";
               is_shared = "1";
               type = "always0";
               width = "1";
            }
            PORT oe_n
            {
               direction = "input";
               is_shared = "1";
               type = "outputenable_n";
               width = "1";
            }
         }
         SYSTEM_BUILDER_INFO
         {
            Active_CS_Through_Read_Latency = "1";
            Address_Alignment = "dynamic";
            Address_Width = "18";
            Base_Address = "--unknown--";
            Bus_Type = "avalon_tristate";
            Data_Width = "32";
            Has_IRQ = "0";
            IRQ_Number = "N/A";
            Is_Memory_Device = "1";
            Read_Latency = "1";
            Read_Wait_States = "0";
            Write_Wait_States = "0";
            Setup_Time = "1";
            Hold_Time = "0";
         }
      }
      SYSTEM_BUILDER_INFO
      {
         Is_Enabled = "1";
         Instantiate_In_System_Module = "0";
      }
   }
}

 

Hope that helps! 

 

Jochen 

:)