Tightly Coupled Memory

Altera_Forum
Honored Contributor II
1,963 Views

Would someone please explain "tightly coupled memory" and its use within the NiosII/Qsys context? 

 

The exercise manual section of the "Designing with the Nios II Processor and Qsys" Customer Training handout (A-MNL-NII-QSYS-1DAY-11-0-v1) states that s1 of the On-Chip Memory (dual-port) should be connected to the Nios II tightly_coupled_instruction_master_0 interface and s2 of the On-Chip Memory should be connected to the Nios II data_master interface. 

 

HOWEVER, the Qsys connection diagram on the same page shows s2 of the On-Chip Memory connected to the Nios II instruction_master interface. 

 

So... which one is correct, and why do I need the tightly-coupled instruction master anyway? 

 

Should I also include a tightly-coupled data master if I use a data cache? 

 

 

BTW - I discovered previously that I need a data cache in the Nios II to properly interface with my 16-bit wide SRAM chip.
8 Replies
Altera_Forum
Honored Contributor II
560 Views

I don't have the diagram you refer to at hand. Anyway, I think the first connection is correct: port s1 connected to the TC instruction master and s2 to the data master. 

The connection to the data master is required so that the memory can be loaded at boot. 

After that, usually only port s1 is used, for fetching the code being executed. 

 

Regarding the other question, TC memory and cache are independent devices: you can have either or both. 

You can consider TCM very similar to cache from a performance point of view, since both rely on a dedicated data transfer channel which is not subject to delays due to bus arbitration. The difference is that cache contents vary, changing automatically depending on the code being executed, while TCM contents are fixed, but you decide what gets loaded into it. 

So TCM is usually convenient if you have a few frequently accessed functions or data items and want access to them to be very fast.
Altera_Forum
Honored Contributor II
560 Views

Thanks for the quick reply Cris72. This is exactly what I needed to know and helps with my understanding of TCM.

Altera_Forum
Honored Contributor II
560 Views

If you are running with normal data in 'tightly coupled' memory, then you also want to avoid data accesses to the code memory during normal running, as these will be slow Avalon cycles (especially if you don't have a data cache). 

 

There are two cases where the instruction memory might end up containing data. 

1) readonly data. 

2) switch statement jump tables. 

 

Read-only data is relatively easy to move by changing the linker script. 

 

gcc3 puts the switch statement jump tables in their own segment, but the Altera build of gcc4 has them hard-coded into the text segment, which is rather fubar. I think it was done because of issues with the available relocation types for shared libraries (position-independent code). 

Fixing this needs a compiler rebuild.
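
One way to sidestep compiler-generated jump tables without rebuilding the compiler is to write the dispatch table yourself and place it in a data section. The following is only a sketch: the section name .tcm_data, the macro TCM_DATA and the op_* functions are made up for illustration, and the section still has to be mapped to the tightly coupled data memory in your linker script.

```c
#include <stdint.h>

/* Hypothetical section name; it must match an output section that your
   linker script maps onto the tightly coupled data memory. */
#ifdef __ELF__
#define TCM_DATA __attribute__((section(".tcm_data")))
#else
#define TCM_DATA
#endif

typedef int32_t (*op_fn)(int32_t, int32_t);

static int32_t op_add(int32_t a, int32_t b) { return a + b; }
static int32_t op_sub(int32_t a, int32_t b) { return a - b; }
static int32_t op_mul(int32_t a, int32_t b) { return a * b; }

/* Explicit dispatch table. It is an ordinary writable array, so the
   compiler emits it as data in the named section rather than inside
   the text segment. */
TCM_DATA static op_fn op_table[] = { op_add, op_sub, op_mul };

/* Replaces a dense switch (op): the table lookup is a normal data load. */
int32_t dispatch(unsigned op, int32_t a, int32_t b)
{
    if (op >= sizeof op_table / sizeof op_table[0])
        return 0; /* unknown opcode */
    return op_table[op](a, b);
}
```

Whether this is worth doing depends on how many switch statements sit on the critical path; small switches are often compiled to compare-and-branch chains with no table at all.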
Altera_Forum
Honored Contributor II
560 Views

Thanks DSL. Not sure I understand all of this, but it sounds best to not use tightly coupled data memory unless I have a very specific need for it.

Altera_Forum
Honored Contributor II
560 Views

 

--- Quote Start ---  

Thanks DSL. Not sure I understand all of this, but it sounds best to not use tightly coupled data memory unless I have a very specific need for it. 

--- Quote End ---  

 

I guess you missed the point.  

What dsl meant is that if you move code to TCM, you might still have a few accesses to normal memory (the one connected to the slower Avalon bus) because of the gcc behavior he mentioned. But this only affects speed, which could be slightly lower than expected. There will be no problem at all with your program's operation; functionality will be the same whether you place code in TCM or in normal memory, since the compiler and linker take care of everything. 

Don't hesitate to use TCM, especially if you have DMA or other master devices. 

 

Regards.
Altera_Forum
Honored Contributor II
560 Views

To reiterate: if your code/data is going to be in internal FPGA memory then, if possible, make the Nios access it as tightly coupled code/data. That will give you faster accesses and use fewer FPGA resources. 

 

You will probably still need the instruction cache for the boot code (for boot from flash or jtag) - but you can make it minimally sized. 

 

The code I've written is very performance critical - 'hard real time'. I had to ensure the worst-case code paths didn't exceed the available clock count.
Altera_Forum
Honored Contributor II
560 Views

Indeed, I did miss the point. Thank you Cris72 for the clarification. I thought DSL was advising against using TCM. 

 

From your comments and DSL's last post I better understand how TCM can be used to speed up a design and reduce the resource load on the FPGA. There is a lot that I still do not understand about this, such as how one controls which code goes into TCM and which code goes into normal memory. I need to study the subject a bit more, if I can find documentation on TCM with respect to Nios II.
Altera_Forum
Honored Contributor II
560 Views

You can use the gcc __attribute__((section("section_name"))) to mark a function (or data) as belonging to a specific section instead of the default (e.g. .text). You can then use the linker script to assign different sections to different memory areas. I think the Altera build tools have some support for this, but I don't use them at all. 
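
As a minimal sketch of the attribute in use: the section names .tcm_code and .tcm_data, the macros, and the fir4 function below are made up for illustration. They only have an effect once your linker script maps those sections to the tightly coupled memories, and initialized data placed there also needs to be loaded at boot.

```c
#include <stdint.h>

/* Hypothetical section names; they must match output sections that your
   linker script maps onto the tightly coupled memories. */
#ifdef __ELF__
#define TCM_CODE __attribute__((section(".tcm_code")))
#define TCM_DATA __attribute__((section(".tcm_data")))
#else
#define TCM_CODE
#define TCM_DATA
#endif

/* Frequently read coefficients, placed in the TCM data section. */
TCM_DATA static int32_t filter_taps[4] = { 1, 3, 3, 1 };

/* Time-critical routine, placed in the TCM code section instead of .text. */
TCM_CODE int32_t fir4(const int32_t *samples)
{
    int32_t acc = 0;
    for (int i = 0; i < 4; i++)
        acc += filter_taps[i] * samples[i];
    return acc;
}
```

Everything without an attribute stays in the default sections, so only the functions and data you mark end up in TCM. Checking the linker map file is a simple way to confirm where each symbol actually landed.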

 

If you have some code in SDRAM (or similar), then you can think of code in tightly coupled memory as being locked into the instruction cache, since the access times are similar. 

 

Depending on the size of your target application, and the fpga you are using, you might fit all the code into internal memory. For small applications it is usually the inclusion of stdio (for diagnostic printf() to the jtag console) that consumes all the memory.