Tightly coupled memory and C2H

Altera_Forum · ‎12-31-2006

I have a NIOS application for which tightly coupled, on-chip memory appears to offer a nice performance boost. I have a large array of values that I reference periodically, and rather than chewing up cache space or thrashing the cache every time I use the array, I put it in its own tightly coupled, on-chip RAM, and it's working great.

Recently I've also started looking at C2H as a way to speed up parts of the code, and again I'm seeing a nice improvement in some focused, repetitive loops.

However, I seem to have hit a snag when I try to combine the two together.

If I create a C2H routine that refers to the tightly coupled memory, the program just hangs. I can use the "software implementation" of the C2H routine and it runs fine. Likewise, I have other C2H routines that access traditional off-chip program memory and they run fine. It's only the C2H routine accessing the tightly coupled, on-chip memory that seems to be an issue.

Has anyone tried this yet? Any suggestions?

Many thanks

Altera_Forum · ‎01-02-2007

I've never tried that before but it should work. Be sure to dual port the tightly coupled memory and use a connection pragma so that C2H masters the 2nd memory port. By definition tightly coupled memory ports must be a 1:1 connection with a tightly coupled Nios II master. So that means you can't have C2H accessing the same memory port that the Nios II tightly coupled master connects to.

Nios II <---------------------->Port s1 of the dual port memory (read latency 1)

C2H (using pragmas) <----->Port s2 of the dual port memory (read latency of 1 or 2)

By the way, if some of you are wondering how to eliminate some of the overhead of calling the C2H generated hardware accelerator, this is one method since you don't have to flush the data cache (assuming this is all the shared memory your accelerator needs to access).

I hope that helps,

JCJB

Altera_Forum · ‎01-02-2007

JCJB -

Thanks, I think that will help a lot. Clearly the issue I was struggling with was that the first port on the TCM had a 1:1 connection with the NIOS, which blocked any other access. I can see how a dual-port memory would offer the 2nd path that I need.

That said just two more (hopefully simple) questions. I went looking in all the Altera documentation but came up dry.

1) When I create the 2nd slave port on the tightly coupled memory, what do I connect it to in SOPC Builder? Or do I leave it unconnected, since I'm going to reference it in C2H?

2) I'm not familiar with the pragma syntax. Any suggestions on what it would be, or where I could find some documentation on it?

V

Altera_Forum · ‎01-03-2007

1) Yes, just leave it disconnected. C2H will connect it automatically, assuming you have the pragma statement set up properly (see #2).

2) #pragma altera_accelerate connect_variable <function_name>/<variable_name> to <memory_name>/<port_name>

Example:

    #pragma altera_accelerate connect_variable foo/a_ptr to tcm/s2

    int foo(int *a_ptr)
    {
        int i;

        *a_ptr = 0;
        for (i = 0; i < 1024; i++)
        {
            /* Dereference "a_ptr" and accumulate i into it. In practice it
               doesn't make sense to go to memory just to accumulate i. */
            *a_ptr += i;
        }
        return *a_ptr;
    }

Not that this does anything useful, but the example tells the C2H compiler that "a_ptr" points to data located in a memory called "tcm" and that the accelerator should connect to that memory's port named "s2". By default, without this pragma, C2H connects to all slave ports (all ports, not just memories) that the Nios II data master connects to. This pragma is important for controlling the number of slave ports C2H will connect variables to. You can also reuse the pragma statement on the same variable with different memory ports. So if you want to connect a variable in a C2H accelerator to 2 out of 10 memory ports in your system, this pragma gives you that control (use it twice).
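To illustrate the "use it twice" case, here is a minimal sketch. All of the names are hypothetical: the function "sum_buf", its pointer "buf", and the memories "tcm_a" and "tcm_b" (each assumed to expose a second port called "s2") are not from any real system, so substitute your own. A host compiler will typically ignore or warn about the unknown pragma, so the function also builds as the plain software implementation.

```c
/* Hypothetical system: two dual-port memories named tcm_a and tcm_b,
   each with a free second slave port named s2. The same pointer is
   listed in two pragmas so that it is connected to both ports and to
   nothing else. */
#pragma altera_accelerate connect_variable sum_buf/buf to tcm_a/s2
#pragma altera_accelerate connect_variable sum_buf/buf to tcm_b/s2

/* Adds up len words starting at buf. The caller must make sure that
   buf really points into tcm_a or tcm_b. */
int sum_buf(int *buf, int len)
{
    int i;
    int total = 0;  /* accumulate locally instead of in memory */

    for (i = 0; i < len; i++)
    {
        total += buf[i];
    }
    return total;
}
```

Calling sum_buf() with a pointer into either memory should then work in hardware, while a pointer into any other memory would not be reachable by the accelerator.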

With that said, if you use this pragma statement, be sure that the data being accessed really does reside in the memory you name in the connection pragma. As a rule of thumb, I recommend leaving the pragma statements out of your code while prototyping your algorithm (and reducing your clock frequency), then adding the pragma statements to optimize your master:slave connections when you are finalizing the design (and then increasing your clock frequency).
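One way to keep the data and the pragma in agreement is to pin the array to the memory explicitly. The sketch below assumes the memory is named "tcm" and that the linker script provides a section called ".tcm" for it (the Nios II HAL normally names sections after the memories, but check your generated linker script). "lookup_table", "scale_table", and "TABLE_WORDS" are made-up names for illustration, and the section attribute is GCC syntax.

```c
#define TABLE_WORDS 1024

/* Assumption: the linker script has a ".tcm" section that maps to the
   tightly coupled memory named "tcm". */
int lookup_table[TABLE_WORDS] __attribute__((section(".tcm")));

/* Tell C2H that "data" lives in tcm and is reached through port s2. */
#pragma altera_accelerate connect_variable scale_table/data to tcm/s2

/* Multiplies each of the len words at data by factor, in place. */
void scale_table(int *data, int len, int factor)
{
    int i;

    for (i = 0; i < len; i++)
    {
        data[i] *= factor;
    }
}
```

A call such as scale_table(lookup_table, TABLE_WORDS, 3) then passes a pointer that matches the pragma. Passing an array that lives in some other memory would break the assumption the pragma encodes.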

These documents go into more detail (read them in this order to get the most out of them):

http://www.altera.com/literature/ug/ug_nios2_c2h_compiler.pdf (user guide)

http://www.altera.com/literature/an/an420.pdf (optimization guide)

http://www.altera.com/literature/an/an417.pdf (SG-DMA example)

http://www.altera.com/literature/tt/tt_nios2_c2h_accelerating_tutorial.pdf (FFT example)

These are located here:

http://www.altera.com/literature/lit-nio2.jsp

There are also two other examples currently (image rotate and FIR) located here:

http://www.altera.com/support/examples/nios2/exm-nios2.html

I hope that helps.