Nios® V/II Embedded Design Suite (EDS)

Internal Register File (Custom Instructions)

Altera_Forum
Honored Contributor II
1,139 Views

Hello. 

 

I'm very much a beginner with Nios II and VHDL, and have just managed to make a 

sample custom instruction (Combinatorial) like the following, 

thanks to ug_nios2_custom_instruction.pdf. 

 

unsigned char input[BUFF_SIZE], output[BUFF_SIZE]; 
int i; 

/* Run the custom instruction on each character up to the terminating NUL. */ 
for (i = 0; input[i] != '\0'; i++) 
    output[i] = (unsigned char) ALT_CI_PROC_ONE_CHAR((int) input[i]); 

 

Now, I'd like to extend the instruction to handle whole strings instead of single chars. 

Something like the following would be preferable. 

 

unsigned char input[BUFF_SIZE], output[BUFF_SIZE]; 

ALT_CI_PROC_STRING(input, output); 

 

To do this, I should use the Internal Register File type of custom instruction, right? 

Could anyone tell me where I can find VHDL examples for this? 

 

Or ... any other suggestions are welcome. 

 

Thanks. 

 

shino
11 Replies
Altera_Forum

Hi. 

 

When I wrote the original question, I assumed that I could 

read/write memory easily as long as I knew its address. 

 

I was wrong. 

 

Recently someone kindly gave me advice on this. I should turn my 

VHDL code into a component which connects to the Avalon Bus.  

 

So, I'd like to make this new component for SOPC Builder, but 

I've never done this before.  

 

Could anyone tell me where I should look? 

If there are VHDL sample files, it would be great, 

but I'll be deeply grateful for any suggestion. 

 

cheers, 

 

shino
Altera_Forum

Well, what you're asking to do isn't exactly easy, so make sure you really need this to run that fast. If not, I'd recommend sticking with a custom instruction and using a loop. 
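For reference, a minimal sketch of the "custom instruction plus loop" approach, assuming the single-character instruction from the first post. The index PROC_ONE_CHAR_N, the helper proc_string, and the host-side stand-in proc_one_char_model (which just upper-cases ASCII letters) are hypothetical names for illustration. On a Nios II target the wrapper uses gcc's __builtin_custom_ini; the __nios2__ check is an assumption about the toolchain's predefined macros.

```c
#include <stddef.h>

/* Hypothetical custom-instruction index; the real value comes from your system. */
#define PROC_ONE_CHAR_N 0

#if defined(__nios2__)
/* On the target: gcc built-in for a custom instruction taking one int, returning int. */
#define PROC_ONE_CHAR(c) __builtin_custom_ini(PROC_ONE_CHAR_N, (int)(c))
#else
/* Off the target: a software stand-in so the loop can be tried on a PC.
 * It upper-cases ASCII letters; the real hardware may do something else. */
static int proc_one_char_model(int c)
{
    return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
}
#define PROC_ONE_CHAR(c) proc_one_char_model((int)(c))
#endif

/* Process a NUL-terminated string one character at a time.
 * `out` must have room for the whole input plus the terminating NUL. */
static void proc_string(const unsigned char *in, unsigned char *out)
{
    size_t i;

    for (i = 0; in[i] != '\0'; i++)
        out[i] = (unsigned char)PROC_ONE_CHAR(in[i]);
    out[i] = '\0';
}
```

The caller still passes whole buffers, as in proc_string(input, output), while the hardware only ever sees one character per instruction.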

 

If so... then you want to make a bus-mastering Avalon bus peripheral. There's a PDF file on Altera's site called "Avalon Bus Specification" you should check out first. This should tell you enough that you could write a VHDL file which has all the peripheral's logic. You'll probably need at least an Avalon Slave (registers) interface to let the CPU control this thing (and let it interrupt the CPU when it's done), and an Avalon Master port so it can read and write memory. Test it out in Altera's simulator by itself to make sure it does what you expect. 

 

Once you have the component's VHDL file, use it to create a new component in SOPC Builder, and add it to your system. 

 

Once again, this is the hard way, but it has the potential to do up to one word every two clocks, so make sure you really need the speed.
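To give an idea of the software side, here is a rough sketch of how the CPU might drive such a peripheral through its Avalon slave registers. The register layout, the bit definitions, and the names string_accel_regs, string_accel_start and string_accel_done are all made up for illustration; the real map is whatever you define in your VHDL. On a real Nios II system with a data cache you would normally access the registers through the HAL's IORD/IOWR macros or an uncached pointer instead of a plain struct pointer.

```c
#include <stdint.h>

/* Hypothetical register map of the peripheral's Avalon slave port. */
typedef struct {
    volatile uint32_t src_addr;  /* address the master port reads from */
    volatile uint32_t dst_addr;  /* address the master port writes to  */
    volatile uint32_t length;    /* number of bytes to process         */
    volatile uint32_t control;   /* bit 0: start                       */
    volatile uint32_t status;    /* bit 0: done                        */
} string_accel_regs;

#define STRING_ACCEL_CTRL_START  0x1u
#define STRING_ACCEL_STATUS_DONE 0x1u

/* Program the transfer parameters, then set the start bit last. */
static void string_accel_start(string_accel_regs *regs,
                               uint32_t src, uint32_t dst, uint32_t len)
{
    regs->src_addr = src;
    regs->dst_addr = dst;
    regs->length   = len;
    regs->control  = STRING_ACCEL_CTRL_START;
}

/* Non-zero once the peripheral reports that it has finished. */
static int string_accel_done(const string_accel_regs *regs)
{
    return (regs->status & STRING_ACCEL_STATUS_DONE) != 0;
}
```

The CPU would either poll string_accel_done() or wait for the peripheral's interrupt while the master port moves the data.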
Altera_Forum

Dear Mike, 

 

Thank you very much for your comment. 

I didn't realize this could become such a difficult task. 

 

Do I need this to run that fast? Well..., I'm not sure. 

I could probably pick several other functions to speed up (by turning 

them into VHDL) to reach the overall target, but those functions may 

have other difficulties such as recursive parts, structures, etc. 

 

So, maybe I should stick to this approach a little longer, and see how far I 

can go. Anyway, at this stage, any experience (know-how) is valuable. 

 

> There's a PDF file on Altera's site called "Avalon Bus Specification"  

> you should check out first. This should tell you enough that you could  

> write a VHDL file which has all the peripheral's logic. 

 

Thank you.  

I've downloaded the file. I'll tackle it. (At least, I think I can try...) 

 

By the way, when I have questions while I'm trying to make a bus-mastering 

Avalon bus peripheral, do you think I can ask here? Or would the "General 

Discussion Forum" be more suitable? 

 

Cheers, 

 

shino
Altera_Forum

 

--- Quote Start --- 

Originally posted by shino @ Oct 6 2005, 04:34 AM 

Do I need this to run that fast? Well..., I'm not sure. 

--- Quote End --- 

 

Usually you can figure it out from project requirements. Or you can build the non-state-machine version and see if it's fast enough. 

 

--- Quote Start --- 

Originally posted by shino @ Oct 6 2005, 04:34 AM 

By the way, when I have questions while I'm trying to make a bus-mastering Avalon bus peripheral, do you think I can ask here? Or would the "General Discussion Forum" be more suitable? 

--- Quote End --- 

 

I'd put VHDL/Verilog/Avalon questions in the General Discussion Forum, and IDE/HAL/C/C++ questions here.
Altera_Forum

Dear Mike, 

 

Thank you again for your kind remarks. 

 

> >Do I need this to run that fast? Well..., I'm not sure. 

> Usually you can figure it out from project requirements. 

> Or you can build the non-state-machine version and see if it's fast enough. 

 

What I meant by "not sure" was the following. 

(If I caused any confusion, I'm sorry.) 

 

Suppose I want to speed up (by 10-20%) one project, which consists 

of hundreds of C functions. Profiling the project told me that 

function1 accounts for 10% of the run time. When I turned function1 

into "VHDL+loop", the new function1 achieved a fourfold speedup, 

but overall that is only a 7.5% (0.1 x 0.75 = 0.075) reduction in run time. 
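The arithmetic here is the usual Amdahl's-law calculation. As a small sketch (the function names time_saved and overall_speedup are just for illustration), with p the fraction of run time spent in the function and s the factor by which that function gets faster:

```c
/* Fraction of total run time saved when a fraction `p` of the run time
 * is made `s` times faster. */
static double time_saved(double p, double s)
{
    return p * (1.0 - 1.0 / s);
}

/* Overall speedup factor of the whole program (Amdahl's law). */
static double overall_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}
```

With p = 0.10 and s = 4, time_saved gives 0.075, which is the 7.5% above. However large s becomes, the saving can never exceed p itself, i.e. 10% in this case.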

 

So, I have to either 

a) turn function1 into VHDL completely 

and/or  

b) turn other functions into VHDL 

as a next step. 

 

Because I knew there were many difficulties in b), I just thought 

I should try a), but after reading your comments, I became unsure. 

 

And the good thing is that this is my first exercise in converting 

C to VHDL, and I'm allowed to take a little "detour". 

 

> I'd put VHDL/Verilog/Avalon questions in the General Discussion Forum, and IDE/HAL/C/C++ questions here. 

 

Thanks. I'll follow your advice. 

 

Cheers, 

 

shino
Altera_Forum

 

--- Quote Start --- 

Originally posted by shino @ Oct 7 2005, 01:06 AM 

Suppose I want to speed up (by 10-20%) one project, which consists 

of hundreds of C functions. Profiling the project told me that 

function1 accounts for 10% of the run time. When I turned function1 

into "VHDL+loop", the new function1 achieved a fourfold speedup, 

but overall that is only a 7.5% (0.1 x 0.75 = 0.075) reduction in run time. 

--- Quote End --- 

 

And, if you somehow optimized function1 to take zero execution time(!), you'd still only get a 10% speedup. Is this function1 at 10% the heaviest-use function in the system? 

 

Honestly, and I say this as a guess since I don't know much about what you're doing, it looks like you might want to figure out a way to speed up the system clock, since it doesn't sound like your processing is concentrated in any one place. That or increase caches. 

 

Alternatively, if you think you can keep track of things and have room on the chip, you might want to add another Nios II processor, with just on-chip RAM or something, and use it as a coprocessor. There's a Mutex peripheral for synchronizing multiple CPUs like that. 

 

Just some alternatives to think about.
Altera_Forum

I agree with Mike. Unfortunately, without seeing the system as a whole I can't really suggest anything that hasn't already been said earlier. I would look through the code to see if you have any calculations (not just functions) that are used often and do not run fast on your system (maybe they lend themselves well to a custom instruction). Putting the loop into hardware will give you more speedup, but you said yourself that this is only 10% of your time, so at best you'll only see a 10% speedup.

Altera_Forum

Dear Mike and BadOmen, 

 

Thank you very much for teaching me many possibilities. 

 

Quote(Mike DeSimone @ Posted Oct 7 2005, 11:02 AM) 

> And, if you somehow optimized function1 to take zero execution time(!), 

> you'd still only get a 10% speedup. 

 

Yes, it's true...very true... but still... 

I thought I could get a large speedup, let's say 95%, by 

turning the whole of function1 into VHDL. If I could get a 9.5% (0.1 x 0.95) 

overall speedup from function1, it would make the rest considerably easier. 

 

Well..., was I too optimistic? Do you think a 95% speedup is unusual? 

Would it not even be worth trying? 

 

I'm now reading the "Avalon Bus Specification". Thank you for mentioning 

"Avalon Slave (registers)", "Avalon Master port", and so on. 

These pointers are really helping me. So far, "Streaming Transfer" 

looks promising, although obviously I need (much) more reading. 

 

Quote(Mike DeSimone @ Posted Oct 7 2005, 11:02 AM) 

> Is this function1 at 10% the heaviest-use function in the system? 

 

Yes. This function1 is the most frequently used. In addition to that, 

it's far simpler than the other functions. Because these functions were 

all written by other people, simplicity is a very important factor to me 

when rewriting. 

 

Speeding up the system clock and going multi-processor are not 

options at the moment. But the fact that you're recommending these 

rather than an "Avalon Bus component" is interesting. It means 

that experts like you prefer them, right? 

 

Quote(BadOmen @ Posted Oct 7 2005, 12:38 PM) 

> I would look through the code to see if you have any calculations  

> (not just functions) that are used often that do not run fast on your  

> system (maybe they lend themselves well to a custom instruction) 

 

Ah, right. "calculations(not just functions)" yes. 

I overlooked the basic fact that even when a function has some 

difficulties such as recursive parts or structures, other parts of the 

function can be turned into VHDL. In fact, function1 is exactly 

this case... but still, I overlooked it. What tunnel vision! 

It must be worth checking. Thank you for pointing it out. 

 

Cheers, 

 

shino
Altera_Forum

 

--- Quote Start --- 

Originally posted by shino @ Oct 11 2005, 03:23 AM 

Well..., was I too optimistic? Do you think a 95% speedup is unusual? 

Would it not even be worth trying? 

--- Quote End --- 

 

I simply have no idea. There isn't enough info here on what you're really trying to do to answer that. 

 

--- Quote Start --- 

Originally posted by shino @ Oct 11 2005, 03:23 AM 

So far, "Streaming Transfer" 

looks promising, although obviously I need (much) more reading. 

--- Quote End --- 

 

You might not need the streaming stuff. I haven't needed it yet, myself. 

 

"Streaming transfers" are for when the master doesn't know how much data it needs to move, but the slave does. I've always transferred things in fixed amounts, so I've never used streaming. 

 

--- Quote Start --- 

Originally posted by shino @ Oct 11 2005, 03:23 AM 

Because these functions were 

all written by other people, simplicity is a very important factor to me 

when rewriting. 

--- Quote End --- 

 

Agreed. 

 

--- Quote Start --- 

Originally posted by shino @ Oct 11 2005, 03:23 AM 

Speeding up the system clock and going multi-processor are not 

options at the moment. But the fact that you're recommending these 

rather than an "Avalon Bus component" is interesting. It means 

that experts like you prefer them, right? 

--- Quote End --- 

 

It means that they are simpler options. Speeding up the clock is a matter of making sure the chip can go that fast, and either changing some PLL settings or a clock oscillator. Interfacing with other components can still be done at the slower speeds now that SOPC Builder supports multiple clocks. Writing your own Avalon component is just more work. Tedious more than difficult. Also, you can't use the debugger to debug Avalon components' internals; you have to use something like SignalTap.
Altera_Forum

I have the same problem and need an example of how to do that. Can we add a new macro to a C header file for setting the register indexes, or is there any other way of doing this in C? 

 

Thanks in advance.
Altera_Forum

Not sure why you added to an old thread. 

 

You probably want something like: 

 

#include <stdint.h> 

/* There are 'int __builtin_custom_inii(int op, int a, int b)' 
 * (and similar) wrappers for custom instructions defined by gcc itself. 
 * But none for the 'c' variants that do not use the main register file. 
 * The one below is useful when the 'b' field is used as a sub-opcode. */ 
__attribute__((always_inline)) static __inline__ uint32_t 
custom_inic(const uint32_t op, uint32_t a, const uint32_t b) 
{ 
    uint32_t result; 

    /* 'op' and 'b' must be compile-time constants ("n" constraints). */ 
    __asm__ ( "custom\t%1, %0, %2, c%3" 
        : "=r" (result) : "n" (op), "r" (a), "n" (b)); 
    return result; 
} 

 

Read the gcc documentation for __asm__ for more details. 

If your instruction saves state, then you probably need __asm__ volatile.