Altera_Forum
Honored Contributor I
753 Views

custom instruction

Hello world; 

I have created my first custom IP for matrix multiplication, and I want to access it using a custom instruction. How can I do this? My IP is compatible with the Avalon bus interfaces.

Thanks.
5 Replies
Altera_Forum
Honored Contributor I
20 Views

The custom instruction interface limits you to two 32-bit inputs and one 32-bit output, so it doesn't really seem a match for matrix multiply!
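
To put a number on that mismatch, here is a minimal host-side sketch in plain C (not Nios II code). It assumes a hypothetical custom instruction, modeled by `ci_mul`, that takes two 32-bit operands and returns one 32-bit result. The names `ci_mul`, `ci_calls` and `matmul_via_ci` are made up for illustration. With only two operands per call, an N x N multiply needs N^3 calls, and the CPU still does all the loads, adds and stores.

```c
#include <assert.h>
#include <stdint.h>

#define N 2

/* Counts how many "custom instructions" the CPU had to issue. */
uint32_t ci_calls = 0;

/* Hypothetical software model of a custom instruction:
 * two 32-bit inputs (dataa, datab), one 32-bit result. */
int32_t ci_mul(int32_t dataa, int32_t datab)
{
    ci_calls++;
    return dataa * datab;
}

/* Matrix multiply where only the scalar multiply is offloaded.
 * Everything else (indexing, accumulation, stores) stays on the CPU. */
void matmul_via_ci(int32_t a[N][N], int32_t b[N][N], int32_t c[N][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            int32_t acc = 0;
            for (int k = 0; k < N; k++) {
                acc += ci_mul(a[i][k], b[k][j]);
            }
            c[i][j] = acc;
        }
    }
}
```

For a 2 x 2 multiply this issues 8 calls; at 16 x 16 it would already be 4096, which is why the replies below move the matrix data onto separate Avalon interfaces.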

Altera_Forum
Honored Contributor I
20 Views

All you have to do is add another interface to your existing custom IP, with the new interface being a custom instruction slave: for example, one custom instruction slave interface, and one or more Avalon interfaces for loading/storing your matrix.

 

Then simply glue the opcodes to the desired behavior of your existing IP. The only real trick is that you need to be careful with cache coherency, since your existing Avalon interfaces will not be going through the Nios data cache (if you have one).
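
As a rough illustration of "gluing opcodes to behavior", here is a host-side C model, not real hardware or HAL code. The opcode values, the `shared_mem` array standing in for memory reached through the IP's Avalon master, and the function name `matmul_ci` are all assumptions made for this sketch. The custom instruction only carries small control values (base addresses, a run command), while the matrix data moves over the modeled Avalon side.

```c
#include <assert.h>
#include <stdint.h>

#define DIM       2u
#define MEM_WORDS 64u

/* Hypothetical opcode assignment for the custom instruction slave. */
enum { OP_SET_A = 0, OP_SET_B = 1, OP_SET_C = 2, OP_RUN = 3 };

/* Stand-in for memory the IP reads and writes through its own Avalon
 * master. On real hardware these accesses bypass the CPU data cache. */
uint32_t shared_mem[MEM_WORDS];

/* Base word offsets latched by the OP_SET_* opcodes. */
static uint32_t base_a, base_b, base_c;

/* Model of the custom instruction: opcode selects the behavior,
 * dataa carries the operand, datab is unused in this sketch. */
uint32_t matmul_ci(uint32_t opcode, uint32_t dataa, uint32_t datab)
{
    (void)datab;
    switch (opcode) {
    case OP_SET_A:
        assert(dataa <= MEM_WORDS - DIM * DIM);
        base_a = dataa;
        return 0;
    case OP_SET_B:
        assert(dataa <= MEM_WORDS - DIM * DIM);
        base_b = dataa;
        return 0;
    case OP_SET_C:
        assert(dataa <= MEM_WORDS - DIM * DIM);
        base_c = dataa;
        return 0;
    case OP_RUN:
        /* On a cached CPU, A and B must be flushed to memory before
         * this point, and C must not be read from a stale cache line
         * afterwards. This host model has no cache, so nothing to do. */
        for (uint32_t i = 0; i < DIM; i++) {
            for (uint32_t j = 0; j < DIM; j++) {
                uint32_t acc = 0;
                for (uint32_t k = 0; k < DIM; k++) {
                    acc += shared_mem[base_a + i * DIM + k] *
                           shared_mem[base_b + k * DIM + j];
                }
                shared_mem[base_c + i * DIM + j] = acc;
            }
        }
        return 1; /* 1 = completed */
    default:
        return 0;
    }
}
```

The comment inside `OP_RUN` marks where the coherency problem would bite on a real system: the IP sees main memory, not whatever is still sitting in the CPU's data cache.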
Altera_Forum
Honored Contributor I
20 Views

Given that the matrix multiply won't be quick, you really want an interface that will allow it to progress asynchronously. 

Also, an extra couple of clocks to initiate the action won't make much difference.

So you might as well request the multiply using an Avalon slave.

 

You should be able to dual-port an M9K memory block to your matrix code and one of the CPU's tightly coupled data ports. That will give both sides single-clock access and avoid any problems with the data cache.
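
A minimal sketch of that arrangement, again as a host-side C model rather than real Nios II code: a small control/status register pair stands in for the Avalon slave, and a struct stands in for the dual-ported memory block. The register layout, bit names and function names (`accel_start`, `accel_done`, `accel_step`) are assumptions for illustration only. The model computes one output element per step, so the CPU side can start the job and check back later instead of stalling.

```c
#include <assert.h>
#include <stdint.h>

#define DIM        2u
#define CTRL_START 0x1u
#define STAT_BUSY  0x1u
#define STAT_DONE  0x2u

/* Hypothetical Avalon slave register map. */
typedef struct {
    uint32_t control;
    uint32_t status;
} accel_regs_t;

/* Stand-in for the dual-ported memory block: one port for the CPU,
 * one for the matrix logic, so no data cache sits in between. */
typedef struct {
    int32_t a[DIM][DIM];
    int32_t b[DIM][DIM];
    int32_t c[DIM][DIM];
} shared_ram_t;

accel_regs_t regs;
shared_ram_t ram;
static uint32_t next_elem; /* index of the next output element */

/* One step of the accelerator model. It either accepts a start
 * request or computes a single element of C. */
void accel_step(void)
{
    if (regs.control & CTRL_START) {
        regs.control &= ~CTRL_START;
        regs.status = STAT_BUSY;
        next_elem = 0;
        return;
    }
    if (!(regs.status & STAT_BUSY)) {
        return; /* idle or already done */
    }
    uint32_t i = next_elem / DIM;
    uint32_t j = next_elem % DIM;
    int32_t acc = 0;
    for (uint32_t k = 0; k < DIM; k++) {
        acc += ram.a[i][k] * ram.b[k][j];
    }
    ram.c[i][j] = acc;
    if (++next_elem == DIM * DIM) {
        regs.status = STAT_DONE;
    }
}

/* CPU side: request the multiply, then poll for completion. */
void accel_start(void) { regs.control = CTRL_START; }
int  accel_done(void)  { return (regs.status & STAT_DONE) != 0; }
```

On the CPU side the pattern is: write A and B into the shared block, call `accel_start()`, do other work, and read C once `accel_done()` returns nonzero. In this model each call to `accel_step()` plays the role of the hardware advancing on its own.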
Altera_Forum
Honored Contributor I
20 Views

 

--- Quote Start ---  

Given that the matrix multiply won't be quick, you really want an interface that will allow it to progress asynchronously. 

--- Quote End ---  

 

 

Both custom instruction and Avalon-MM support either asynchronous or synchronous styles of execution.
Altera_Forum
Honored Contributor I
20 Views

Moving this post to the Nios portion of the forum so that people will see it. 

 

I agree with DSL: for this sort of thing you don't want a high-latency matrix multiply stalling the processor pipeline. You can make a non-blocking custom instruction to get around this, but if you are going to go through all that effort you might as well take the processor out of the loop and feed a matrix multiply IP with a DMA to maximize throughput.