Hi, is it possible to create a custom instruction that uses more than two inputs, or alternatively to load all the inputs serially into registers I have implemented in my hardware? I am creating a piece of hardware to sort an array, and I would like to load all the contents of the array into registers in that hardware. Thanks
A custom instruction is limited to two 32-bit inputs (and one output) because of the design of the Nios CPU; that can't change. What you do with the values and the other register fields is rather up to your imagination... If you write a non-combinatorial instruction (and make sure that only one thread uses the hardware at a time) you can save state between instructions. So if 'writerc' is 0, you can use the 'C' bits as a sub-opcode (rather than a register number), so maybe 'load first', 'load next', 'load last'? Similarly you can use 'readrb' of 0 to indicate other special actions. However, loading data via the Nios is probably slow. If you really need a sort accelerator, you probably need to give it an Avalon master interface so it can directly access the data in memory.
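
The sub-opcode idea above can be sketched as a small software model. This is only an illustration, not Nios II code: `sort_ci`, `sort_load`, `struct sort_hw`, the `SORT_*` names, and the capacity of 16 registers are all hypothetical, and the model assumes an even number of values with at least four of them so that 'load first' and 'load last' are separate instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sub-opcodes; on real hardware these would be encoded in
 * the instruction's 'C' field while writerc is 0. */
enum sort_op { SORT_LOAD_FIRST, SORT_LOAD_NEXT, SORT_LOAD_LAST };

#define SORT_CAPACITY 16u /* assumed number of registers in the hardware */

/* State the hardware keeps between instructions. */
struct sort_hw {
    uint32_t regs[SORT_CAPACITY];
    size_t   count;
    int      done;
};

/* Model of one custom-instruction issue: two 32-bit operands, no result. */
static void sort_ci(struct sort_hw *hw, enum sort_op op, uint32_t a, uint32_t b)
{
    if (op == SORT_LOAD_FIRST) {
        hw->count = 0; /* 'load first' resets the register bank */
        hw->done = 0;
    }
    assert(hw->count + 2 <= SORT_CAPACITY);
    hw->regs[hw->count++] = a;
    hw->regs[hw->count++] = b;
    if (op == SORT_LOAD_LAST)
        hw->done = 1; /* real hardware would start sorting here */
}

/* Load an even-length array (at least 4 values), two values per issue. */
static void sort_load(struct sort_hw *hw, const uint32_t *data, size_t n)
{
    assert(n >= 4 && n % 2 == 0 && n <= SORT_CAPACITY);
    for (size_t i = 0; i < n; i += 2) {
        enum sort_op op = (i == 0)     ? SORT_LOAD_FIRST
                        : (i + 2 == n) ? SORT_LOAD_LAST
                                       : SORT_LOAD_NEXT;
        sort_ci(hw, op, data[i], data[i + 1]);
    }
}
```

Each call to `sort_ci` stands for one instruction, so an array of n values costs n/2 issues, which is the overhead an Avalon master would avoid.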
Perhaps this would work best for you:
- Custom hardware accelerator with two masters (one read and one write)
- Dual-port on-chip RAM with one accelerator master connected to each port
- Slave port through which you poke new data into the sorted list
- Each time a new value comes in, you start at the *end* of your list and search backwards to find the spot to insert it
- While the step above is happening, you use the write port to start moving the read data one memory location further in memory
You could also reverse the data order and work your way up in memory addresses too (that would work better if you decided to put your data into SDRAM, for example). This will only work efficiently up to a certain point. If you are talking about sorting really big lists of data, there are better ways to do this. What I suggest above is O(n) per inserted value, whereas there are O(log2(n)) ways to do this sort of thing which would scale better as the data set grows. Technically you could do this with a custom instruction, but you are limiting your possibilities that way; at least with a mastering component you can put your data in any memory you want.
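
The backwards search-and-shift described in the list above can be sketched in software as follows. This is a minimal model under stated assumptions, not the accelerator itself: `insert_sorted` is a hypothetical name, the list is kept in ascending order, and the reads and writes that the two masters would overlap in hardware happen one after the other here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Insert value into the ascending list[0..count-1]. The buffer must have
 * room for count + 1 elements. Returns the index where value was placed. */
static size_t insert_sorted(uint32_t *list, size_t count, uint32_t value)
{
    assert(list != NULL);
    size_t i = count;
    /* Scan backwards from the end of the list. Every element larger than
     * value is copied one slot up, which is the write master's job. */
    while (i > 0 && list[i - 1] > value) {
        list[i] = list[i - 1];
        i--;
    }
    list[i] = value; /* first slot whose lower neighbour is <= value */
    return i;
}
```

The loop touches up to `count` elements for each new value, which is where the O(n) cost mentioned above comes from.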
Up on the Altera documentation page, go to the Nios II section and there should be a document called "custom instruction user guide"... or something along those lines. I suspect that docs like that might get sucked into the main software developer manual at some point, so if someone reads this a year from now .... well, now you know where it went :) Like DSL and I mentioned in previous notes, if your hardware has a lot of inputs it probably makes more sense for those to live in memory and have your hardware master that memory instead. Custom instructions are good for quick computations with a small data set, but they'll have diminishing returns if your software spends all its time shoving operands to the custom instruction through a series of calls to the hardware.
I am doing edge detection on Nios II. I have done it without using a custom instruction. Now I want to do it with a custom instruction, and I want to pass 8 inputs to the custom instruction. I am not finding enough material on what to do when you have multiple inputs. Can you give me a little guidance on this? Thanks
You'll either need to use the internal register file or consume multiple custom instructions to pass the information. I don't recall the internal register file having macro support, so I typically use the 'n' bus input to perform different operations with the same hardware. For example, with n = 0 you can pass the first two inputs, with n = 1 you can pass the next two inputs, etc.... and with n = 3 you pass the last two inputs, execute the edge detect, and return a value. Again, the custom instruction user guide documents all this stuff, so I would read it since we aren't going to type 40+ pages worth of information into a forum post. Edge detection is typically something you operate across an entire frame, so performing this operation with a series of custom instruction calls seems inefficient to me. I would have an edge detection block that fetches the entire frame from memory and performs all the operations at once, then returns the result to the processor, instead of having the processor send this work to the hardware a piece at a time.
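
The 'n' scheme above can be sketched as a software model of the hardware. Everything here is an assumption made for illustration: the thread does not say which edge operator is used, so the model picks a Sobel-style gradient over the 8 neighbours of a pixel, and `edge_ci`, `edge_detect`, `edge_px`, and the neighbour ordering are hypothetical names and choices. On a real Nios II build the calls would go through a custom-instruction macro or a GCC built-in such as `__builtin_custom_inii` instead of this C function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Operand latches the hardware would hold between calls. */
static int32_t edge_px[8];

/* Model of the custom instruction: n selects which operand pair is being
 * written; n == 3 also computes and returns the result. */
static int32_t edge_ci(unsigned n, int32_t a, int32_t b)
{
    assert(n <= 3);
    edge_px[2 * n]     = a;
    edge_px[2 * n + 1] = b;
    if (n < 3)
        return 0; /* result is ignored by the caller */

    /* Assumed neighbour order (centre pixel unused):
     *   p0 p1 p2
     *   p3  . p4
     *   p5 p6 p7 */
    const int32_t *p = edge_px;
    int32_t gx = (p[2] + 2 * p[4] + p[7]) - (p[0] + 2 * p[3] + p[5]);
    int32_t gy = (p[5] + 2 * p[6] + p[7]) - (p[0] + 2 * p[1] + p[2]);
    return abs(gx) + abs(gy); /* |Gx| + |Gy| approximation */
}

/* Pass the 8 neighbours of one pixel with four calls. */
static int32_t edge_detect(const int32_t px[8])
{
    edge_ci(0, px[0], px[1]);
    edge_ci(1, px[2], px[3]);
    edge_ci(2, px[4], px[5]);
    return edge_ci(3, px[6], px[7]);
}
```

Four issues per pixel is the per-call overhead the post warns about; a block that fetches the frame itself would not pay it.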