Nios® II Embedded Design Suite (EDS)
Support for Embedded Development Tools, Processors (SoCs and Nios® II processor), Embedded Development Suites (EDSs), Boot and Configuration, Operating Systems, C and C++

custom instruction: internal register file mode

Honored Contributor II



I need to use Nios II custom instructions to do the following:


Feed two 128-bit operands into custom logic, have the custom logic perform some operation on them, and then output a 128-bit result.


Since the Nios II custom instruction dataa/datab ports are only 32 bits wide, I want to use the internal register file mode (is there another way to do it?). However, the function interface in system.h only supplies n, dataa and datab ( __builtin_custom_inii(n,A,B) ). How can I assign values to readra/readrb/writerc?


Thanks for your help.
7 Replies
Honored Contributor II

Write your own asm wrapper - then you know exactly which opcode is generated. 

AFAICT a custom instruction has made available to it (in various fields) the entire 32bit instruction and the two 32bit values from the register file for the 'A' and 'B' registers. 


The 'readra' and 'readrb' bits could be used (when zero) to avoid a pipeline stall if either register has been recently written - but I very much doubt that is done. All other instructions (except call and jmpi) can stall on the 'A' field, and on the 'B' field if the low two bits of the opcode are the same (dunno why this isn't bit0!). Special casing 'custom' seems unlikely - I've not experimented! (I've not looked to see if call/jmpi stall either) 


The 'writerc' field is used to control the writeback to the main register file, this isn't needed until the execute phase so always depends on the opcode. 


I have written custom instructions that use the 'B' field as a sub-opcode, substituting the low 5 bits of rB when readrb is set!


I seem to have the following lurking: 


/* There are 'int __builtin_custom_inii(int op, int a, int b)'
 * (and similar) wrappers for custom instructions defined by gcc itself.
 * But none for the 'c' variants that do not use the main register file.
 * The one below is useful when the 'b' field is used as a sub-opcode.
 */
static __inline__ uint32_t
custom_inic(const uint32_t op, uint32_t a, const uint32_t b)
{
    uint32_t result;

    __asm__ ( "custom\t%1, %0, %2, c%3"
        : "=r" (result) : "n" (op), "r" (a), "n" (b));
    return result;
}

/* We have a custom opcode available for byteswap and similar */
#define bswap32_fn(x) ((((x) & 0xff) << 24) | (((x) & 0xff00) << 8) | (((x) & 0xff0000) >> 8) | (((x) & 0xff000000) >> 24))
#define bswap16_fn(x) ((((x) & 0xff) << 8) | (((x) & 0xff00) >> 8))

#ifdef NIOS_BIT_MANGLER_OPCODE /* Probably number 0 */
#define bswap32_ci(x) custom_inic(NIOS_BIT_MANGLER_OPCODE, (x), 1)
#define bswap16_ci(x) custom_inic(NIOS_BIT_MANGLER_OPCODE, (x), 2)
#define brev32(x) custom_inic(NIOS_BIT_MANGLER_OPCODE, (x), 4)
#define brev16(x) custom_inic(NIOS_BIT_MANGLER_OPCODE, (x), 8)
#define brev8(x) custom_inic(NIOS_BIT_MANGLER_OPCODE, (x), 16)
#else
#define bswap32_ci(x) bswap32_fn(x)
#define bswap16_ci(x) bswap16_fn(x)
#endif

#define bswap32(x) (__builtin_constant_p(x) ? bswap32_fn(x) : bswap32_ci(x))
#define bswap16(x) (__builtin_constant_p(x) ? bswap16_fn(x) : bswap16_ci(x))


You probably want 'asm volatile' to ensure the instructions are added in the correct order.
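
For the 128-bit case asked about, the same asm-wrapper idea might look roughly like the sketch below. Everything about the hardware side is an assumption made up for illustration: the opcode index CI_N, the internal register layout (c0-c3 for operand A, c4-c7 for operand B), the rule that a "load" stores dataa and datab into two consecutive internal registers, and XOR as the stand-in operation. The only part taken from the assembler syntax is that writing 'cK' instead of 'rK' in a field clears the matching readra/readrb/writerc bit. The #else branch is a plain software model so the calling sequence can be tried on a host compiler, and the __nios2__ test assumes your toolchain predefines that macro.

```c
#include <stdint.h>

#define CI_N 0  /* hypothetical opcode index; the real one comes from system.h */

#ifdef __nios2__
/* Load: 'cK' destination (writerc = 0), both sources from the main
 * register file (readra = readrb = 1). Hypothetical hardware stores
 * dataa into internal register K and datab into K+1. */
#define CI_LOAD(k, a, b) \
    __asm__ volatile ("custom %0, c" #k ", %1, %2" \
                      : : "n" (CI_N), "r" (a), "r" (b))
/* Read: 'r' destination (writerc = 1), 'c' sources (readra = readrb = 0).
 * Hypothetical hardware returns word K of the 128-bit result. */
#define CI_READ(k, out) \
    __asm__ volatile ("custom %1, %0, c" #k ", c0" \
                      : "=r" (out) : "n" (CI_N))
#else
/* Host-side software model of the hypothetical hardware (operation = XOR). */
static uint32_t ci_regs[8];
#define CI_LOAD(k, a, b) do { ci_regs[k] = (a); ci_regs[(k) + 1] = (b); } while (0)
#define CI_READ(k, out)  do { (out) = ci_regs[k] ^ ci_regs[(k) + 4]; } while (0)
#endif

/* Push two 128-bit operands in as 4 x 32-bit words each, then pull the
 * 128-bit result back out one word at a time. */
static void ci_op128(const uint32_t a[4], const uint32_t b[4], uint32_t r[4])
{
    CI_LOAD(0, a[0], a[1]);
    CI_LOAD(2, a[2], a[3]);
    CI_LOAD(4, b[0], b[1]);
    CI_LOAD(6, b[2], b[3]);
    CI_READ(0, r[0]);
    CI_READ(1, r[1]);
    CI_READ(2, r[2]);
    CI_READ(3, r[3]);
}
```

The register numbers are pasted into the asm string by the preprocessor, so they have to be literal constants at the call site. Your HDL would need to implement whatever load/read convention you actually choose.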
Honored Contributor II

Hi DSL, thank you for your help and for such a fast reply.


I don't have much experience in this field, so your reply looks a little difficult to me and I need some time to understand it. I previously thought this was just a matter of some general settings, but it now seems there is still a lot of work to do.


Thanks again.
Honored Contributor II

You could also just avoid the custom instruction method altogether, slap a slave port on your hardware, and just access it with the Nios II data master. Then if you want to work on data vectors in memory you can just pump data in using a DMA (this would require separate operand and result slave ports).

Honored Contributor II

Thanks for your reply, BadOmen. I have already implemented a coprocessor for this, but for comparison and other reasons I have to implement it again as a custom instruction.


I don't think I read ug_nios2_custom_instruction.pdf carefully enough. Chapter 2 gives some guidance, but it is still not clear to me how to combine the custom instruction assembly software interface with a Nios II application. When I solve this problem I will post my solution here for everyone to review.


Honored Contributor II

Ah I see. This would be worth reading, it'll show you how to map C variables into the assembly code: This ensures that you don't accidentally clobber registers used by the rest of your C code. 


Also make sure you don't need to support preemption with your custom instruction. Since you have to make multiple calls to the custom instruction (due to your 128-bit data), there could be cases where you need to handle preemption: for example, if you use your custom instruction in an ISR, or if you are running an OS and the process gets preempted in the middle of moving data to/from the custom instruction. To support this you would have to add additional instructions to your custom instruction so that its state can be saved and restored by the exception handler. If you are just benchmarking, you are most likely running a single thread and don't use the custom instruction from within an ISR, so I'll save myself some typing and leave it at that.... just something to keep in mind though.
Honored Contributor II


In your code, you used inline assembly like this:

__asm__ ( "custom\t%1, %0, %2, c%3"  

: "=r" (result) : "n" (op), "r" (a), "n" (b)); 


I think you are using the custom logic's internal register c%3, which is indexed by the variable 'b'. I tried the same approach:

__asm__ __volatile__("custom %1, c%0, %2, %3"  

:"=i"(0) :"i"(1),"r"(dataa),"r"(datab)); 

but the Nios II IDE compiler reports an error: error: output number 0 not directly addressable

env: Quartus 9.1 SP2, Nios II IDE 9.1


And if I write the constraints like this:

__asm__ ("custom %0, c0, %1, %2\n": :"i"(1),"r"(dataa),"r"(datab)) 

it compiles, but is translated to: custom 1, ctl5, r3, r2;


How can I specify the internal register name in inline assembly?


Another question:

Signals a/b/c can each index 32 internal registers. Should I use a0-a31/b0-b31/c0-c31 as the internal register names in the custom instruction assembly syntax, and use the same names in the VHDL code? If not, how do I make them correspond?
Honored Contributor II

The type of the C variable matters - the "n" (b) requires that 'b' be 'const int b'. 

You are seeing 'ctl5' due to a bug in the disassembler, see the bugs/patches on (after the wiki was moved it seems to have become difficult to sort out which patch fixes which bug!). 

Check the opcode word by hand. 
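
If it helps with checking by hand, here is a small host-side decoder sketch. It assumes the usual Nios II 'custom' instruction layout (A in bits 31-27, B in 26-22, C in 21-17, readra bit 16, readrb bit 15, writerc bit 14, N in bits 13-6, opcode 0x32 in bits 5-0); please confirm that against the processor reference for your version. The struct and function names are made up for this example.

```c
#include <stdint.h>

#define NIOS2_OP_CUSTOM 0x32u  /* low 6 bits of every 'custom' instruction */

/* Fields of a 'custom' instruction word (names follow this thread;
 * the processor reference calls bit 14 'readrc'). */
struct custom_fields {
    unsigned a, b, c;                  /* 5-bit register indices */
    unsigned readra, readrb, writerc;  /* 1 = use main register file */
    unsigned n;                        /* 8-bit custom instruction number */
    unsigned op;                       /* should equal NIOS2_OP_CUSTOM */
};

/* Split a 32-bit instruction word into its fields. */
static struct custom_fields decode_custom(uint32_t word)
{
    struct custom_fields f;

    f.a       = (word >> 27) & 0x1f;
    f.b       = (word >> 22) & 0x1f;
    f.c       = (word >> 17) & 0x1f;
    f.readra  = (word >> 16) & 0x1;
    f.readrb  = (word >> 15) & 0x1;
    f.writerc = (word >> 14) & 0x1;
    f.n       = (word >> 6)  & 0xff;
    f.op      = word & 0x3f;
    return f;
}
```

Under that layout, a correctly assembled "custom 1, c0, r3, r2" should come out as 0x18818072: A=3, B=2, C=0, readra=1, readrb=1, writerc=0, N=1. If the word in your object file decodes to writerc=0 and C=0, the assembler did the right thing and only the disassembly text is misleading.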


The foo.tcl file for one of my custom instructions contains: 


add_interface_port foo in_val dataa Input 32
add_interface_port foo indirect readrb Input 1
add_interface_port foo direct_action b Input 5
add_interface_port foo indirect_action datab Input 32
add_interface_port foo out_val result Output 32


'b' is the 5-bit register number (taken directly from the instruction). 

'readrb' is the single bit from the instruction. 

'datab' is the 32bit value read from the register file. 

The 'datab' value will be read from the register file during instruction decode (or might be muxed from the ALU result of the previous 2 instructions in order to avoid a pipeline stall). 


This instruction actually uses 'readrb' to mux between 'b' and the low five bits of 'datab' in order to decide on the transformation from 'dataa' to 'result'.
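
As a rough software model of that mux (not the actual HDL), something like the C below captures the idea. The selector values 1, 2 and 4 follow the bswap32/bswap16/brev32 sub-opcodes used in the macros earlier in the thread; what each one does here is inferred from those names, and leaving every other selector as a pass-through is purely an assumption for the sketch. The function names are hypothetical.

```c
#include <stdint.h>

/* Reverse the order of all 32 bits. */
static uint32_t brev32_model(uint32_t x)
{
    uint32_t r = 0;
    int i;

    for (i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

/* Model of the custom logic: 'b' is the 5-bit field from the instruction,
 * 'datab' the value read from the register file, 'readrb' the instruction
 * bit that chooses between them as the sub-opcode. */
static uint32_t bit_mangler_model(uint32_t dataa, uint32_t datab,
                                  unsigned b, unsigned readrb)
{
    unsigned sel = readrb ? (datab & 0x1f) : (b & 0x1f);

    switch (sel) {
    case 1:  /* byte swap, 32 bit */
        return ((dataa & 0xff) << 24) | ((dataa & 0xff00) << 8) |
               ((dataa & 0xff0000) >> 8) | ((dataa & 0xff000000) >> 24);
    case 2:  /* byte swap, low 16 bits */
        return ((dataa & 0xff) << 8) | ((dataa & 0xff00) >> 8);
    case 4:  /* bit reverse, 32 bit */
        return brev32_model(dataa);
    default: /* assumption: anything else passes dataa through */
        return dataa;
    }
}
```

With readrb clear the sub-opcode is fixed at assemble time (the 'c%3' form), and with readrb set it can be chosen at run time from a register, at the cost of a possible stall on that register.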