custom instructions implementation

Altera_Forum
Honored Contributor II
1,040 Views

Hi everybody, 

 

I'm trying to implement a MAC unit as a custom instruction for the Nios II that multiplies two operands and adds a third. I had no problems describing it in VHDL. Unfortunately, I'm now stuck on several other problems. 

 

1. In SOPC Builder, after adding the MAC component to the CPU and generating the .ptf file, I build a syslib in the Nios II IDE using that .ptf.  

At the beginning of the build process it always says  

 

--- Quote Start ---  

WARNING: module cpu_II_Mac_inst (Mac) not found in component directory (install.ptf) 

 

--- Quote End ---  

 

What does that mean? What should I add to install.ptf or to the component directory (and where is that directory?) so that the IDE can find the module? 

 

 

2. I need 3 operands. However, the system.h generated by the Nios II IDE only lists a macro with 2 operands, even though I set the number of operands of the custom_instruction_slave to 3 in the component editor:  

 

--- Quote Start ---  

#define ALT_CI_MAC_INST(A,B) __builtin_custom_inii(ALT_CI_MAC_INST_N,(A),(B))  

--- Quote End ---  

 

 

 

3. If I edit that macro in the system.h file to support 3 integer operands 

 

--- Quote Start ---  

#define ALT_CI_MAC_INST(A,B,c) __builtin_custom_iniii(ALT_CI_MAC_INST_N,(A),(B),(c))  

--- Quote End ---  

 

I get an undefined reference to that function. How do I add the functionality for a third operand? Where are the built-in functions declared and where is the function body, so that I may add a third operand to that function manually? 

 

How do I tell the compiler to run these function calls on the hardware described by the HDL file? 

 

I would deeply appreciate any help on this topic, because I'm getting desperate. 

 

Cheers, 

Dash
4 Replies
Altera_Forum
Honored Contributor II
310 Views

Dash, 

 

1. I don't know 

2. Maybe it is related to your VHDL description. If it is not a big secret, post it or send me a message. 

3. The compiler itself will never use your custom instructions to optimise code, because the compiler is not built with this information. You would have to change and rebuild the compiler for that (which you probably don't want to do). 

The only code that uses the custom instruction is code that calls the macro. 

(There is an exception for floating-point instructions, where Altera built support for them into the compiler; it is, however, a job for specialists to do that.) 
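
To illustrate the point about calling the macro explicitly, here is a minimal sketch in C. It assumes a custom instruction that only multiplies its two operands; the third operand is then added in ordinary C. The helper name `mac3`, the opcode value 0, and the host fallback are assumptions made for illustration and are not taken from the generated system.h.

```c
#include <stdint.h>

/* ALT_CI_MAC_INST_N normally comes from the generated system.h;
   the value 0 is an assumption so that this sketch is self-contained. */
#ifndef ALT_CI_MAC_INST_N
#define ALT_CI_MAC_INST_N 0
#endif

#if defined(__nios2__)
/* Target build: the builtin emits the custom instruction with two operands. */
#define ALT_CI_MAC_INST(A, B) \
    __builtin_custom_inii(ALT_CI_MAC_INST_N, (A), (B))
#else
/* Host fallback (assumption): a plain C multiply, so the sketch runs anywhere. */
#define ALT_CI_MAC_INST(A, B) \
    ((int)((uint32_t)(A) * (uint32_t)(B)))
#endif

/* Hypothetical helper: the multiply goes through the macro (and so through
   the custom instruction on the target); the addition is plain software. */
static inline int mac3(int a, int b, int c)
{
    return ALT_CI_MAC_INST(a, b) + c;
}
```

Calling `mac3(3, 4, 5)` yields 17 with the host fallback. Nothing else in the program touches the custom hardware unless it goes through the macro.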

 

Stefaan
Altera_Forum
Honored Contributor II
310 Views

Stefaan, 

 

Thank you for your reply. I was away over the weekend, so I wasn't able to write here. 

My VHDL code is not complex enough to explain the IDE's behaviour. 

I'll show you: 

 

--- Quote Start ---  

 

library IEEE; 

use IEEE.STD_LOGIC_1164.ALL; 

use IEEE.STD_LOGIC_ARITH.ALL; 

use IEEE.STD_LOGIC_UNSIGNED.ALL; 

library WORK; 

 

entity mac is 

Port ( A : in STD_LOGIC_VECTOR(15 downto 0); 

B : in STD_LOGIC_VECTOR(15 downto 0); 

C : in STD_LOGIC_VECTOR(31 downto 0); 

Q : out STD_LOGIC_VECTOR(31 downto 0); 

clk : in STD_LOGIC); 

end mac; 

 

architecture Behavioral of mac is 

 

signal Qs : STD_LOGIC_VECTOR(31 downto 0); 

 

begin 

 

MAC : Process (CLK) 

Begin 

If CLK'event and CLK = '1' Then 

Qs <= A*B; 

End If; 

End Process; 

 

Q <= Qs + C; 

 

end Behavioral; 

 

--- Quote End ---  

 

I guess I'll just experiment a little. Maybe I'll be lucky and find something. 

Thanks anyway again for the help. 

 

Cheers, 

Dash
Altera_Forum
Honored Contributor II
310 Views

Dash25,  

 

You cannot make a custom instruction with 3 inputs as you describe it for a MAC: the custom instruction port only carries two data operands (dataa and datab). 

 

What you need to do is work with 2 custom instructions.  

 

One accesses the accumulator register; the other multiplies the two operands and adds the product to the accumulator. You get the best performance if you register the multiplier output before adding.  

 

I'm not good at VHDL (because I never use it), but in Verilog it should look as follows:  

 

    module custom(
        input      [31:0] a,       // first operand (32-bit width assumed)
        input      [31:0] b,       // second operand
        output reg [31:0] result,  // unclocked
        input      [7:0]  n,       // selects the instruction
        input             start,
        input             clk, clk_e, reset,  // clk_e is not used here
        output reg        done     // unclocked
    );

        parameter ACC  = 0;
        parameter MULT = 1;

        reg [31:0] accum;
        reg [31:0] mult;
        reg        l_start;        // start of a MULT, delayed by one cycle

        always @(posedge clk or posedge reset)
            if (reset) begin
                accum   <= 0;
                mult    <= 0;
                l_start <= 0;
            end else begin
                // latch start only for MULT, so an ACC call cannot trigger the add stage
                l_start <= start && (n == MULT);
                if (start)
                    case (n)
                        ACC  : accum <= 0;
                        MULT : mult  <= a * b;
                        // possible to add others
                    endcase
                if (l_start)
                    accum <= accum + mult;   // second stage of MULT
            end

        // the result output
        always @*
            case (n)
                ACC     : result = accum;
                // MULT : not interesting
                default : result = 0;
            endcase

        // control done behaviour
        always @*
            case (n)
                // ACC : handled by the default case
                MULT    : done = l_start;  // delayed one cycle, because the calculation is still busy
                default : done = 1;
            endcase

    endmodule

 

When you call custom instruction 0, the accumulator value is read out and the accumulator is reset. 

When you call custom instruction 1, the product of a and b is added to the accumulator.  

 

The a and b inputs are treated as UNSIGNED by this code! 

 

You can write easy-to-use inline functions for the custom instructions, or use the ones provided by the IDE: 

 

 

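    static inline unsigned long GetAndResetMAC(void)
    {
        unsigned long retval;
        /* volatile: the instruction also resets the accumulator,
           so it must not be optimised away or merged */
        __asm__ __volatile__ ("custom 0, %0, r0, r0" : "=r" (retval));
        return retval;
    }

    static inline void MAC(unsigned short a, unsigned short b)
    {
        /* there are no outputs, so the inputs are operands %0 and %1 */
        __asm__ __volatile__ ("custom 1, r0, %0, %1" : : "r" (a), "r" (b));
    }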
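
As a usage illustration for the two helpers, here is a small unsigned dot product. This is a sketch under assumptions: it re-declares `GetAndResetMAC` and `MAC` so that it is self-contained, it uses GCC's Nios II builtins `__builtin_custom_in` and `__builtin_custom_nii` instead of inline assembly on the target, and it adds a plain C fallback so the same file can be tried on a host PC. The function name `dot_u16` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__nios2__)
/* Target build: opcodes 0 and 1 follow the numbering used in this post. */
static inline uint32_t GetAndResetMAC(void)
{
    return (uint32_t)__builtin_custom_in(0);
}

static inline void MAC(uint16_t a, uint16_t b)
{
    __builtin_custom_nii(1, a, b);
}
#else
/* Host fallback (assumption): same read-and-reset / accumulate behaviour in C. */
static uint32_t mac_accum;

static inline uint32_t GetAndResetMAC(void)
{
    uint32_t value = mac_accum;
    mac_accum = 0;
    return value;
}

static inline void MAC(uint16_t a, uint16_t b)
{
    mac_accum += (uint32_t)a * b;   /* unsigned, wraps modulo 2^32 */
}
#endif

/* Hypothetical example: unsigned dot product of two 16-bit vectors. */
static uint32_t dot_u16(const uint16_t *x, const uint16_t *y, size_t len)
{
    (void)GetAndResetMAC();         /* discard any stale accumulator value */
    for (size_t i = 0; i < len; ++i)
        MAC(x[i], y[i]);
    return GetAndResetMAC();
}
```

Each loop iteration costs one custom instruction, and the final read both returns the sum and leaves the accumulator cleared for the next caller.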

 

Notes : 

- I didn't test the code (and so give no warranty for its correctness), and I don't have the template with the signal names for a custom instruction at hand, so something may be missing; you'll figure out what I mean. 

- With n you can select up to 256 custom instructions, so this can be extended. 

- The "start" signal is the kick-off for the instruction; the processor then waits for done to go high. That's why done for the multiply instruction is delayed by one cycle (l_start): it is the pipeline stage. 

- There is some excellent documentation on this by Altera. 
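
The start/done handshake from the notes can be tried out in software before going to hardware. The following C model is a rough sketch, not a signal-for-signal translation: it assumes the delayed start is latched only for the multiply instruction, it leaves out reset and clock enable, and the names `mac_state`, `mac_clock` and `mac_issue` are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

enum { ACC = 0, MULT = 1 };        /* same numbering as n in the module */

typedef struct {
    uint32_t accum;
    uint32_t mult;
    bool     l_start;              /* start of a MULT, delayed one cycle */
} mac_state;

/* Unclocked outputs, derived from n and the current state. */
static bool mac_done(const mac_state *s, unsigned n)
{
    return (n == MULT) ? s->l_start : true;
}

static uint32_t mac_result(const mac_state *s, unsigned n)
{
    return (n == ACC) ? s->accum : 0;
}

/* One rising clock edge; all updates use the old state, like
   non-blocking assignments. */
static void mac_clock(mac_state *s, unsigned n, bool start,
                      uint32_t a, uint32_t b)
{
    mac_state next = *s;
    next.l_start = start && (n == MULT);
    if (start && n == ACC)
        next.accum = 0;
    if (start && n == MULT)
        next.mult = a * b;
    if (s->l_start)
        next.accum = s->accum + s->mult;
    *s = next;
}

/* Issue one instruction as the processor would: start is high in the
   first cycle only, and the inputs are held until done is seen.
   Returns the result and stores the number of cycles in *cycles. */
static uint32_t mac_issue(mac_state *s, unsigned n,
                          uint32_t a, uint32_t b, int *cycles)
{
    bool start = true;
    for (*cycles = 1; ; ++*cycles) {
        bool     done   = mac_done(s, n);
        uint32_t result = mac_result(s, n);
        mac_clock(s, n, start, a, b);
        start = false;
        if (done)
            return result;
    }
}
```

In this model a multiply takes two cycles (one to register the product, one to add it), while reading the accumulator completes in a single cycle.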

 

I hope this helps 

 

Stefaan
Altera_Forum
Honored Contributor II
310 Views

Hi, 

 

Now, how would you put this instruction into the Nios processor? 

 

If you have an example, please post it to the forum. 

 

Thank you!