Nios® V/II Embedded Design Suite (EDS)

How to call a tightly coupled module from C code on Nios II

Altera_Forum

Hi there... I am new to Nios II and I would like to ask how we can call a module from C code on Nios II. From my understanding, we can add a module coupled with the Nios II so that some functions are handled by hardware (correct me if I'm wrong :)). However, I do not really understand how to do it. Help please! Thanks.

Altera_Forum

I think you want the C2H feature where C is compiled to a hardware implementation. Search C2H on the forums and Altera site. 

 

Bill
Altera_Forum

Hi Bill, thanks for the reply. I am not using the C2H compiler. What I mean is: after my SOPC system is built, with a custom peripheral added, how should I call that peripheral when I write my program in C in Eclipse? Where can I find its names or commands?

Altera_Forum

Any information generated by SOPC Builder that is accessible to C code in the software environment is stored in system.h. Most C files will #include system.h to get the definitions for all of the components in the SOPC design. What system.h contains for your peripheral depends entirely on the design of the peripheral. At a minimum, you will have its base address. I think you need to write a HAL (API) layer that reads and writes the peripheral to get it to do something and perhaps return a result. This is usually done with the IORD and IOWR C macros.
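For reference, the HAL's io.h defines IORD(BASE, REGNUM) and IOWR(BASE, REGNUM, DATA), where REGNUM counts 32-bit registers rather than bytes, and the accesses are made with the ldwio/stwio instructions so they bypass the data cache. The helper below is hypothetical: it only reproduces the address arithmetic of those macros so it can be checked on a PC, assuming word-aligned peripheral base addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Width of one register on the 32-bit Nios II data bus, in bytes. */
#define BUS_WIDTH_BYTES 4u

/* Hypothetical helper: byte address that IORD/IOWR(base, regnum) touches.
   REGNUM is a register index, so each step moves 4 bytes. */
static uintptr_t io_reg_address(uintptr_t base, unsigned regnum)
{
    assert(base % BUS_WIDTH_BYTES == 0); /* assumed word-aligned base */
    return base + (uintptr_t)regnum * BUS_WIDTH_BYTES;
}
```

With Bill's example base of 0x04001000 further down, register 3 (CHECKSUM_RESULT) would sit at byte address 0x0400100C.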

 

Bill
Altera_Forum

I am confused... I expected the HAL to be auto-generated. Isn't that so? If I am going to write it myself, are there any resources I can refer to?

Altera_Forum

You said "custom peripheral". How can the HAL shipped with NIOS II know what the peripheral is or does? There is a HAL for all devices in SOPC that come from Altera. Anything else in a SOPC design is custom - both hardware and the HAL are user-provided. 

 

Bill
Altera_Forum

I see... Thank you very much for your help. I will look into custom HALs in more detail. Are there any resources you can recommend?

Altera_Forum

My thinking is that most peripherals contain one or more registers (addresses) that you write to change what the hardware does and read to get the result. We have an internet checksum hardware module in SOPC. We write the start address of the buffer to checksum to one register and the size in bytes to another register. That second write kicks off the hardware loop that reads the buffer and computes the checksum. A status register is read until the hardware is done (the hardware sets a "done" bit). Then we read another register, which contains the internet checksum.

 

Our HAL for this was simply an H file with the function:

 

u16 CalcInternetChecksum(u8 *buffer, u32 length); 

 

And we use a C function that writes and reads the registers as described above. We simply used IOWR and IORD with the #define of the BASE address in system.h (which is named after the peripheral). system.h has something like:

 

#define INTERNET_CHECKSUM_0_BASE 0x04001000

And our H file also has #defines which are custom register offsets from the base address of the peripheral.

 

#define CHECKSUM_ADDR   0 // start addr
#define CHECKSUM_LEN    1 // length
#define CHECKSUM_DONE   2 // non-0 when done, 0 otherwise
#define CHECKSUM_RESULT 3 // checksum result

Our hardware designer defined these offsets (which are 32-bit registers) in his Verilog code.

 

For example, here is the checksum function:

 

u16 CalcInternetChecksum(u8 *buffer, u32 length)
{
    IOWR(INTERNET_CHECKSUM_0_BASE, CHECKSUM_ADDR, (u32) buffer);
    IOWR(INTERNET_CHECKSUM_0_BASE, CHECKSUM_LEN, length);
    while (IORD(INTERNET_CHECKSUM_0_BASE, CHECKSUM_DONE) == 0)
        ; // loop until done
    return IORD(INTERNET_CHECKSUM_0_BASE, CHECKSUM_RESULT);
}

If the base address changes in SOPC, this code will adjust to that. If the designer were to change the register offsets in the Verilog, then *I* would have to update the H file that contains those #defines.

 

The only thing SOPC did for me was to export the base address into system.h. That's all I need from it to write the rest of this simple HAL.
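When bringing up a peripheral like this, a software reference is handy for checking the hardware result. Below is a minimal sketch of the RFC 1071 Internet checksum; the function name is hypothetical, and it assumes the hardware sums 16-bit words in network (big-endian) byte order and returns the one's complement of the folded sum, which should be confirmed against the Verilog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software reference for the RFC 1071 Internet checksum (hypothetical name). */
static uint16_t sw_internet_checksum(const uint8_t *buf, uint32_t len)
{
    assert(buf != NULL || len == 0);
    uint64_t sum = 0;
    uint32_t i = 0;

    /* Add the buffer as big-endian 16-bit words. */
    for (; i + 1 < len; i += 2)
        sum += ((uint64_t)buf[i] << 8) | buf[i + 1];

    /* Odd length: the last byte is padded with a zero low byte. */
    if (i < len)
        sum += (uint64_t)buf[i] << 8;

    /* Fold carries back into the low 16 bits (one's complement addition). */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Comparing CalcInternetChecksum(buf, len) with sw_internet_checksum(buf, len) on a few test buffers quickly shows whether the byte order and the final complement match.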

 

I hope this example is helpful! 

Bill
Altera_Forum

There are multiple ways to do this depending upon your HDL module. You can include your own module as a custom instruction in SOPC Builder, as you've suggested. A simpler method would be to interface with PIO ports. This would require instantiating your module in the top-level HDL file in Quartus, then connecting your Nios2 PIO ports to your module.

 

For example, if you have a Verilog module that outputs a single number as a result of two inputs, simply generate 3 PIOs in SOPC Builder and connect them. Your Nios program would write a number to each output PIO, then read from the input PIO once the module's minimum timing requirement has been met.
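A minimal sketch of the three-PIO pattern using the PIO register macros from the HAL's altera_avalon_pio_regs.h. The PIO names (DATA_OUT0_BASE, DATA_OUT1_BASE, DATA_IN0_BASE) are assumptions borrowed from gaudetteje's later example; in a real project they come from system.h. It also assumes the Nios II GCC predefines __nios2__. Off-target, a hypothetical host-side stand-in models an 8-bit adder between the PIOs so the call sequence can be exercised without hardware.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __nios2__
#include "system.h"                 /* DATA_OUT0_BASE, DATA_OUT1_BASE, DATA_IN0_BASE */
#include "altera_avalon_pio_regs.h" /* IOWR/IORD_ALTERA_AVALON_PIO_DATA */
#else
/* Host-only stand-in (hypothetical): three fake PIO data registers and a
   model of an 8-bit adder wired from the two output PIOs to the input PIO. */
enum { DATA_OUT0_BASE, DATA_OUT1_BASE, DATA_IN0_BASE, NUM_FAKE_PIOS };
static uint32_t fake_pio[NUM_FAKE_PIOS];

static uint32_t fake_pio_read(int base)
{
    assert(base >= 0 && base < NUM_FAKE_PIOS);
    fake_pio[DATA_IN0_BASE] =
        (fake_pio[DATA_OUT0_BASE] + fake_pio[DATA_OUT1_BASE]) & 0xFFu;
    return fake_pio[base];
}
#define IOWR_ALTERA_AVALON_PIO_DATA(base, data) (fake_pio[(base)] = (uint32_t)(data))
#define IORD_ALTERA_AVALON_PIO_DATA(base) fake_pio_read(base)
#endif

/* Send two operands out through the output PIOs and read the result back. */
static uint8_t hw_add(uint8_t a, uint8_t b)
{
    IOWR_ALTERA_AVALON_PIO_DATA(DATA_OUT0_BASE, a);
    IOWR_ALTERA_AVALON_PIO_DATA(DATA_OUT1_BASE, b);
    /* If the module needs longer than the write-to-read gap, wait here. */
    return (uint8_t)IORD_ALTERA_AVALON_PIO_DATA(DATA_IN0_BASE);
}
```

On the board only the __nios2__ branch is compiled; the #else branch exists purely so the sequence can be tested on a PC.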

 

On the other hand, if your module needs a large amount of data stored in SDRAM (e.g. computing an MD5 hash), then using PIOs might not be such a good idea, since the module won't have access to the Avalon-MM bus.

 

A third method would be to create your own Avalon-MM component, which requires the most work.

 

For what it's worth, I haven't had a good experience using Custom Instructions on anything a PIO interface couldn't easily handle. 

 

-J
Altera_Forum

 

--- Quote Start ---  

There are multiple ways to do this depending upon your HDL module. You can include your own module as a custom instruction in SOPC Builder, as you've suggested. A simpler method would be to interface with PIO ports. This would require instantiating your module in the top-level HDL file in Quartus, then connecting your Nios2 PIO ports to your module.

 

For example, if you have a Verilog module that outputs a single number as a result of two inputs, simply generate 3 PIOs in SOPC Builder and connect them. Your Nios program would write a number to each output PIO, then read from the input PIO once the module's minimum timing requirement has been met.

 

-J 

--- Quote End ---  

 

 

Hi gaudetteje, I am trying your method of using 3 PIOs in SOPC Builder to do a simple calculation in hardware on numbers coming from the Nios. I initially tried to compute a sum of products of two integer arrays, but I could not get anywhere, so I decided to start with a basic addition of two numbers. See my original forum post:

http://www.alteraforum.com/forum/showthread.php?t=29030 

 

So now my verilog module is: 

module add_two (
    // Inputs
    line_1_in,
    line_2_in,
    // Output
    result_out
);
    // Port Declarations
    // Inputs
    input [7:0] line_1_in; // 8 bit value
    input [7:0] line_2_in;
    // Output
    output [7:0] result_out; // assume 8 bit

    assign result_out = line_1_in + line_2_in;
endmodule

 

I've declared the input/output ports as in my other post, and added these lines in the C code: 

 

int line1 = 3;
int line2 = 4;
int store_val;
volatile int *line_1_ptr = (int *) 0x08200000;      // Port_in_1 address
volatile int *line_2_ptr = (int *) 0x08200010;      // Port_in_2 address
volatile int *result_back_ptr = (int *) 0x08200020; // Port_out_result

*line_1_ptr = line1;
*line_2_ptr = line2;
store_val = *result_back_ptr;

 

Questions: 

 

- How do I instantiate that add_two.v module in the top level .v file?  

- How do I make the add_two.v module accept the data from the Nios processor through the PIOs and return a value?

- Is the use of pointers correct? I saw other posts suggesting IORD/IOWR, but I am not too sure how to use them.

 

Any help is appreciated.
Altera_Forum

I'll need to look at my code to give you a more detailed answer. Here's a quick response that may address some of your questions. 

 

1- I'll look at my code tonight for an example to post. 

 

2- Assuming your add_two.v module runs asynchronously (it is if you're not using a clock), then you need to determine the worst case latency in the calculation. If your clock period is 20ns, but your computation time is 45ns, then you'll need a NOP or pause instruction for 2 additional cycles before you can read the result. 
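The arithmetic above is a ceiling division: the result is ready after ceil(45/20) = 3 clock periods, and counting the first cycle as already spent leaves 2 extra. A hypothetical helper following the same counting convention is sketched below; in practice the instructions between the write and the read may already cover some of this delay, so treat it as a rough estimate.

```c
#include <assert.h>

/* Hypothetical helper: extra wait cycles needed after the first cycle for a
   combinational result that takes compute_ns to settle. */
static unsigned extra_wait_cycles(unsigned compute_ns, unsigned period_ns)
{
    assert(period_ns > 0);
    unsigned total = (compute_ns + period_ns - 1) / period_ns; /* ceiling */
    return total > 0 ? total - 1 : 0;
}
```

For the numbers in the post, extra_wait_cycles(45, 20) gives 2.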

 

The easiest way to compute an entire array would be one byte (or word) at a time using the PIO. The Nios CPU would iterate over each index in the arrays. Write a number to PIO1, write a number to PIO2, then read from PIO3 when the computation is done. As I've said, this may take more than one clock cycle. Also, you would not be taking advantage of parallelism this way, so a custom instruction or Avalon slave component might be necessary for your speed requirements. It's always best to start simple and work your way up. 

 

3- Look in your system.h file or possibly another included header. You'll see a #define for IORD and IOWR. Altera just uses these #defines as a shortcut, but if I remember correctly they are merely pointers to your PIO registers' hardware addresses.

 

One other comment - you should probably take advantage of the data types defined in alt_types.h. If your C code is using 32-bit ints, you're writing a 32-bit number to an 8-bit register. This could be causing memory overflow issues if you have another register at the next memory location. I would need to refresh my memory, but isn't the Avalon-MM a 16-bit addressable bus? 

 

-J
Altera_Forum

Here's an example, as promised. 

 

In my top level .v file, I have the following: 

// Instantiate Nios II/e CPU
cpu_accel CPU0 (
    .clk_0(CLOCK_CPU),
    .in_port_to_the_data_in0(int_datain),
    .out_port_from_the_data_out0(int_dataout0),
    .out_port_from_the_data_out1(int_dataout1),
    .reset_n(KEY)
);

// Instantiate custom accelerated Verilog component
FIR FIR0 (
    .PIO_in0(int_dataout0),
    .PIO_in1(int_dataout1),
    .PIO_out0(int_datain)
);

"cpu_accel.v" is my generated SOPC Builder system. "FIR" is my custom Verilog module. 'int_datain' and 'int_dataout' are simply defined as wires.

 

You can use IORD and IOWR. In this case, I simply used pointers:

<snip>
int main(int argc, char *argv[])
{
    // PIO pointers; volatile so the compiler keeps every access
    volatile int *data_out0 = (volatile int *) DATA_OUT0_BASE;
    volatile int *data_out1 = (volatile int *) DATA_OUT1_BASE;
    volatile int *data_in0  = (volatile int *) DATA_IN0_BASE;
    int res;
    <snip>
    *data_out1 = 1;
    *data_out0 = 2;
    // insert wait statement here, if necessary
    res = *data_in0; // copy result to local int
    <snip>
Altera_Forum

Thank you gaudetteje for the sample code. Looking at your code, I realized I had the directions of the input/output ports wrong. But I am still having problems compiling, so let's start fault finding.

 

- I have 3 PIOs in SOPC Builder. 2 out (data_out_0 and data_out_1) and 1 in (result_in0) 

- Then in my main.v file port declaration, I put 

input result_data;
output int_data_line0;
output int_data_line1;

- In the SOPC generated NIOS system, I put 

nios_system NiosII (
    // my pios
    .in_port_to_the_result_in0 (result_data),
    .out_port_from_the_data_out0 (int_data_line0),
    .out_port_from_the_data_out1 (int_data_line1),
);

-Finally I instantiate my custom verilog module 

add_two two_vals (
    .clk(system_clock),
    .line_1_in (int_data_line0),
    .line_2_in (int_data_line1),
    .result_back_out(result_data),
);

 

When I compile this, I get Error : Net "result_data",which fans out to "nios_system:NiosII|in_port_to_the_result_in0[0]", cannot be assigned to more than one value. 

 

Could this error be due to a badly written add_two.v module? I still have the module as before, except that I tried to add a clock to it:

module add_two (
    // Inputs
    clk,
    line_1_in,
    line_2_in,
    // Output
    result_back_out
);
    // Port Declarations
    // Inputs
    input clk;
    input [7:0] line_1_in; // 8 bit value
    input [7:0] line_2_in;
    // Output
    output wire [7:0] result_back_out; // assume 8 bit

    reg [7:0] original_line_1;
    reg [7:0] original_line_2;
    reg [7:0] temp_sum;

    always @(posedge clk)
    begin
        original_line_1 <= line_1_in;
        original_line_2 <= line_2_in;
        temp_sum <= original_line_1 + original_line_2;
    end

    assign result_back_out = temp_sum;
endmodule

 

I also saw a warning message in SOPC builder for that PIO In when I generated the system which said: 'PIO Inputs are not hardwired in test bench. Undefined values will be read from PIO inputs during simulation.'. Am I doing something wrong there too?
Altera_Forum

Think of the Verilog you're writing as a bunch of wires. You don't need to give your main.v module access to the PIO port wires. There should only be one driver on a line, or the result is undefined. If you want to toggle your LEDs or use buttons on an eval board, you can create an assign statement, or you could control the LEDs directly through a separate PIO in the Nios. What you're trying to do is drive 'result_data' from the Nios AND from an I/O pin on your board somewhere. Similarly, 'int_dataout_0/1' is a wire and can't be defined as both a wire and an I/O port. In the port declaration of main.v, you need to come up with a different name (e.g. output [7:0] LED_RED;). Use 'assign LED_RED = int_dataout_0;' if that's what you're trying to do.

 

For now, remove the port declarations and just program the Nios to printf the result to your debug console. That's the easiest way to see if the numbers are correct. 

 

Also, for a simple addition you don't need synchronous logic - get rid of the clock unless you want it pipelined. In that case, you should really be using a different clock phase (with a PLL) than your SOPC system to ensure data is valid. 

 

You can ignore the testbench warnings until you start simulating your Nios processor with Modelsim or another HDL simulator.
Altera_Forum

Thank you so much gaudetteje! I don't know how to thank you enough.. if only all Altera tutorials were like the way you explained things.  

 

So yes, I've corrected my mistake concerning PIOs and declared internal wires instead, and now I can move two data values into the verilog module and get the output back.. finally! 

 

The next stage for me is to do this calculation on two arrays of values. As you mentioned before, easy way is to have the iteration in NIOS and send the data one at a time. This should not take too long. 

 

But as for using custom instruction or Avalon slave component to take advantage of parallelism, I will certainly come back to you for expert advice :) Any guidance is greatly appreciated.
Altera_Forum

OK, so now I have 3 arrays of data, each containing 100 integers defined in NIOS C code as: 

 

alt_u8 line1[100] = {1, 2, 3, ..., 100};
alt_u8 line2[100] = {10, 10, 10, ..., 10};
alt_u8 line3[100] = {1, 2, 3, ..., 100};

 

The next step is I want to apply a calculation to these arrays. Imagine these arrays are 3 rows of pixel values and I want to do a Sobel operation on them. I found the following code on the web to do this job: 

 

module sobel_mine (p0, p1, p2, p3, p5, p6, p7, p8, out);
    input  [7:0] p0, p1, p2, p3, p5, p6, p7, p8; // 8 bit pixels inputs
    output [7:0] out;                              // 8 bit output pixel
    wire signed [10:0] gx, gy;
    wire signed [10:0] abs_gx, abs_gy;
    wire [10:0] sum;

    assign gx = ((p2 - p0) + ((p5 - p3) << 1) + (p8 - p6)); // sobel mask for gradient in horiz. direction
    assign gy = ((p0 - p6) + ((p1 - p7) << 1) + (p2 - p8)); // sobel mask for gradient in vertical direction
    assign abs_gx = (gx[10] ? ~gx + 1 : gx);               // to find the absolute value of gx
    assign abs_gy = (gy[10] ? ~gy + 1 : gy);               // to find the absolute value of gy
    assign sum = (abs_gx + abs_gy);                          // finding the sum
    assign out = (|sum[10:8]) ? 8'hff : sum[7:0];            // to limit the max value to 255
endmodule
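Before relying on the hardware, a C model of the same arithmetic gives known-good values to compare with what comes back from the PIO. A minimal sketch follows; the function names are hypothetical, p4 (the centre pixel) is unused just as in the Verilog, and the window for output i is assumed to start at column i of each of the three rows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C model of the sobel_mine arithmetic for one 3x3 window. */
static uint8_t sobel_pixel(uint8_t p0, uint8_t p1, uint8_t p2,
                           uint8_t p3, uint8_t p5,
                           uint8_t p6, uint8_t p7, uint8_t p8)
{
    int gx = (p2 - p0) + 2 * (p5 - p3) + (p8 - p6); /* horizontal gradient */
    int gy = (p0 - p6) + 2 * (p1 - p7) + (p2 - p8); /* vertical gradient */
    int sum = (gx < 0 ? -gx : gx) + (gy < 0 ? -gy : gy);
    return (uint8_t)(sum > 255 ? 255 : sum);        /* saturate like 8'hff */
}

/* Slide the window along three rows of length n, giving n - 2 outputs. */
static void sobel_rows(const uint8_t *l1, const uint8_t *l2, const uint8_t *l3,
                       size_t n, uint8_t *out)
{
    assert(n >= 3);
    for (size_t i = 0; i + 2 < n; i++)
        out[i] = sobel_pixel(l1[i], l1[i + 1], l1[i + 2],
                             l2[i], l2[i + 2],
                             l3[i], l3[i + 1], l3[i + 2]);
}
```

With the three rows above (1 to 100, all 10s, 1 to 100), gx is 4 and gy is 0 at every position, so all 98 outputs should read back as 4.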

So to interface this Verilog code with the Nios II, I created 8 output PIOs of 8 bits each and declared them in the C code as:

 

volatile int* data_out_0_ptr = (int *) 0x08208010; // Data_out_0 address
volatile int* data_out_1_ptr = (int *) 0x08208020; // Data_out_1 address
.
.
.
volatile int* data_out_8_ptr = (int *) 0x08208080; // Data_out_8 address

 

Then I do a for-loop to access each element and send it to the verilog module for the calculation 

 

for (i = 0; i < 98; i++) {
    *(data_out_0_ptr) = line1[i];
    *(data_out_1_ptr) = line1[i+1];
    *(data_out_2_ptr) = line1[i+2];
    *(data_out_3_ptr) = line2[i];
    *(data_out_5_ptr) = line2[i+2];
    *(data_out_6_ptr) = line3[i];
    *(data_out_7_ptr) = line3[i+1];
    *(data_out_8_ptr) = line3[i+2];
    sum_val = *result_back_ptr;
    //printf("sum_val = %d\n", sum_val);
}

 

This code works fine as it is, but I am sure that I am not taking advantage of the FPGA. My new questions are:

 

1) Instead of having 8 output PIOs of 8 bits each, can I have 2 output PIOs of 32 bits? If yes, how do I modify the C code to reference the right address? For example, suppose I have my 32-bit PIO data_out_32bit_ptr at address 0x08208090. In the C code for-loop, I am not sure how to reference the correct data. Is something like the code below correct?

 

*(data_out_32bit_ptr) = line1[i];
*(data_out_32bit_ptr + 0x8) = line1[i+1];
*(data_out_32bit_ptr + 0x10) = line1[i+2];
*(data_out_32bit_ptr + 0x18) = line2[i];

But I see the End address as 0x0820809f in SOPC Builder when I include such a 32-bit output PIO.

 

2) I suppose I can make the most of FPGA parallelism by getting rid of this for-loop. But how do I send the data then? 

 

3) Also, the results contain 98 values which I can print on the console window. But I need to store the values. I tried the altera_hostfs and I managed to send the data to a text file on my computer (after 3 days of fighting ;)!) But this works only when I choose Debug As -> NIOS II Hardware and it seems to run slower than when I choose Run As -> NIOS II Hardware. What is the best way to get those values?
Altera_Forum

 

--- Quote Start ---  

1) Instead of having 8 output PIOs of 8 bits each, can I have 2 output PIOs of 32 bits? If yes, how do I modify the C code to reference the right address? For example, suppose I have my 32-bit PIO data_out_32bit_ptr at address 0x08208090. In the C code for-loop, I am not sure how to reference the correct data. Is something like the code below correct?

 

*(data_out_32bit_ptr) = line1[i];
*(data_out_32bit_ptr + 0x8) = line1[i+1];
*(data_out_32bit_ptr + 0x10) = line1[i+2];
*(data_out_32bit_ptr + 0x18) = line2[i];

But I see the End address as 0x0820809f in SOPC Builder when I include such a 32-bit output PIO.

--- Quote End ---  

Yes, you can have 2 32-bit PIOs. I can't answer your C coding question, but it sounds like you'll need some type casting. If your data is stored as sequential bytes, then accessing line1[i] as a 32-bit element should return a 32-bit number. You can check this with a printf("%x"). 
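On the pointer question: adding 0x8 to an int pointer advances 8 ints (32 bytes), so data_out_32bit_ptr + 0x8 would land at 0x082080B0, outside the PIO's span of 0x08208090 to 0x0820809f. Even + 1 would not help, because in the Altera PIO core the other words of that span are control registers (direction, interrupt mask, edge capture), not more data. A minimal sketch of the usual alternative is to pack four 8-bit pixels into one 32-bit word and write it once. The function names are hypothetical, and the byte placement (p0 in bits [7:0], and so on) is an assumption that must match how the Verilog slices the PIO outputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack four bytes into one 32-bit word: b0 in bits [7:0], b3 in [31:24]. */
static uint32_t pack4(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    return (uint32_t)b0 | ((uint32_t)b1 << 8) |
           ((uint32_t)b2 << 16) | ((uint32_t)b3 << 24);
}

/* Build the two words for the 3x3 window starting at column i:
   word_a = p0 p1 p2 p3, word_b = p5 p6 p7 p8 (p4 is unused). */
static void pack_window(const uint8_t *l1, const uint8_t *l2, const uint8_t *l3,
                        size_t len, size_t i, uint32_t *word_a, uint32_t *word_b)
{
    assert(i + 2 < len);
    *word_a = pack4(l1[i], l1[i + 1], l1[i + 2], l2[i]);
    *word_b = pack4(l2[i + 2], l3[i], l3[i + 1], l3[i + 2]);
    /* On the Nios, with hypothetical base names from system.h:
       IOWR_ALTERA_AVALON_PIO_DATA(PIXELS_A_BASE, *word_a);
       IOWR_ALTERA_AVALON_PIO_DATA(PIXELS_B_BASE, *word_b); */
}
```

That turns eight PIO writes per pixel into two, with the split back into bytes done by wiring on the FPGA side.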

 

 

--- Quote Start ---  

2) I suppose I can make the most of FPGA parallelism by getting rid of this for-loop. But how do I send the data then? 

--- Quote End ---  

Before getting into this, you should ask the question "do you NEED to?" If your system is operating in real time with enough headroom for anything else required, then your job is done. Honestly, though, if using PIOs on a 50-300MHz processor is sufficient, then you don't need an FPGA. It could probably be done on a PC-104 stack or another single-board uC. You could also take advantage of floating-point ops on a PowerPC or PDSP without much difficulty.

 

If the answer is yes, then you have options. Refer to my original response. Since you have 2 32-bit inputs and 1 8-bit output to your module, this would be a good candidate for a custom instruction, and you'd save 2 of the 3 cycles required for PIOs. But to use the FPGA resources and gain serious speedup, you would add a wrapper module that replicates the Sobel module. The Nios in this case would probably be doing some DMA transfer or providing the Avalon-MM address of a memory location (if you give your module an Avalon-MM master and slave port and connect it in SOPC Builder). The wrapper module would retrieve a large block of pixels and perform the Sobel operation N times.

 

For a simple example, the wrapper module gathers a 4x4 pixel grid. With this data, you could instantiate 4 Sobel submodules and return 4 resulting pixels. The wrapper simply maps the correct pixels to the corresponding submodule(s). It would also be responsible for handshaking with other Avalon components and the Avalon-MM clock interface. Note that your submodules need not be clocked, only that you latch the data when it's guaranteed to be available. There are several examples of creating Avalon-MM components on Altera's website. 
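A 4x4 grid contains exactly four complete 3x3 windows, so four submodules produce a 2x2 block of results. In Verilog the wrapper would hard-code this as wiring; the C sketch below only tabulates which grid pixel feeds which submodule input, as a design aid. It assumes the grid is stored row-major (index = row * 4 + col) and that submodule k produces output (k / 2, k % 2); both conventions and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define GRID_W 4 /* 4x4 pixel grid, row-major */

/* Grid index feeding input p<j> (j = 0..8) of Sobel submodule k (k = 0..3). */
static unsigned window_index(unsigned k, unsigned j)
{
    assert(k < 4 && j < 9);
    unsigned row = k / 2 + j / 3;
    unsigned col = k % 2 + j % 3;
    return row * GRID_W + col;
}

/* Print the wiring table a wrapper would implement. */
static void print_wiring(void)
{
    for (unsigned k = 0; k < 4; k++) {
        printf("sobel%u:", k);
        for (unsigned j = 0; j < 9; j++)
            if (j != 4) /* centre pixel p4 is not used by the Sobel masks */
                printf(" p%u=grid[%u]", j, window_index(k, j));
        printf("\n");
    }
}
```

For example, submodule 3 takes p0 from grid[5] and p8 from grid[15].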

 

 

 

--- Quote Start ---  

3) Also, the results contain 98 values which I can print on the console window. But I need to store the values. I tried the altera_hostfs and I managed to send the data to a text file on my computer (after 3 days of fighting ;)!) But this works only when I choose Debug As -> NIOS II Hardware and it seems to run slower than when I choose Run As -> NIOS II Hardware. What is the best way to get those values? 

--- Quote End ---  

 

 

I haven't used this method, so I can't comment on why Debug is slower than Run. Debug allows you to step through the Nios instructions with breakpoints. If Run mode isn't working, then there's probably a timing issue.

 

Storing data is a problem in and of itself. Why not just copy/paste from the console after your image is complete? Does it need to operate untethered from your PC? 

 

I frequently copy from the console and paste into MATLAB/Octave to verify the results. For something more automated, or if you're using this system iteratively, you'll need to store results in non-volatile memory such as onboard flash or an SD card. Sending over USB to a hard drive works too, but is more complicated. What components are available on your eval board? Better yet, what eval board are you using?
Altera_Forum

Thanks for the time invested in explaining all this to me. I understand overall what you have explained but this has also created new queries because I could not succeed in implementing your suggestions. So to start, let's break it down again: 

 

 

--- Quote Start ---  

Yes, you can have 2 32-bit PIOs. 

--- Quote End ---  

 

 

OK, I managed to do that based on another forum post.

 

 

--- Quote Start ---  

Before getting into this, you should ask the question "do you NEED to?"  

--- Quote End ---  

 

Yes, I need to make it work on FPGA using its parallelism resources. I understand your reasons for not using it this way, but my task is more educational than real-world. I am just stepping into FPGA world and this is my way to learn. 

 

 

--- Quote Start ---  

But to utilize the FPGA resources and gain serious speedup, you would add a wrapper module that replicates the Sobel module. The Nios in this case would probably be doing some DMA transfer or providing the Avalon-MM address to a memory location 

--- Quote End ---  

 

 

My other attempt was to discard the PIOs and instead create an SOPC component with Avalon-MM. But I don't know how to create a wrapper module to instantiate more Sobel submodules. I've downloaded the Avalon Memory-Mapped Slave Template from Altera website, inserted as a component in SOPC and renamed as my_slave_component. However, I don't know what to put as the Register File properties (Word Size and Synchronization) and capabilities (I have enabled only two registers - one input and one output. Is that good?). 

 

When I looked at the my_slave_component.v to modify it and add my custom logic to it, I saw many new input/output ports.  

 

input  wire clk,              // clock_reset.clk
input  wire reset,            // clock_reset_reset.reset
input  wire slave_address,    // s0.address
input  wire slave_read,       //   .read
input  wire slave_write,      //   .write
output wire slave_readdata,   //   .readdata
input  wire slave_writedata,  //   .writedata
input  wire slave_byteenable, //   .byteenable
output wire user_dataout_0,   // user_interface.export
input  wire user_datain_1

 

Do I need to assign values to these ports or should I worry about my user_dataout_0 and user_datain_1 only? And where exactly within the my_slave_component.v code do I paste my code? There is a section called slave_template within the code. Do I put it before or after slave_template, or does it not matter? 

 

Do I also need a DMA Controller component? If yes, here again I have problems with the parameters :( 

 

 

--- Quote Start ---  

Storing data is a problem in and of itself. Why not just copy/paste from the console after your image is complete? Does it need to operate untethered from your PC? 

--- Quote End ---  

 

 

No, it can be tethered to the PC for now. I will sound even more stupid now but I can't copy-paste the results! I can highlight it all, but copy or Ctrl-c would not work. Do I have to enable something in Eclipse before? I am using Quartus 10.1 with NIOS SBT. 

 

 

--- Quote Start ---  

 

Better yet, what eval board are you using? 

--- Quote End ---  

 

 

I am using DE2-115 which has SD-card interface. I will try to read and store data to this when I get the current problems out of the way and after I complete the applying of multiple Sobels to a small 2-D array. 

 

Thank you for any advice about my new queries. It's quite tough to learn this technology on my own as I am doing.