
How to use altfp_sqrt with the Nios II?

Altera_Forum
Honored Contributor II
1,390 Views

Hi, 

For data processing, I need the Nios II to calculate a floating-point square root many times. The calculation without an extra hardware block is too slow to meet the timing requirements, so I want to use the altfp_sqrt megafunction. I need single precision (32 bit). I instantiated it with the following inputs and outputs:

 

Inputs:
  clock
  data[32]

Outputs:
  result[32]
  NaN
  Overflow
  Zero

 

I connected the clock input to a 50 MHz clock coming out of the PLL. The data[] input is connected to a 32-bit-wide PIO output of the Nios II, and result[] is connected to a 32-bit-wide PIO input of the Nios II. The three status outputs are connected to an external logic analyzer for monitoring.

For testing, I write the same value to the data input of altfp_sqrt again and again, waiting until the result port outputs a value of at least 1.

Here is the code:

#include <stdio.h>
#include <unistd.h>
#include "system.h"
#include "altera_avalon_pio_regs.h"

int main()
{
    while (1)
    {
        float data = 6025.0;
        float result = 0.0;

        /* Keep writing the operand and reading the result until it is at least 1. */
        do {
            IOWR_ALTERA_AVALON_PIO_DATA(SQRT_BASE, data);
            result = IORD_ALTERA_AVALON_PIO_DATA(SQRT_BASE);
        } while (result < 1);

        printf("Square root of 6025.0 = %ld\n", (long int)result);
    }
}

 

So far, the value of result is always zero, and the Zero signal monitored on the external logic analyzer stays high all the time. Does anyone have an idea what the reason could be?

kloocki
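
A note on the test code in this post: IOWR_ALTERA_AVALON_PIO_DATA takes an integer value, so passing a float converts it numerically (6025.0 becomes the integer 6025) instead of sending its IEEE-754 bit pattern. Read as single-precision bits, 6025 (0x00001789) is a subnormal number very close to zero. That would be consistent with the Zero flag staying high if the core treats subnormal inputs as zero, but this is an assumption; the thread does not confirm it. A minimal sketch of copying the raw bits instead, where the helper names float_to_bits and bits_to_float are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Return the IEEE-754 bit pattern of a float, without numeric conversion. */
static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Reinterpret a 32-bit word (e.g. read from the result PIO) as a float. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

On the target, the write could then look like IOWR_ALTERA_AVALON_PIO_DATA(SQRT_BASE, float_to_bits(data)) and the read like result = bits_to_float(IORD_ALTERA_AVALON_PIO_DATA(RESULT_BASE)). RESULT_BASE is a hypothetical name for the base address of the separate input PIO; the posted code uses SQRT_BASE for both accesses.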
5 Replies
Altera_Forum

I don't think you should use the PIO functions for this purpose. The approach should be as follows:

1. Make your own custom component (with register addressing) and add it to the SOPC system. Follow the steps explained in the Nios II manuals for this.

2. Instead of the PIO functions, use plain pointer assignments. You know the base address of the component, so declare a pointer to that address in your Nios II program. Write a value through the pointer, then read the result back through it once your component has finished processing (the component can signal completion to the Nios II through an interrupt, for example).
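
The pointer-based access described in step 2 could be sketched as follows. The register offsets and function names are hypothetical, since the thread does not define the component's register map:

```c
#include <stdint.h>

/* Hypothetical register layout of the custom component (32-bit word offsets). */
enum { SQRT_REG_DATA = 0, SQRT_REG_RESULT = 1 };

/* Write the operand's bit pattern to the component's data register. */
static void sqrt_write_operand(volatile uint32_t *base, uint32_t bits)
{
    base[SQRT_REG_DATA] = bits;
}

/* Read the bit pattern of the result register. */
static uint32_t sqrt_read_result(volatile uint32_t *base)
{
    return base[SQRT_REG_RESULT];
}
```

On the target, base would be the component's address from system.h cast to volatile uint32_t * (for example a hypothetical MY_SQRT_BASE). Note that on a Nios II with a data cache, plain pointer accesses may go through the cache, whereas the HAL's IORD/IOWR macros bypass it.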
Altera_Forum

Thank you for this information. 

I have already solved the problem: I instantiated altfp_sqrt with the Datain, Result and CLK_enable signals using the MegaWizard, and then integrated the generated VHDL file as a custom instruction with a fixed length of 17 cycles. It works fine, and the timing is really good.

 

Thank you!!!
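
For readers wondering how such a custom instruction is called from C, here is a minimal sketch. It assumes the Nios II GCC builtin __builtin_custom_fnf (custom instruction index, float operand, float result) and a hypothetical index of 0; the generated system.h typically also provides an ALT_CI_... macro for the same purpose. The software fallback exists only so the sketch compiles off-target:

```c
/* Assumption: index assigned to the square-root custom instruction. */
#define SQRT_CI_N 0

static float hw_sqrtf(float x)
{
#if defined(__nios2__)
    /* Issue the custom instruction: one float operand, float result. */
    return __builtin_custom_fnf(SQRT_CI_N, x);
#else
    /* Host stand-in: Newton iteration, adequate for moderate inputs only. */
    float r;
    int i;
    if (x <= 0.0f) {
        return 0.0f;
    }
    r = (x > 1.0f) ? x : 1.0f;   /* start at or above the true root */
    for (i = 0; i < 40; ++i) {
        r = 0.5f * (r + x / r);
    }
    return r;
#endif
}
```

Because the operand travels through processor registers as a float, no PIO and no polling loop is needed; the processor simply stalls for the fixed number of cycles.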
Altera_Forum

(I'm glad I found this thread. It's exactly what I'm trying to do right now. Now a couple of years later ...) 

 

The custom instruction procedure kloocki describes worked for me too. But I have a question: 

 

When single precision is selected, the MegaWizard offers a choice of 16 or 28 clock cycles for the conversion.

Does anyone know what the trade-offs are between 16 and 28 clocks? That information isn't clear (it may not even be there) in the FP Megafunctions user guide.

 

Thanks, John Speth
Altera_Forum

The deeper pipeline (28 clocks) should achieve a higher fmax, but it may use more logic and registers, and each result takes more clock cycles.
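
As a rough illustration of the latency side of this trade-off, assume the core stays on the 50 MHz clock mentioned in the original post and that one result takes the full pipeline depth (as with a fixed-length custom instruction):

```latex
t_{\text{latency}} = \frac{N_{\text{cycles}}}{f_{\text{clk}}}, \qquad
t_{16} = \frac{16}{50\,\text{MHz}} = 320\,\text{ns}, \qquad
t_{28} = \frac{28}{50\,\text{MHz}} = 560\,\text{ns}
```

Under these assumptions, the 28-clock option only pays off if the 16-clock version would limit the achievable clock frequency of the design.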

Altera_Forum

If you want the square root integrated into the software compilation so that calls to sqrtf() map directly into the hardware you can use something similar to this: http://www.nioswiki.com/custom_floating_point_unit 

 

The compile flag for that is "-mcustom-fsqrts=<x>" where "<x>" is the custom instruction number you map to the processor. This solution only makes sense if you are using square root all over the place in your code. If you are working on vector data you would be better off placing all the inputs for the square root into a buffer, DMAing them through a square root hardware accelerator, and writing the contents back to memory. 
