Altera_Forum
Honored Contributor I
1,409 Views

IORD_ALTERA_AVALON_PIO_DATA execution speed

Is there a faster way to read a pin than using the IORD_ALTERA_AVALON_PIO_DATA macro? 

 

I'm finding that the code below runs quite slowly, but when I remove the second condition (i.e. reduce the number of IORD_ALTERA_AVALON_PIO_DATA calls) it runs significantly faster, by almost an order of magnitude. 

 

Is there any overhead in this macro that I can get around by doing something more direct? Looking through the header files I couldn't see anything, but I thought it was worth asking. 

 

Thanks 

 

Bert 

 

    if ((IORD_ALTERA_AVALON_PIO_DATA(PIO_CONTROL_IN_BASE) == 0x02) ||
        (IORD_ALTERA_AVALON_PIO_DATA(PIO_CONTROL_IN_BASE) == 0x03)) {
        sample_array[i] = IORD_ALTERA_AVALON_PIO_DATA(PIO_ADC_DATA_BASE);
        i++;
    }
11 Replies
Altera_Forum
Honored Contributor I
185 Views

In this particular case you could write it as:

    if ((IORD_ALTERA_AVALON_PIO_DATA(PIO_CONTROL_IN_BASE) & 0x02) != 0) {
        sample_array[i] = IORD_ALTERA_AVALON_PIO_DATA(PIO_ADC_DATA_BASE);
        i++;
    }

In the more general case, avoid reading the same hardware address multiple times and use a temporary variable:

    uint32 temp = IORD_ALTERA_AVALON_PIO_DATA(PIO_CONTROL_IN_BASE);
    if ((temp == 0x02) || (temp == 0x03)) {
        sample_array[i] = IORD_ALTERA_AVALON_PIO_DATA(PIO_ADC_DATA_BASE);
        i++;
    }
Altera_Forum
Honored Contributor I
185 Views

Sorry, my original question wasn't clear enough. Even with the improvement I listed and joysb has elaborated on, it's still too slow, and because the speed seems to be massively impacted by the IORD_ALTERA_AVALON_PIO_DATA macros, I wondered if there was some alternative with less overhead? 

 

Thanks 

 

Bert
Altera_Forum
Honored Contributor I
185 Views

The IORD_ALTERA_AVALON_PIO_DATA is translated into a single CPU instruction IIRC, so you will hardly find any faster method of reading a PIO. 

I see several reasons that could explain what you are seeing. Either the PIO itself is slow (and if it is in a different clock domain, it will stall the CPU for several extra cycles), or there is an unknown bias in your test. It could be a bug in the way you are measuring time, or your test without the macro could be artificially faster because it was somewhat optimized by the compiler. 

(about the bug in time measurement system, it happened to me not so long ago. I was running a network benchmark, and couldn't understand inconsistent results in bandwidth measurements, until I figured out I had a 32-bit counter that rolled over). 
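
The rollover pitfall described above can usually be avoided with plain unsigned subtraction. This is a minimal sketch, assuming the timestamps come from a free-running 32-bit tick counter; `elapsed_ticks` is a hypothetical helper name, not part of the HAL:

```c
#include <stdint.h>

/* Hypothetical helper: ticks elapsed between two samples of a free-running
 * 32-bit counter. Unsigned subtraction is performed modulo 2^32, so the
 * result stays correct across a single rollover, provided the real interval
 * is shorter than 2^32 ticks. */
static uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}
```

Comparing raw timestamps (for example `end > start`) is what breaks at rollover; the difference does not. At 100 MHz a 32-bit counter wraps roughly every 43 seconds, so any longer benchmark needs a wider counter or periodic accumulation.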

 

Please note that a softcore CPU isn't fast anyway. If you need to read data at a high rate, you should consider using a DMA instead.
Altera_Forum
Honored Contributor I
185 Views

Make sure the Avalon slave is a 32-bit one; otherwise you get a bus width adapter that will slow the transfer down a lot, especially if there is also a clock crossing bridge. 

 

The two conditionals are also likely to lead to some mis-predicted branches - so you'll be seeing a pipeline stall. 

A 'taken branch' is also slower than the 'not taken branch' even when predicted properly. 

 

It might even be that you are waiting for an extra code cache line read. 

 

Also make sure you are compiling everything with -O2 (or -O3). 

 

With extreme care, it is possible to get the code to run (from tightly coupled instruction memory) without any pipeline stalls in the important paths.
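
The remark above about the two conditionals can be illustrated with a single-comparison test. This is a sketch only, assuming the control word has already been read once into a local variable; `ctrl_is_2_or_3` is a hypothetical name, and a compiler at -O2 may already perform this transformation on its own:

```c
#include <stdint.h>

/* Returns nonzero when ctrl is exactly 0x02 or 0x03. Clearing bit 0 maps
 * both values to 0x02, so one comparison (one branch at the call site)
 * replaces the two-way || test. Values with any higher bit set are
 * still rejected. */
static int ctrl_is_2_or_3(uint32_t ctrl)
{
    return (ctrl & ~(uint32_t)1) == 0x02u;
}
```

Unlike a plain `& 0x02` test, this keeps the exact semantics of the original condition, which matters if other bits of the control PIO can be set.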
Altera_Forum
Honored Contributor I
185 Views

 

--- Quote Start ---  

The IORD_ALTERA_AVALON_PIO_DATA is translated into a single CPU instruction IIRC, so you will hardly find any faster method of reading a PIO. 

I see several reasons that could explain what you are seeing. Either the PIO itself is slow (and if it is in a different clock domain, it will stall the CPU for several extra cycles), or there is an unknown bias in your test. It could be a bug in the way you are measuring time, or your test without the macro could be artificially faster because it was somewhat optimized by the compiler. 

(about the bug in time measurement system, it happened to me not so long ago. I was running a network benchmark, and couldn't understand inconsistent results in bandwidth measurements, until I figured out I had a 32-bit counter that rolled over). 

 

Please note that a softcore CPU isn't fast anyway. If you need to read data at a high rate, you should consider using a DMA instead. 

--- Quote End ---  

 

Hello DaiXiWen, I recently ran into the same problem described in this thread. I need to read eleven 32-bit PIOs within the 20 us period of each PIO interrupt, and I don't know whether that is enough time. I would appreciate any advice. 

PS: My Nios processor clock is 120 MHz, and the SDRAM runs at the same frequency.
Altera_Forum
Honored Contributor I
185 Views

The interrupt entry/exit code paths and any SDRAM accesses could easily dominate that workload. 

The PIO reads themselves are likely to take 3 clocks, with a 2 clock stall if the result is needed by the next instruction. 

 

A 120MHz cpu gives you 20*120 = 2400 clocks between each of your PIO interrupts. 

So you should easily be able to write interrupt entry/exit code and do the required transfers within that period. 

The problem is likely to be code that disables your interrupt, especially other interrupt handlers (unless you have a multi-level interrupt controller with the priorities set 'correctly'). In particular, I suspect that the JTAG debug and UART will have longer ISRs.
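
The cycle budget worked out above can be written down as a small calculation. This is a sketch under the figures quoted in this post (3 clocks per PIO read plus a 2-clock stall); both function names are hypothetical, and the real per-read cost depends on the interconnect:

```c
#include <stdint.h>

/* Clock cycles available in one interrupt period.
 * MHz times microseconds gives cycles, e.g. 120 MHz * 20 us = 2400. */
static uint32_t cycles_per_period(uint32_t cpu_mhz, uint32_t period_us)
{
    return cpu_mhz * period_us;
}

/* Worst-case cycles for n back-to-back PIO reads, assuming each read costs
 * read_clocks plus stall_clocks when its result is needed immediately. */
static uint32_t pio_read_cycles(uint32_t n, uint32_t read_clocks,
                                uint32_t stall_clocks)
{
    return n * (read_clocks + stall_clocks);
}
```

With these assumptions, eleven reads cost 55 cycles, a little over 2% of the 2400 cycles available, which is why interrupt latency and SDRAM accesses are the more likely bottleneck.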
Altera_Forum
Honored Contributor I
185 Views

 

--- Quote Start ---  

Make sure the Avalon slave is a 32-bit one; otherwise you get a bus width adapter that will slow the transfer down a lot, especially if there is also a clock crossing bridge. 

--- Quote End ---  

 

 

A bit late to reply now, but this is what fixed my problem. Thanks very much dsl for your help. 

 

Bert
Altera_Forum
Honored Contributor I
185 Views

 

--- Quote Start ---  

The interrupt entry/exit code paths and any SDRAM accesses could easily dominate that workload. 

The PIO reads themselves are likely to take 3 clocks, with a 2 clock stall if the result is needed by the next instruction. 

 

A 120MHz cpu gives you 20*120 = 2400 clocks between each of your PIO interrupts. 

So you should easily be able to write interrupt entry/exit code and do the required transfers within that period. 

The problem is likely to be code that disables your interrupt, especially other interrupt handlers (unless you have a multi-level interrupt controller with the priorities set 'correctly'). In particular, I suspect that the JTAG debug and UART will have longer ISRs. 

--- Quote End ---  

 

 

Maybe I could explain my problem in more detail. I built an A/D acquisition and Ethernet transfer module (with a DP83848 and the TSE MAC), and I test it with a 16-bit counter. I read the A/D data through PIOs every time the external synchronizing signal has a falling edge. Everything now seems OK, except that some data is lost, although no packets are lost. I suspect there isn't enough time for the interrupt to respond, even though 20 us are available.  

PS: I've already set the synchronizing signal's interrupt to priority No. 1.
Altera_Forum
Honored Contributor I
185 Views

My guess is that the packet sending part takes too much CPU time and you miss some data when sending the packet.

Altera_Forum
Honored Contributor I
185 Views

Are there any solutions other than a large FIFO or dual-port RAM? I've already set the instruction cache and data cache to their maximum sizes. Or could I connect a FIFO written in Verilog to the Nios core through a PIO? I ask because I am not very familiar with FIFOs on Avalon-MM.

Altera_Forum
Honored Contributor I
185 Views

Actually for something like this I would rather use a DMA to periodically read the AD data from the converter and write it directly to memory. You could then raise an interrupt once a buffer is full and let the CPU handle the Ethernet encapsulation.
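
The buffer hand-off suggested above is often organized as a ping-pong (double) buffer. The following is a host-side sketch of that bookkeeping only: `pingpong_push` stands in for a DMA transfer, all names and the buffer depth are hypothetical, and no real DMA driver API is shown:

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_SAMPLES 4u  /* hypothetical depth, kept tiny for illustration */

/* Ping-pong state: the producer (a DMA in hardware, simulated here) fills
 * one buffer while the CPU processes the other. */
typedef struct {
    uint32_t buf[2][BUF_SAMPLES];
    size_t   fill_idx;   /* buffer currently being filled (0 or 1) */
    size_t   count;      /* samples written to that buffer so far */
    int      ready_idx;  /* full buffer awaiting the CPU, or -1 if none */
} pingpong_t;

static void pingpong_init(pingpong_t *p)
{
    p->fill_idx = 0;
    p->count = 0;
    p->ready_idx = -1;
}

/* Stands in for one DMA transfer of a sample. Returns 1 when this sample
 * completed a buffer, which is where hardware would raise the interrupt. */
static int pingpong_push(pingpong_t *p, uint32_t sample)
{
    p->buf[p->fill_idx][p->count++] = sample;
    if (p->count < BUF_SAMPLES)
        return 0;
    p->ready_idx = (int)p->fill_idx;  /* hand the full buffer to the CPU */
    p->fill_idx ^= 1u;                /* producer switches to the other buffer */
    p->count = 0;
    return 1;
}
```

The CPU then has one full buffer period, rather than one sample period, to build and send the Ethernet packet. It must finish with the ready buffer before the producer wraps around to it again, otherwise samples are overwritten.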
