
Fast data access with Nios II?

Altera_Forum

I have tried to quickly access external data with a custom instruction. After I realized that 1-cycle custom instructions cannot have external connections, I implemented a 2-cycle custom instruction. However, when I analyzed the design with SignalTap, it turned out that the 2-cycle custom instruction actually takes 3 cycles (!).

 

On the other hand, the load instruction needs at least 4 cycles, as far as I have observed, even with 0-wait-state memory (!!!).

 

Will either of these behaviours be improved in Nios II 1.1? Is there another way to access external data in at most 2 cycles? (Caching is not possible.)

 

Regards 

 

Thomas
Altera_Forum

For two cycles, remember that:

 

Cycle 1 --> your hardware has to latch the data 

Cycle 2 --> your hardware has to process the data 

Cycle 3 --> the Nios has to read it back 

 

 

So when you think of two cycles you imagine steps 1 and 2; however, you are creating the custom instruction for the Nios, which has its own latching to do. I was thrown off by this the first time I did a custom instruction, but after that I just got into the habit of figuring out the cycles for my hardware and adding one.

 

The Nios II reference manual lists the latencies of the instructions. Being able to access memory across a bus in 2 cycles would result in a low fmax, I'm sure.
Altera_Forum

Hi BadOmen, 

 

My custom instruction logic itself needs one cycle (i.e. it has one pipeline stage), so it should be possible to implement it as an instruction that executes in 2 cycles. In fact, I could also realize it without a pipeline delay, as the data is already there before the instruction starts, but Nios II does not allow me to have an external interface for 1-cycle instructions.

 

Regards 

 

Thomas
Altera_Forum

Let me clear up a few things. 

 

There are two types of custom instructions: 

- combinatorial 

- multi-cycle 

 

The term register is ambiguous because you can't tell if I'm talking about one of the CPU registers in the register file or just a basic storage device. Let me use flop for the latter.

 

The combinatorial instructions can't write any flops and can't stall the CPU. We don't allow them to write any flops because the Nios II/s and Nios II/f might speculatively execute an instruction (mainly due to branch mispredictions) but then kill it later in the pipeline. Notice that I didn't say combinatorial instructions can't read from flops or can't read from external inputs.

If you really have some data available for the CPU to read that is guaranteed to always be present (i.e. never needs to stall), I don't see why you can't use a combinatorial custom instruction to read it. Writing it is not allowed because of the speculative execution issue.

 

The multi-cycle custom instructions execute later in the pipeline, so they are never speculatively executed. This allows them to read or write any flop or external value. They can also stall the pipeline as needed. These instructions always stall the pipeline for at least one cycle to avoid slow paths from the custom instruction into the pipeline (stall logic and register write data).

As you've noticed, if you set up your custom instruction to take N cycles to execute, the pipeline always adds one more cycle to this. The pipeline registers the result data provided by your custom instruction before muxing it with the other sources of register write data (e.g. ALU, load data, multiply result, etc.).

You are allowed to have a multi-cycle custom instruction with N=1, although I seem to remember a bogus error/warning from the custom instruction wizard in SOPC Builder if you try to do this. This should be fixed in Nios II 1.1.

 

So, in summary, here's the best you can do: 

 

- Reading data from a register/external input that is always ready:
  Implementation: combinatorial custom instruction
  Performance: 1 cycle per instruction

- Reading data from a register/external input that might not always be ready:
  Implementation: variable-latency multi-cycle custom instruction
  Performance: number of cycles of latency + 1 more

- Writing data to a register/external output:
  Implementation: fixed-latency multi-cycle custom instruction set up for one cycle
  Performance: 2 cycles per instruction
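
To make the summary above concrete, here is a minimal C sketch of the cycle arithmetic. The names `ci_kind`, `ci_cycles` and `transfer_cycles` are made up for illustration and are not part of any Altera tool or API. The model only encodes the rules stated in this thread and ignores loop overhead and any extra stalls.

```c
#include <assert.h>

/* Kinds of Nios II custom instruction discussed in this thread. */
enum ci_kind {
    CI_COMBINATORIAL, /* executes early in the pipeline, never stalls */
    CI_MULTI_CYCLE    /* executes later, stalls the pipeline at least once */
};

/*
 * Cycles one custom instruction costs, per the rule stated above:
 * a combinatorial instruction costs 1 cycle; a multi-cycle instruction
 * configured for n cycles costs n + 1, because the pipeline registers
 * the result before muxing it into the register write data.
 */
static unsigned ci_cycles(enum ci_kind kind, unsigned n)
{
    if (kind == CI_COMBINATORIAL) {
        return 1; /* n is ignored for combinatorial instructions */
    }
    assert(n >= 1); /* a multi-cycle instruction needs at least N = 1 */
    return n + 1;
}

/* Total cycles to move `words` data words, one instruction per word
 * (hypothetical helper; ignores loop overhead and additional stalls). */
static unsigned long transfer_cycles(unsigned long words,
                                     unsigned cycles_per_word)
{
    return words * cycles_per_word;
}
```

With N=2 the model gives 3 cycles, which matches the SignalTap observation in the first post, and with N=1 it gives 2. For 1000 words that is 2000 cycles, versus 4000 at the 4 cycles per load reported in the first post.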

 

Be careful if you have a multicycle custom instruction writing a flop/external output and a combinatorial custom instruction reading the same data. The combinatorial custom instruction won't read the latest value because it executes in an earlier pipeline stage than the multicycle custom instruction. If you have this situation, you also should use a multicycle custom instruction (with a fixed latency of 1 cycle) to read values. This will cut down your read performance to 2 cycles per instruction.
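
The ordering problem described above can be sketched as a toy timing model in C. The stage offsets below are hypothetical: the only thing taken from this thread is that the combinatorial instruction executes in an earlier stage than the multicycle one. `read_sees_write` is an illustrative helper, not a real API, and the model ignores stalls.

```c
#include <assert.h>

/* Hypothetical stage offsets, in cycles after issue. Only the ordering
 * (combinatorial earlier than multicycle) comes from the thread; the
 * exact numbers are made up for illustration. */
#define COMB_SAMPLE_OFFSET  1u
#define MULTI_ACCESS_OFFSET 3u

/*
 * A multicycle write issued at cycle `write_issue` updates the flop at
 * the end of cycle write_issue + MULTI_ACCESS_OFFSET. A read issued at
 * `read_issue` samples the flop during cycle read_issue + read_offset.
 * Returns 1 if the read observes the new value, 0 if it sees the old one.
 */
static int read_sees_write(unsigned write_issue, unsigned read_issue,
                           unsigned read_offset)
{
    unsigned write_lands = write_issue + MULTI_ACCESS_OFFSET;
    unsigned read_samples = read_issue + read_offset;
    return read_samples > write_lands;
}
```

Under these assumed offsets, a combinatorial read issued one cycle after the write, `read_sees_write(0, 1, COMB_SAMPLE_OFFSET)`, samples the flop before the write lands and gets the old value, while a multicycle read in the same slot, `read_sees_write(0, 1, MULTI_ACCESS_OFFSET)`, accesses the flop in the same stage as the write and therefore sees it.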

 

I hope this helps!
Altera_Forum

Hello James, 

 

Thank you very much for your detailed explanation. The 1-cycle "multi-cycle" instruction (which takes 2 cycles in reality) is the way to go. I simply ignore the warning and change the default duration from 2 cycles to 1 cycle. However, I still think that the documentation is misleading on this point.

 

Your tip about a combinatorial instruction that only reads is not allowed by my SOPC Builder (I am using Nios II 1.01 and Quartus II 4.1; I have not received my upgrade yet). It displays an error like "no external ports allowed for comb. instructions", even if I only declare external inputs. Maybe this is already improved in Nios II 1.1?

 

However, this "real" single-cycle solution would not fit my case, because I also generate a "next data please" signal, which of course should not be generated speculatively. I suppose there is no way to detect whether my combinatorial custom instruction was killed after speculative execution?

 

Regards 

 

Thomas
Altera_Forum

Hummm, I don't see why combo instructions can't have external inputs. 

I'm going to see if we can change this for a future release.