Alternatives to NIOS

Altera_Forum

I'm getting a bit frustrated with NIOS and I'd like to consider alternative soft-cores. NIOS seems to be very powerful for OS implementations, but not too efficient for small microcontroller-like applications. 

 

At this point, I'm looking for a free, ideally open-source soft-core. Anybody tried other soft-cores? 

 

What about Mico32/Mico8, and MicroBlaze/PicoBlaze open-source clones like PacoBlaze? Is it legal to use them in Altera devices? 

 

Thanks,
22 Replies
Altera_Forum

I'm surprised. 

 

NIOS II should work quite well without any OS involved. 

MicroBlaze should not work well on the Altera architecture, as it was designed and hand-optimized for the Xilinx architecture. 

PicoBlaze may work well for you, but it is not really a processor; it is more of a control-store state machine (OK, a small processor). 

 

Tell us all of your experiences so we all can get a little smarter.
Altera_Forum

Someone has already done some work with Mico32 on Altera. 

I had this link somewhere, but it seems to be down at the moment: 

https://roulette.das-labor.org/bzrtrac/wiki/soc-lm32

Anyway, I build my applications with my own HAL and kernel, which has largely freed me from the task of keeping up with the ever-changing Altera tools. Remember the days when Altera decided to stop supporting the so-called legacy design flow... :mad: 

 

 

Stefaan
Altera_Forum

 

--- Quote Start ---  

I'm surprised. 

NIOS II should work quite well without any OS involved. 

--- Quote End ---  

 

 

I didn't mean NIOS doesn't work well without an OS. I meant that NIOS seems to be designed for being efficient in big projects (where you would usually use an OS).  

 

NIOS seems to be designed mainly for the /s and /f cores. I want a smaller processor. I want something with an area similar to the /e core, but not at the expense of 6 clocks or more per instruction. This is almost ridiculous for RISC. It is, of course, just a consequence of NIOS being designed around a deep pipeline. Again, great for the bigger cores, but too bad for the /e core. A processor designed from the ground up for a shorter pipeline would be much more efficient on small cores (but of course, not as efficient as NIOS for bigger cores).
Altera_Forum

I think you have a misconception about pipelines. A pipeline is unavoidable, especially with FPGA technology, and it is not as deep as you might think: 5 stages is about standard, and the fast version of the Nios processor has one more. 

The deeper the pipeline, the bigger the branch penalty... 
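To put a rough number on that branch penalty, here is a minimal sketch in C. It assumes a simple textbook model that does not come from any vendor documentation: one cycle per instruction when the pipeline is full, plus a fixed number of flushed cycles for every taken branch. The function name and its parameters are made up for illustration.

```c
/* Hypothetical textbook model, not measured on any real core:
 * effective cycles per instruction when each taken branch costs
 * `flush_cycles` extra cycles on top of the ideal 1 cycle. */
double effective_cpi(double taken_branch_fraction, int flush_cycles)
{
    return 1.0 + taken_branch_fraction * (double)flush_cycles;
}
```

With these assumptions, a program in which a quarter of the instructions are taken branches goes from 1.5 cycles per instruction with a 2-cycle flush to 2.0 with a 4-cycle flush, which is the effect described above.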

 

MicroBlaze has only 3 stages, I think (not sure). Mico32 from Lattice also has 5, if I remember correctly. 

However, you have the option of running Nios II without a pipeline (the small version). Then the speed is rather slow (~6 cycles per instruction). 

It all depends on what you really need, and on what you define as a "free, ideally open-source soft-core". In my case, I have started developing a 32-bit processor many times, but every time I stop because I don't have the knowledge (or the time) to port gcc. The toolchain is usually the bottleneck; without a decent compiler you can't do anything. 

 

Stefaan
Altera_Forum

There are many soft-cores available at opencores.org, such as the AVR core or OpenRISC. 

Altera ported the OpenRISC 1200 to a Stratix III device and it requires around 7000 LEs, way too much compared to Nios' > 1000 LEs. So even for a small control application with a program of less than 2 KB, Nios seems to be tiny enough!
Altera_Forum

 

--- Quote Start ---  

Altera ported the OpenRISC 1200 to a Stratix III device and it requires around 7000 LEs 

--- Quote End ---  

 

 

OpenRisc is not an attractive alternative to Nios. At least not to me. 

 

I did some research and I couldn't find exactly what I wanted. FPGA soft-core versions of standard off-the-shelf micros are usually not very efficient. Currently the Lattice cores seem to be the best alternative.
Altera_Forum

I've never used Nios II, but from the technical specs it looks quite impressive. I think you'd be hard pressed to find a C programmable core that is significantly better. 

 

That said, the license can be an issue, which is why I have my own (GPL'ed) solution: a MIPS-compatible 32-bit core (http://thorn.ws/yari). I'd love to see a performance comparison between it and Nios II, but I haven't yet taken the time to do it myself. It certainly won't satisfy the OP as it weighs in at ~7k LCs, but performance was the priority for YARI, not resource consumption. 

 

Regards 

Tommy
Altera_Forum

One major problem with using alternative soft-cores is that there is no smooth way to do JTAG debugging. 

If you are restricted to documented interfaces, the only way to do JTAG debugging is through Tcl scripting. It might be feasible to run a debugger that creates Tcl scripts on the fly, invokes a Quartus executable, and parses the results. But this would be hardly usable and extremely slow. 

You can implement a JTAG debugger using some already reverse-engineered interfaces. But you still wouldn't be able to run the debugger concurrently with other Altera JTAG tools, such as Signal Tap. 

 

This is too bad. I wonder if Xilinx also has so many undisclosed JTAG aspects.
Altera_Forum

Wow what a strange complaint. I have designed processing cores and you can't get the performance without the pipeline. So you can either sacrifice the performance and use the /e core or you can accept the resource usage of the higher performance cores. 

 

None of my designs to date have used an OS. I tend to burden the processor with a lot of control and communication logic that would be extremely time consuming and even impossible to implement in firmware. The processor has become a critical component in my designs. As such I have no problem sacrificing a small percentage of the chip to it. I get way more functionality per gate from the processor than any other piece of firmware. My firmware exists to perform time-critical functions that are impossible to do in a processor.  

 

Even the /s and /f cores are pretty small. The fact that you don't find these sizes acceptable would indicate that you are not going to use the processor for much (not getting your value). If you're not going to use the processor for much, then why not just implement the needed functions in firmware? Or add a hard microcontroller to your board and interface it with the FPGA. The price/gate for the NIOS is very reasonable. Throw in all of the ease-of-use issues and the NIOS is a no-brainer. 

 

Jake
Altera_Forum

 

--- Quote Start ---  

Wow what a strange complaint. I have designed processing cores and you can't get the performance without the pipeline. 

--- Quote End ---  

 

 

I didn't complain about the pipeline. All I said about the pipeline, is that its depth has a devastating effect on the /e core. The /e core seems to implement all the pipeline stages, but in a sequential, non-pipelined way. 

 

That was an example to make the point that NIOS is designed to be efficient for the /f and /s cores, not for the /e core. I wasn't complaining. I realize that is the best solution for most cases. 

I was just hoping I could find something else designed from the ground up for a smaller size. But it seems there is no such thing. Everybody targets either high-performance cores such as Nios, or tiny 8-bit soft-cores such as PicoBlaze or Mico8. Nothing in the middle.
Altera_Forum

 

--- Quote Start ---  

I'm getting a bit frustrated with NIOS and I'd like to consider alternative soft-cores. 

--- Quote End ---  

 

 

 

--- Quote Start ---  

I want something with an area size similar to the /e core, but not at the expense of 6 clocks or more per instruction. This is almost ridiculous for RISC. It is, of course, just a consequence of NIOS being designed around a deep pipeline. 

--- Quote End ---  

 

 

Sounds like a complaint to me. 

 

You are unknowingly complaining about the pipeline. 

 

In order to execute an instruction, you have to  

1 - fetch the instruction from memory, 

2 - decode the instruction to determine what registers / memory need to be accessed and what operations will be performed on those operands. 

3 - Execute the instruction (add, multiply, etc.) 

4 - Access the memory (in the case of instructions that load or store from memory) 

5 - Write execution or memory stage results back into registers. 

 

So, you can either do all of these steps sequentially for every instruction or you can pipeline them. By complaining about the execution time of the /e core, you are saying that you don't like the fact that all of the above steps have to be performed sequentially. However, you are also saying that you don't like the logic usage of the /s or /f cores, which are pipelined and use a cache. A pipeline is required for efficiency (talking about MIPS). 
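The trade-off between the two approaches can be sketched with a small cycle-count model in C. It is an idealized illustration of the five steps listed above, assuming one clock per step and no stalls or branch flushes; it does not describe the actual timing of any Nios II core, and the function names are made up for this example.

```c
#define STAGES 5  /* fetch, decode, execute, memory access, write-back */

/* Cycles when every instruction runs all stages before the next one starts. */
unsigned long sequential_cycles(unsigned long instructions)
{
    return instructions * STAGES;
}

/* Cycles for an ideal pipeline with no stalls or flushes:
 * STAGES cycles until the first instruction completes, then one
 * more instruction completes on every following cycle. */
unsigned long pipelined_cycles(unsigned long instructions)
{
    if (instructions == 0)
        return 0;
    return STAGES + (instructions - 1);
}
```

Under these assumptions, 100 instructions take 500 cycles sequentially and 104 cycles pipelined. The pipelined version needs extra registers and control logic to keep five instructions in flight, which is where the additional logic usage comes from.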

 

Now in reality, the above steps can be reduced to a 3-stage pipeline by combining stages. This results in an fmax penalty as more combinatorial logic is required in each of the stages. 

 

Your best bet may be the Microblaze which gives you the option of using a 3-stage pipeline. I don't know what the logic usage would be and of course that would be a different forum. And I'm sure the Microblaze forum would have no problem with complaints about the NIOS.
Altera_Forum

I read that somebody suspects the 6 cycles per instruction on the /e core is not really related to the pipeline, but because the /e core uses 16 bit logic internally to achieve very small area size. No idea if that is true, but it would make more sense.
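To illustrate what "16-bit logic internally" would mean in practice, here is a sketch in C of a 32-bit addition carried out as two 16-bit passes with a carry between them. This is purely an illustration of the idea; nothing here is confirmed about how the /e core is actually built, and the function name is made up.

```c
#include <stdint.h>

/* Add two 32-bit values using only 16-bit-wide additions.
 * Pass 1 adds the low halves; pass 2 adds the high halves plus
 * the carry out of pass 1. A 16-bit datapath would need one
 * clock (or more) per pass instead of a single 32-bit add. */
uint32_t add32_on_16bit_datapath(uint32_t a, uint32_t b)
{
    uint32_t lo    = (a & 0xFFFFu) + (b & 0xFFFFu);   /* pass 1: low halves  */
    uint32_t carry = lo >> 16;                        /* carry out of bit 15 */
    uint32_t hi    = (a >> 16) + (b >> 16) + carry;   /* pass 2: high halves */

    return ((hi & 0xFFFFu) << 16) | (lo & 0xFFFFu);
}
```

If every ALU operation had to be split this way, several extra cycles per instruction would be plausible even without a deep pipeline, which is the point of the suspicion above.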

Altera_Forum

This soft CPU might fit your needs better: 

 

http://www.zylin.com/zpu.htm 

 

298 LUT @ 125 MHz after P&R with 16 bit datapath and 4kBytes BRAM
Altera_Forum

http://nibz.googlecode.com VHDL, no software yet, still in optimization phase for logic area minimization. 

 

Currently >280 min. LEs on MAX II. (16 bit) (23% of 1270) 

Many compile warnings. 

Fully parametric from 5 bit to n bit. 

17 operations, all single memory access, supports late read. 

No interrupts at present. 

Optimized for stack style programming, direct threaded code. 

BSD licence. 

5 register model. double word shift left support. 

2 stacks, program counter, and 2 working registers. 

Supports WISHBONE Classic Timing when RRD_I = '0'. 

Supports late reads when RRD_I = '1'. 

Support little-endian half width data transfer using HLF_I multiplexer 

Secondary DMA port 

Interrupt strategy and timing slack options in development 

 

Any comments welcome. 

 

Get dac2.vhd for FREE, phase ultrasonic 1 bit DAC (16 bits in) from the nibz homepage. Does not yet use DMA bus. 

 

Plans for future: 

Various I/O 

SDK 

UFM and Video 

Spectral and Wave Table Audio Synthesis 

 

Raison d'être: 

Open Mini PC (OxPx) resource disenfranchisement
Altera_Forum

I have a suggestion about handling inputs that need quick responses. Try using TriMatrix arrays set up to do logical functions.  

 

The attached sketch shows how a small state machine merged with the inputs could produce an address input to a second array, which then drives the output. 

In this 2-stage pipeline, the first stage generates a code, and the second stage uses that code to generate the output. 
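Since the attached sketch is not reproduced here, the following C model shows one way such a two-array arrangement could behave. Everything in it is an assumption made for illustration: the table contents, the bit widths, the names, and the choice to reuse the stage-1 code as the next state. The toy machine raises its output after seeing two consecutive 1s on a single input.

```c
#include <stdint.h>

#define STATE_BITS 2
#define INPUT_BITS 1

/* Stage 1 array (hypothetical contents): address = {state, input},
 * data = code. In this toy example the code is also the next state. */
static const uint8_t stage1_rom[1 << (STATE_BITS + INPUT_BITS)] = {
    0, 1,   /* state 0: input 0 -> 0, input 1 -> 1 */
    0, 2,   /* state 1: input 0 -> 0, input 1 -> 2 */
    0, 2,   /* state 2: input 0 -> 0, input 1 -> 2 */
    0, 0,   /* state 3: unused                     */
};

/* Stage 2 array (hypothetical contents): address = code, data = output. */
static const uint8_t stage2_rom[1 << STATE_BITS] = { 0, 0, 1, 0 };

typedef struct {
    uint8_t state;  /* state register fed back into the stage 1 address */
    uint8_t code;   /* pipeline register between the two arrays         */
} lut_fsm;

/* One clock edge: stage 2 decodes the code latched on the previous
 * edge while stage 1 looks up the code for the current input. */
uint8_t lut_fsm_clock(lut_fsm *m, uint8_t input)
{
    uint8_t output = stage2_rom[m->code];
    uint8_t addr   = (uint8_t)((m->state << INPUT_BITS) | (input & 1u));

    m->code  = stage1_rom[addr];
    m->state = m->code;
    return output;
}

/* Feed `count` input bits (least significant bit first) into a freshly
 * reset machine and return the output seen on the last clock edge. */
uint8_t lut_fsm_run(unsigned bits, int count)
{
    lut_fsm m = { 0, 0 };
    uint8_t output = 0;

    for (int i = 0; i < count; i++)
        output = lut_fsm_clock(&m, (uint8_t)((bits >> i) & 1u));
    return output;
}
```

For example, `lut_fsm_run(0x3, 3)` feeds the inputs 1, 1, 0 and returns 1, because the output on the third edge reflects the code produced by the second 1. The response time is fixed by the two lookups and does not depend on any software running on a processor.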

 

If further processing is required, use the processor for processing rather than as a controller, and offload the critical timing to the arrays. 

The second attachment is a start at using arrays for a microprogrammed computer. 

 

The possibilities for TriMatrix seem endless. 

 

Thanks to all for your time to read this.
Altera_Forum

Arrow Electronics offers the ARM Cortex-M1 which can be added to an SOPC Builder design in place of a NIOSII. Go to http://www.arrowdevtools.com and search for Cortex-M1.

Altera_Forum

http://nibz.googlecode.com I just put up a Quartus archive of a MAX IIZ project that uses 94% of an EPM570ZM100C6. The processor should work and is an improvement. It has 40 general I/O pins and a 16-bit (static) RAM memory bus. 

 

No pin placement done yet, a very tight fit. 

 

cheers 

jacko
Altera_Forum

Hi I am just checking in to see if there is still interest in this topic. 

I have a new kind of embedded processor that runs this trivial test case in 51 clock cycles. 

36 of the cycles are used by iterations of the for loop. 9 iterations at 4 clocks per. 
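As a quick check of the cycle accounting quoted above, the arithmetic can be written out in C. The constants are the figures from this post; the function names are made up, and how the remaining cycles are split between the other statements is not stated in the post.

```c
/* Figures quoted in the post. */
enum {
    TOTAL_CYCLES         = 51,
    LOOP_ITERATIONS      = 9,
    CYCLES_PER_ITERATION = 4
};

/* Cycles spent in the iterations of the for loop. */
int loop_cycles(void)
{
    return LOOP_ITERATIONS * CYCLES_PER_ITERATION;
}

/* Cycles left for everything else (the if/else tests, the while
 * test, and loop setup); the post gives no further breakdown. */
int non_loop_cycles(void)
{
    return TOTAL_CYCLES - loop_cycles();
}
```

This gives 36 cycles for the loop and 15 cycles for the rest of the test case.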

 

How does this compare to NIOS? 

 

 

 

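int x, y;  /* declarations assumed (not in the original post); globals start at 0 */

int main(void)
{
    //x = 0x00ff; // 15 | 255;
    //x = 1;
    if(x == 0)
        if(x == 0) y = 2 + 3 * 4 - 4;
        else y = 5;
    else y = 2*3-4;
    while(x == 6) // if x == 6
        x = x - 7;
    for(x = 0; x < 9; x = x + 1)
        y = 10;
    return 0;
}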

 

A crude but compilable .bdf shows 5 ALUTs and about 45K RAM bits, with a slack of 8.189. 

*******Just realized a lot of nodes were synthesized away so this data is not correct. 

 

Am I on the right track?
Altera_Forum

http://nibz.nibzx.co.uk 

 

Well, it's been a while, so here is the latest on NiBZ. 

 

Sub VGA 256*288 video 

16 bit audio through 1 bit DAC 

SPI boot and SD SPI 

80MHz in MAX II 

2 (16 bit) MIPS per 10MHz 

3 to 1 opcode compression with in place execution 

Still only 70% of MAX II 1270 with video and audio and SPI 

Fully generic word size 

16 bit port (8 ins and 8 outs)
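For readers who want the list above as absolute numbers, here is a small C sketch of the arithmetic. It only combines figures quoted in this post (2 MIPS per 10 MHz, 80 MHz, 70% of a 1270-LE MAX II); the function names are made up for illustration.

```c
/* Throughput implied by "2 (16 bit) MIPS per 10MHz" at a given clock. */
double nibz_mips(double clock_mhz)
{
    return clock_mhz * 2.0 / 10.0;
}

/* Clock cycles per instruction implied by the same ratio. */
double nibz_clocks_per_instruction(void)
{
    return 10.0 / 2.0;
}

/* Logic elements used, given the device size and a utilization percentage. */
int les_used(int device_les, int percent)
{
    return device_les * percent / 100;
}
```

At the quoted 80 MHz this works out to 16 MIPS, or 5 clocks per instruction, and 70% of 1270 LEs is 889 LEs for the core plus video, audio and SPI.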
Altera_Forum

http://code.google.com/p/nibz/downloads/detail?name=nibzx7.vhd&can=2&q=

 

A 16 bit generically programmable core (not fully tested) but many issues fixed, @75% of 1270 MAX II and 85MHz in C5.