Honored Contributor I
775 Views

Could the Nios2 instruction and data master ports have a variable width when a cache is used?

Hi, 

 

I was thinking about the Nios2 architecture today and have to ask: why doesn't the Nios2 offer variable-width instruction and data master ports when it is configured with a cache? It seems that throughput between slow external memory and the Nios2 cache could improve if wider Avalon-MM instruction and data master ports were possible. Certainly we understand why a conventional processor must use a fixed-width data bus, but a soft-core processor might benefit from more flexibility. For example, if we use a 64-bit-wide SDRAM DIMM with the Nios2, we might actually see lower performance than with a Nios2 connected to a 32-bit-wide memory.
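
To put a rough number on the idea, here is a minimal sketch that counts the data beats needed to fill one cache line for a given master port width. The 32-byte line size and the function name are assumptions for illustration only, and the count ignores SDRAM latency, burst setup and arbitration, so it is an upper bound on the possible gain rather than a measurement.

```python
def beats_per_line_fill(line_bytes: int, master_width_bits: int) -> int:
    """Return the number of data beats needed to fill one cache line.

    Counts data transfers only; memory latency and arbitration are ignored.
    """
    if master_width_bits <= 0 or master_width_bits % 8:
        raise ValueError("master width must be a positive multiple of 8 bits")
    bytes_per_beat = master_width_bits // 8
    if line_bytes <= 0 or line_bytes % bytes_per_beat:
        raise ValueError("line size must be a positive multiple of the port width")
    return line_bytes // bytes_per_beat


# Assumed cache line size for illustration (not taken from this thread).
LINE_BYTES = 32

if __name__ == "__main__":
    for width in (32, 64):
        beats = beats_per_line_fill(LINE_BYTES, width)
        print(f"{width}-bit master: {beats} beats per {LINE_BYTES}-byte line fill")
```

Under these assumptions, a 64-bit master would halve the number of data beats per line fill. Whether that shows up as real throughput depends on how much of the fill time is latency rather than data transfer.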

 

Jeff
5 Replies
Honored Contributor I
14 Views

I suspect it would be a lot of logic for a small gain in very few configurations. FPGA pins are usually at a premium, so a 64-bit external data bus would be unusual.

 

There are a lot of other places where a relatively small amount of logic would improve performance in configurations that are more likely to be used.
Honored Contributor I
14 Views

 

--- Quote Start ---  

I suspect it would be a lot of logic for a small gain in very few configurations. FPGA pins are usually at a premium, so a 64-bit external data bus would be unusual.

 

There are a lot of other places where a relatively small amount of logic would improve performance in configurations that are more likely to be used. 

--- Quote End ---  

 

 

 

Since this extra logic would presumably be instantiated only where it was useful, the burden falls only on whoever maintains the Nios2 IP, and we wouldn't need to worry about excessive logic consumption in the FPGA.

 

My naive perspective is that, when the Nios2 has a cache, the logic for copying from memory to cache wouldn't be complex, and supporting different instruction and data port widths might not be a burden.

 

Jeff
Honored Contributor I
14 Views

Furthermore, even with only 16 DDR data pins, a half-rate controller can leave us with a 64-bit Avalon memory-mapped slave.
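
The arithmetic behind that can be sketched as follows, assuming the usual relationship for DDR controllers: two transfers per memory clock because of double data rate, and another factor of two when the local (Avalon) side runs at half the memory clock. The function name is made up for illustration.

```python
def avalon_local_width_bits(dq_pins: int, half_rate: bool) -> int:
    """Estimate the local-side data width of a DDR memory controller.

    dq_pins   -- number of external DDR data pins
    half_rate -- True if the local side runs at half the memory clock
    """
    if dq_pins <= 0:
        raise ValueError("dq_pins must be positive")
    ddr_factor = 2                       # data moves on both clock edges
    rate_factor = 2 if half_rate else 1  # half-rate local clock doubles the width again
    return dq_pins * ddr_factor * rate_factor


if __name__ == "__main__":
    print(avalon_local_width_bits(16, half_rate=True))
    print(avalon_local_width_bits(16, half_rate=False))
```

So 16 data pins with a half-rate controller present a 64-bit slave, while the same pins with a full-rate controller present a 32-bit one.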

Honored Contributor I
14 Views

Both the instruction and data cache Avalon cycles are fed through the same Avalon master interface as uncached data cycles. What you are talking about would probably require 2 or 3 separate Avalon master interfaces (since you wouldn't want a 64-bit bus for the uncached data accesses).

 

You'd then need to bridge the 32-bit Avalon bus to the 64-bit one in order to allow uncached data accesses to memory.

That means more and more logic, which will slow things down further.
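
As a rough illustration of one thing such a bridge has to do, the sketch below computes the byte-enable mask for a narrow access placed on a 64-bit bus, assuming byte-enable bit i corresponds to byte lane i. The function name and checks are hypothetical, and a real width adapter also has to steer read data back from the correct lanes.

```python
def byteenable_for_access(addr: int, size_bytes: int, bus_bytes: int = 8) -> int:
    """Return the byte-enable mask for a naturally aligned access on a wider bus."""
    if size_bytes not in (1, 2, 4, 8) or size_bytes > bus_bytes:
        raise ValueError("unsupported access size")
    offset = addr % bus_bytes            # byte lane where the access starts
    if offset % size_bytes:
        raise ValueError("access is not naturally aligned")
    return ((1 << size_bytes) - 1) << offset


if __name__ == "__main__":
    # A 32-bit access lands in the lower or upper half of the 64-bit word.
    print(hex(byteenable_for_access(0x1000, 4)))
    print(hex(byteenable_for_access(0x1004, 4)))
```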

 

More useful would be internal logic changes, e.g.:

- don't stall the CPU on a memory read until the value is needed.

- post Avalon writes.

- post cache writes that are waiting for a cache line read.

- predictive instruction cache line reads.

- predictive data cache line reads.

- read the target address for branches (dual-port instruction memory into the CPU) in case the branch is taken.

 

I actually suspect there is very little development of the Nios CPU going on now; it is quite a few years old, and FPGAs are a lot larger than they were when it was designed. There have been some changes for running things like Linux (especially to the gcc config), but nothing for small code systems (which require a different set of optimisations).

 

I have some code that has hard real-time constraints (it has to get around a loop in under 194 clocks). This means I had to minimise the worst case code paths - not speed up the common ones. I know where every cycle stall is!
Honored Contributor I
14 Views

Thanks for that explanation.  

 

Yes, FPGAs are growing in size. Perhaps employing dedicated Nios2 masters for different purposes in such situations would result in more logic, but also in simpler logic that could possibly run faster in parallel. We might need to move up from the nios2/f to the nios2/turbo model :-)