
QSYS - Clock sensitivity

Altera_Forum
Honored Contributor II

Hello everyone! 

 

I recently started using soft cores in FPGA applications. Over the last few days I spent some time investigating the clock sensitivity of the Nios II (frequency and duty-cycle behaviour). I built a system based on the DE0-Nano board (illustrated in figure 1 of the attached file). The QSYS system itself is quite simple (shown in figure 2) and its task is to toggle a parallel input/output (PIO) peripheral. The embedded code used is shown in Listing 1. One pass through the while loop takes 12 clock cycles (determined by stepping through with the debugger and counting the steps). 
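
Listing 1 is only in the attachment; for readers without it, a minimal sketch of such a toggle loop, assuming the HAL macro IOWR_ALTERA_AVALON_PIO_DATA from altera_avalon_pio_regs.h and a hypothetical PIO base name PIO_0_BASE from system.h, could look like this:

    #include "system.h"                 /* PIO_0_BASE - actual name depends on the QSYS component (assumed here) */
    #include "altera_avalon_pio_regs.h" /* IOWR_ALTERA_AVALON_PIO_DATA() HAL access macro */

    int main(void)
    {
        unsigned int value = 0;

        while (1) {
            value ^= 1;                                     /* invert the output bit        */
            IOWR_ALTERA_AVALON_PIO_DATA(PIO_0_BASE, value); /* drive it onto the PIO output */
        }

        return 0;
    }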

 

The following behaviour of the system is not 100 percent clear to me. Applying the 50 MHz clock (fed directly through to the QSYS system) leads to an output signal with a frequency of 271 kHz, i.e. a reduction by a factor of approximately 183. The following questions arise, and I hope some of you can help me understand the soft-core system better. 

 

1. My investigations showed that the output signal frequency is a factor of about 183 smaller than the input frequency. I assume that, besides the number of clock cycles the software needs, the clock frequency of the Avalon bus has the main impact on the frequency reduction. Is that right? Does the Avalon bus frequency depend on the input frequency? 

 

2. Downloading the .elf file with Eclipse was only possible down to an input clock frequency of ~3 MHz. Why does the JTAG functionality (which has its own clock of about 6 MHz) depend on the clock signal of the QSYS system? 

 

I look forward to your reply. :-) 

 

 

Thank you & best regards 

Stefan
Altera_Forum
Honored Contributor II

First, each Nios II instruction takes several clock cycles (6, if I'm not mistaken), and IIRC the Nios II/e doesn't have any fancy optimizations like branch prediction, heavy pipelining, etc. 

Then, when you say you counted the cycles using the debugger, was it while stepping through C code or assembly? One step in the C code can correspond to several assembly instructions. 

If you don't have an instruction or data cache, you also need to take into account several clock cycles of DRAM latency. 

As for your last question, it all depends on how the JTAG controller was implemented. Since the JTAG interface uses a clock frequency that is not the same as the system frequency on the bus, some clock-crossing logic has to be implemented. If you can assume that one clock is always faster than the other (sometimes by at least a factor of 2), that clock-crossing logic is a lot easier to implement than if you have to handle every possible case. IIRC the Nios II documentation states a minimum clock frequency that must be used with the JTAG debug module.
Altera_Forum
Honored Contributor II

Thank you for the detailed information, very helpful!!! 

 

 

--- Quote Start ---  

...Then, when you say you counted the cycles using the debugger, was it while stepping through C code or assembly? One step in the C code can correspond to several assembly instructions. 

If you don't have an instruction or data cache, you also need to take into account several clock cycles of DRAM latency... 

--- Quote End ---  

 

 

I stepped through in assembly.  

So that leads to 12 instructions per while loop, times 6 cycles per instruction, times a factor x due to DRAM latency (by the way, x is approximately 2.5). 
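
(For reference: 50 MHz / 271 kHz ≈ 184 system clock cycles per output period, and 184 / (12 × 6) ≈ 2.6, which is roughly the factor x above, assuming one pass through the loop produces one full output period.) 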

I have the feeling that I now basically understand the delay, thank you again. 

 

Best regards 

Stefan
Altera_Forum
Honored Contributor II

By the way, are there any official documents from Altera which cover this topic in more detail? 

Thank you & Regards Stefan
Altera_Forum
Honored Contributor II

It's not really "times x", because the DRAM latency only comes into play on the cycles where the CPU accesses RAM. For each instruction that is at least once, to fetch the instruction itself, plus one or more times for data accesses, depending on the instruction (I don't remember right now whether some Nios II instructions can make several data accesses within a single instruction). The latency itself will vary from access to access, especially depending on how far the new access is from the previous one. The DRAM datasheet should give more detailed values, but it can be a bit cumbersome to figure out. Using instruction and data caches on the CPU will dramatically increase performance. 

Alternatively, if you know how to use SignalTap, you can connect it to the CPU's Avalon masters to see exactly what the CPU is doing and how many clock cycles it takes.
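
Another software-only option, if a timestamp timer is configured in the BSP, is the HAL timestamp API (sys/alt_timestamp.h). A rough sketch, assuming the timestamp timer runs from the same 50 MHz system clock so that ticks correspond to system clock cycles:

    #include <stdio.h>
    #include <sys/alt_timestamp.h>  /* alt_timestamp_start(), alt_timestamp() */

    /* Measure roughly how many timer ticks 1000 loop iterations take.
       Requires a timestamp timer to be selected in the BSP settings. */
    int main(void)
    {
        alt_timestamp_type start, stop;
        volatile unsigned int value = 0;
        int i;

        if (alt_timestamp_start() < 0) {
            printf("No timestamp timer available in this BSP\n");
            return 1;
        }

        start = alt_timestamp();
        for (i = 0; i < 1000; i++) {
            value ^= 1;             /* dummy work standing in for the PIO toggle */
        }
        stop = alt_timestamp();

        printf("~%u ticks for 1000 iterations\n", (unsigned int)(stop - start));
        return 0;
    }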