Nios® II Embedded Design Suite (EDS)
Support for Embedded Development Tools, Processors (SoCs and Nios® II processor), Embedded Development Suites (EDSs), Boot and Configuration, Operating Systems, C and C++

Custom Linux is 1000x slower

Altera_Forum
Honored Contributor I

I am using an Altera Cyclone V SoC dev board. 

I built a custom Linux platform using Yocto and finally got everything into a state where I could run my AOCL program. However, my FPGA program now runs about 1000x slower than it does with a prebuilt image. I am getting the following error on boot, and I am guessing it is related to the poor performance: 

hwclock: can't open '/dev/misc/rtc': No such file or directory 
INIT: Entering runlevel: 5 
hwclock: can't open '/dev/misc/rtc': No such file or directory 

 

There are no rtc files on the board as far as I can tell (`find / -name "*rtc*"` returns nothing). 

Two questions then: 

 

  1. Any ideas why I am taking a 1000x performance hit using my custom Linux build? 

  2. If my assumption is correct and the missing hwclock/rtc is the culprit, any ideas on how to rectify it, perhaps by adding something to my Yocto build or device tree? (A quick userspace check is sketched just after this list.) 
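
For reference, here is the kind of minimal userspace probe I can run (my own sketch, assuming the standard `/dev/rtc0` node and the `linux/rtc.h` ioctl interface; neither appears in the logs above) to confirm whether an RTC driver is present at all:

```c
/* Hypothetical check, not from the board support package: try to open the
 * standard RTC character device and read the time via the linux/rtc.h ioctl.
 * If open() fails the same way hwclock does at boot, no RTC driver is bound. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/rtc.h>

int main(void)
{
    int fd = open("/dev/rtc0", O_RDONLY);   /* assumed device node */
    if (fd < 0) {
        perror("open /dev/rtc0");
        return 1;
    }

    struct rtc_time tm;
    if (ioctl(fd, RTC_RD_TIME, &tm) < 0) {
        perror("RTC_RD_TIME");
        close(fd);
        return 1;
    }

    printf("RTC reports %04d-%02d-%02d %02d:%02d:%02d\n",
           tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
           tm.tm_hour, tm.tm_min, tm.tm_sec);
    close(fd);
    return 0;
}
```

If that open() fails, I assume the fix lives in the kernel config or device tree rather than in userspace.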

1 Reply
Altera_Forum
Honored Contributor I

UPDATE: 

I've modified my Linux image and fixed the issue with the `rtc`; however, this did not help the performance! 

 

Some interesting behaviour though: when I launch my AOCL kernel multiple times, the first launch has an execution time that appears to be in the ideal range of `0 < t < 1000ms`, while all subsequent launches report times of almost exactly `t = 1000ms`. This leads me to believe that there is a `1s` clock somewhere that is responsible for the performance hit. My thinking is that the first kernel launch can fall anywhere within the clock period and returns on the next edge, hence `0 < t < 1000ms`, whereas subsequent launches start on a clock edge, hence `t = 1000ms`. 
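
To separate the possibilities, here is roughly how I plan to cross-check the numbers (a sketch only; it assumes my host code uses the standard OpenCL API, a queue created with `CL_QUEUE_PROFILING_ENABLE`, and `queue`/`kernel`/`gsize` stand in for my existing objects). If the host wall-clock time snaps to ~1000ms while the device event time stays small, the 1s clock is on the host/driver side rather than in the kernel itself:

```c
/* Sketch: compare device-side kernel time (OpenCL event profiling) against a
 * host-side wall-clock measurement for one launch. Placeholder names: queue,
 * kernel and gsize come from the existing host program. */
#include <stdio.h>
#include <time.h>
#include <CL/cl.h>

static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

void time_one_launch(cl_command_queue queue, cl_kernel kernel, size_t gsize)
{
    cl_event ev;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &gsize, NULL, 0, NULL, &ev);
    clWaitForEvents(1, &ev);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    cl_ulong start = 0, end = 0;
    clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START,
                            sizeof(start), &start, NULL);
    clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END,
                            sizeof(end), &end, NULL);

    printf("host wall clock: %.3f ms, device event time: %.3f ms\n",
           elapsed_ms(t0, t1), (end - start) / 1e6);
    clReleaseEvent(ev);
}
```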

 

I also tested the `vector_add` example on my custom image and on the prebuilt image (16.1); with my custom image it reports variable kernel times on the order of `0 < t < 1000ms`, like I was seeing with my own kernel. Using the prebuilt image I get consistent performance around `t = 8.5ms`.  
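
To narrow this down further, I plan to compare the host timer itself on the two images. The snippet below is my own diagnostic (assuming the usual sysfs clocksource path and POSIX `clock_getres`); a much coarser `CLOCK_MONOTONIC` resolution or a different clocksource on the custom image would line up with the ~1s quantisation of the reported times:

```c
/* Diagnostic sketch: print the kernel's active clocksource and the resolution
 * of CLOCK_MONOTONIC, to compare the custom image against the prebuilt one. */
#include <stdio.h>
#include <string.h>
#include <time.h>

int main(void)
{
    char name[64] = "unknown";
    FILE *f = fopen("/sys/devices/system/clocksource/clocksource0/current_clocksource", "r");
    if (f) {
        if (fgets(name, sizeof(name), f))
            name[strcspn(name, "\n")] = '\0';   /* strip trailing newline */
        fclose(f);
    }

    struct timespec res;
    clock_getres(CLOCK_MONOTONIC, &res);

    printf("clocksource: %s\n", name);
    printf("CLOCK_MONOTONIC resolution: %lds %ldns\n",
           (long)res.tv_sec, (long)res.tv_nsec);
    return 0;
}
```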

 

I would love to hear your thoughts and suggestions! :)