
Bare metal application on Cyclone V too slow - DE10-Nano

Altera_Forum

Hello all, 

 

I'm developing a bare-metal application in C using DS-5 with arm-altera-eabi-gcc on the DE10-Nano (ARM Cortex-A9). After some time developing the code, I realized that my program runs much more slowly than the same code does on Linux/LXDE for this board. The program is simple: I have a 640x480 image in SDRAM, and I wrote a function that reads the pixel level one pixel (1 byte) at a time, which means 307,200 bytes to read. It takes almost 140 ms to execute the loop and print the "." as in the code below. Is there some trick I need to use to speed up the code, with caches or anything related, or is this time normal because I'm using just one core of the Cortex-A9?

 

    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    #include "alt_clock_manager.h"
    #include "alt_generalpurpose_io.h"
    #include "alt_globaltmr.h"
    #include "hwlib.h"
    #include "socal/alt_gpio.h"
    #include "socal/hps.h"
    #include "socal/socal.h"
    #include "hps_0.h"
    #include "system_crios.h"
    #include "mser.h"

    int main(void)
    {
        setup_system();
        mserInit();
        // drawTestImage();

        uint32_t test = 0;
        uint8_t test2;

        while (1) {
            //mserFindRegions();
            //delay_us(ALT_MICROSECS_IN_A_SEC/10);
            for (test = 0; test < 307200; test++)
                test2 = getPixelLevel(test);
            printf("\n\r.");
        }

        return 0;
    }
Altera_Forum

First of all, I would make sure SDRAM access is cached.

Try reading whole bytes from SDRAM and extracting the bits afterwards, with something like this:

    for (test = 0; test < 307200; test += 8) {
        pixel8 = get8PixelLevels(test);
        for (bit = 0; bit < 8; bit++) {
            test2 = pixel8 & 1;
            pixel8 >>= 1;
        }
    }

If you see a large speed improvement this way, then the problem is with SDRAM access.
Altera_Forum

Hello Cris72, 

 

Thanks for the answer, but to check whether it's running from the cache, do I need to look at the registers in a debug view? Also, my getPixel function is similar to yours. Another question: do you have an email?
Altera_Forum

--- Quote Start ---

Thanks for the answer, but to check whether it's running from the cache, do I need to look at the registers in a debug view?

--- Quote End ---

No. A debug view is not required. Just check whether your execution time improves with this change.

--- Quote Start ---

Also, my getPixel function is similar to yours.

--- Quote End ---

No. Your function returns a single pixel; mine returns 8. In an uncached system my code is faster, since the bottleneck is the number of SDRAM accesses.

Each SDRAM access involves a lot of delays (RAS, CAS, ...), while the bit extraction happens at CPU-register level.

In your case, 7 out of 8 accesses go to the same SDRAM address, so you lose a lot of time if the system is rather dumb and accesses SDRAM every time as if it were a random address.

Actually, I don't know how the processor bus handles SDRAM access, which is why I suggest this simple code trick to check whether you get any improvement.
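
Since the original post describes one byte per pixel rather than one bit, the same idea can be sketched with 32-bit loads: fetch four pixels per memory access and unpack them in registers. This is only a minimal sketch, not code from this thread. The names `sum_pixels_bytewise` and `sum_pixels_wordwise` are hypothetical, and it assumes the image buffer is 4-byte aligned and its size is a multiple of 4 (true for 307,200 bytes). Summing the pixels is just a stand-in for whatever per-pixel work is done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One load per pixel: the access pattern of the original loop. */
uint32_t sum_pixels_bytewise(const volatile uint8_t *img, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += img[i];
    return sum;
}

/* One 32-bit load per four pixels; the bytes are unpacked in registers.
 * Assumption: img_words is 4-byte aligned and n is a multiple of 4. */
uint32_t sum_pixels_wordwise(const volatile uint32_t *img_words, size_t n)
{
    assert(n % 4 == 0);

    uint32_t sum = 0;
    for (size_t i = 0; i < n / 4; i++) {
        uint32_t w = img_words[i];      /* single memory access */
        sum += (w & 0xFFu)              /* pixel in bits 0..7   */
             + ((w >> 8) & 0xFFu)       /* pixel in bits 8..15  */
             + ((w >> 16) & 0xFFu)      /* pixel in bits 16..23 */
             + (w >> 24);               /* pixel in bits 24..31 */
    }
    return sum;
}
```

Timing both functions over the same buffer gives the comparison suggested above: if the word-wise version is close to four times faster, the cost is dominated by the number of memory accesses rather than by the arithmetic.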
Altera_Forum

Do you compile with optimisations (-O2)?

Does your preloader execute from flash and set the CPU to its nominal frequency? (If you are using SDRAM then my guess is yes, because you need a properly configured preloader to set up the SDRAM parameters.) You should see a line like this from the preloader:

CLOCK: MPU clock 800 MHz
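
To put the reported numbers in perspective, here is a small arithmetic sketch. It assumes the 140 ms covers the whole loop of 307,200 reads (plus the printf) and that the CPU really runs at the 800 MHz shown above. The function names are hypothetical helpers, not part of hwlib.

```c
#include <assert.h>
#include <stdint.h>

/* Average time per access in nanoseconds, given the total time in ms. */
double ns_per_access(double total_ms, uint32_t accesses)
{
    assert(accesses > 0);
    return total_ms * 1e6 / (double)accesses;
}

/* CPU cycles spent per access at a given clock frequency in MHz. */
double cycles_per_access(double ns, double cpu_mhz)
{
    return ns * cpu_mhz / 1000.0;
}
```

With 140 ms and 307,200 reads this works out to roughly 456 ns per pixel, or about 365 cycles at 800 MHz. That is far more than a simple byte load should cost, so a missing optimisation level, a lower actual clock, or uncached memory are all plausible suspects.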
Altera_Forum

Where is your stdout? On a serial port? If so, what is the speed?
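
The reason this matters can be estimated with a short sketch. The baud rate is not stated in this thread, so 115200 baud with 8N1 framing (10 bits on the wire per character) is purely an assumption for illustration, and `uart_tx_us` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Time in microseconds to send `chars` characters over a UART.
 * bits_per_char counts start, data, and stop bits (10 for 8N1). */
double uart_tx_us(uint32_t chars, uint32_t baud, uint32_t bits_per_char)
{
    assert(baud > 0);
    return (double)chars * (double)bits_per_char * 1e6 / (double)baud;
}
```

Under that assumption, the three characters of "\n\r." take about 260 µs per loop iteration, which is small next to 140 ms. At a much lower baud rate, or if stdout is routed through the debugger instead of a UART, the printf could account for a far larger share of the measured time.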
