
alt_read_flash too slow?

Altera_Forum
Honored Contributor II
1,292 Views

I have a Nios II processor in a design compiled with Quartus 12.0.

I was wondering whether the following performance is reasonable:

When I call alt_read_flash to read an entire flash block (256 KBytes), it takes around 0.5 seconds.

By contrast, when the Nios II loads the firmware from the same flash, it takes around 0.1 seconds.

So it seems to me that the flash can be read faster than what I get from alt_read_flash.

What do you think?
Altera_Forum

Firstly I'd ensure that everything is compiled with -O2 or -O3. 

Are you doing a single read of 256k, or a loop containing smaller reads? 

You can estimate the fixed overhead by comparing the times taken to read blocks of different sizes.

If you want to know whether the flash read call is significant just call it twice!
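
The block-size comparison suggested above can be sketched in a few lines of Python. This is only a minimal sketch, assuming the read time is roughly linear in the size (time = fixed + per_byte * size). The function name split_read_cost and the sample timings are hypothetical and are not measurements from this thread.

```python
def split_read_cost(size_a, time_a, size_b, time_b):
    """Fit time = fixed + per_byte * size through two (size, time) measurements."""
    if size_a == size_b:
        raise ValueError("the two reads must have different sizes")
    per_byte = (time_b - time_a) / (size_b - size_a)
    fixed = time_a - per_byte * size_a
    return fixed, per_byte


# Hypothetical timings in seconds (illustration only, not from the thread).
small = (64 * 1024, 0.130)
large = (256 * 1024, 0.500)

fixed, per_byte = split_read_cost(*small, *large)
print(f"fixed overhead: {fixed * 1e3:.2f} ms")
print(f"per-byte cost:  {per_byte * 1e6:.3f} us")
```

If the fixed term comes out near zero, the time is dominated by the per-byte transfer rather than by call overhead.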
Altera_Forum

Everything is compiled with -O3, and I read a single 256K block.

I'm going to build a clean, no-OS project that does nothing but the flash read and see whether that changes anything.
Altera_Forum

In a standalone project, reading a whole block took 441 ms. Not much of an improvement.

Altera_Forum

Maybe something is doing a sector erase; 500 ms would be about right for that.

I looked at the low-level flash functions a few weeks ago. IMHO you want to use the lower-level functions (about two levels down) so that you can write partial blocks (OK if the old data is 0xff, or rather if the write doesn't need to set any bits).

It might be worth adding some trace to those to see what is taking all the time.

Also see whether a read of 128k actually takes half the time, i.e. whether it is a per-byte cost or a fixed overhead.
Altera_Forum

The erase is only supposed to happen for write commands, so I don't think that is the cause.

One more thing: I tried reading 16 times the size of the flash, and the time was multiplied by 16.
Altera_Forum

How big is your firmware being loaded at boot time? (I am guessing it is not close to 256KB). 

 

For example, if it is only 1/5th of 256KB, then you are seeing roughly equivalent performance and the answer to your question is "yes, it sounds about right". 

 

If 512KB/s is too slow and you want to make it faster, you would need to supply more information about your system / application.
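
The arithmetic behind the 512KB/s figure and the "1/5th" example can be checked with a short Python sketch. It uses only the numbers reported in the question and treats 256KB as 256 * 1024 bytes, which is an assumption about the poster's units.

```python
BLOCK_BYTES = 256 * 1024   # one flash block, as stated in the question
RUNTIME_READ_S = 0.5       # reported alt_read_flash time for one block
BOOT_LOAD_S = 0.1          # reported boot-time load duration

# Throughput seen through alt_read_flash, in bytes per second.
runtime_rate = BLOCK_BYTES / RUNTIME_READ_S

# Image size that would give the same throughput during the 0.1 s boot load.
break_even_bytes = runtime_rate * BOOT_LOAD_S

print(f"runtime throughput: {runtime_rate / 1024:.0f} KB/s")
print(f"break-even image:   {break_even_bytes / 1024:.1f} KB")
print(f"fraction of block:  {break_even_bytes / BLOCK_BYTES:.2f}")
```

A boot image much larger than the break-even size would mean the boot loader really does read faster than alt_read_flash.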
Altera_Forum

The firmware size is about 2.5 MBytes, so my calculation of the read speed during boot takes that into account.

The flash device I'm using is an EPCS128. After running SignalTap, it seems that there is quite a big delay between each byte read. The size of that delay is consistent with the read speed I'm seeing.

I wonder whether it is possible to configure this delay in any way.
Altera_Forum

 

--- Quote Start ---

On the contrary, when the NIOS loads the firmware from the same flash it takes around 0.1 second.

--- Quote End ---

--- Quote Start ---

The firmware size is about 2.5MBytes so my calculation about the read speed during boot takes that into account.

--- Quote End ---

--- Quote Start ---

The flash device I'm using is epcs128.

--- Quote End ---

2.5 MB / 0.1 s = 25 MB/s

The DCLK fMAX of the EPCS128 during fast read is 40 MHz (which is only 5 MB/s, even if you ignore overhead).

So something doesn't add up. Can you clarify?

Since you're already using SignalTap, configure it to capture the bootloader sequence and see whether it gets significantly better read timing than your runtime reads. That will tell you whether you have a problem that software optimization might improve somewhat.
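
The sums in this post can be reproduced with a short Python sketch. It assumes one data bit per DCLK cycle and ignores command and address overhead, as the post does. The per-byte comparison at the end only holds if DCLK really runs at 40 MHz in this system, which the thread does not state.

```python
DCLK_HZ = 40_000_000   # EPCS128 fast-read DCLK fMAX, as quoted in the post
BITS_PER_BYTE = 8      # one data bit is shifted out per DCLK cycle

ceiling = DCLK_HZ / BITS_PER_BYTE   # bytes/s, ignoring command/address overhead
claimed_boot = 2.5e6 / 0.1          # bytes/s, using decimal MB as in the post
runtime = 256 * 1024 / 0.5          # bytes/s from the alt_read_flash timing

# Time per byte at the observed runtime rate versus the serial minimum.
# Assumption: DCLK runs at its 40 MHz maximum; the real clock may be slower.
observed_us = 1e6 / runtime
minimum_us = 1e6 / ceiling

print(f"serial ceiling:  {ceiling / 1e6:.1f} MB/s")
print(f"claimed at boot: {claimed_boot / 1e6:.1f} MB/s")
print(f"runtime read:    {runtime / 1e6:.2f} MB/s")
print(f"per byte: {observed_us:.2f} us observed, {minimum_us:.2f} us minimum")
```

The claimed boot rate exceeds the serial ceiling, which is why the numbers cannot all be right, while the runtime rate sits well below it.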
Altera_Forum

The 0.1 s measurement was for a single 256k block, so for 2.5 MBytes it should be 1 s.

Regarding SignalTap, I'll try it and see whether there is an actual improvement.
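
With the corrected figure, the boot and runtime rates can be compared directly. This is a rough Python sketch that takes "256k" and "2.5 MBytes" as binary units, which is an assumption.

```python
BLOCK_BYTES = 256 * 1024
IMAGE_BYTES = int(2.5 * 1024 * 1024)   # "about 2.5 MBytes" (binary units assumed)

boot_rate = BLOCK_BYTES / 0.1      # bytes/s, one block in 0.1 s at boot
runtime_rate = BLOCK_BYTES / 0.5   # bytes/s, one block in 0.5 s via alt_read_flash

boot_time = IMAGE_BYTES / boot_rate   # seconds to load the whole image

print(f"boot rate:    {boot_rate / 1e6:.2f} MB/s")
print(f"runtime rate: {runtime_rate / 1e6:.2f} MB/s")
print(f"boot time:    {boot_time:.2f} s")
print(f"boot/runtime: {boot_rate / runtime_rate:.1f}x")
```

On these numbers the boot-time rate stays below the 5 MB/s serial ceiling mentioned earlier in the thread, so the corrected measurement is at least plausible.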
Altera_Forum

Is the delay between the read cycles, or are the read cycles stretched?

You should also be able to find the source for the copy loop, which might be illuminating.

If you run the code from tightly-coupled instruction memory, you can use SignalTap to trace the instruction fetches. This can be very informative, since it also shows the CPU stalls. The pipeline delays do make it slightly 'interesting' to follow.

From what I remember, a little more logic in the SPI block would make it a lot faster.

I also did some quick sums and thought that a 100 MHz Nios II could directly bit-bang the EPCS almost as fast as it can go.