Cyclone V QSPI XIP Example Design


Introduction

This page presents a demo of a bare-metal application running from QSPI Flash on a Cyclone V SoC Development Kit, without using SDRAM.

Some more details about the demo:

  • QSPI Flash (1MB window) is used for program and read only data
  • HPS OCRAM (64KB) is used for regular data storage
  • OCRAM is cleared at the beginning of the bare-metal application
  • Caches are demonstrated, including cache pre-loading and cache locking
  • Interrupts are demonstrated

Prerequisites

The following are required for this demo:

  • For running the demo:
    • Cyclone V Development Kit, rev E preferable
    • SoC EDS 15.1.1 (for Quartus Flash Programmer)
    • Serial terminal running on PC (TeraTerm for example)
  • For compiling the Preloader:
    • SoC EDS 15.1.1
  • For compiling and debugging the bare-metal application:
    • SoC EDS 15.1.1, including ARM DS-5

Deliverables

The sample application archive Altera-SoCFPGA-HardwareLib-XIP-CV-GNU.tar.gz is included with this demo. It contains the folder Altera-SoCFPGA-HardwareLib-XIP-CV-GNU with the following items:

  • debug-unhosted.ds - debugger script
  • Makefile - makefile
  • io.c - printf support code
  • mmu_tables.c - MMU tables code
  • xip_demo.c - main demo code
  • test_function_1.c - 256KB code size test function
  • test_function_2.c - 512KB code size test function
  • mmu_tables.h - MMU tables header
  • custom_reset.s - zero OCRAM code
  • check_mmu_tables.pl - check MMU placement script
  • generate_mmu_tables.pl - generate MMU tables script
  • xip_demo-mkimage.bin - application image
  • xip_demo.axf - application executable
  • preloader_xip - folder with precompiled Preloader ELF and bootable image

Running the Demo

In order to run the demo, the following steps need to be performed:


1. Configure the Board to boot from QSPI Flash by setting the BSEL jumpers accordingly:

BSEL0   BSEL1   BSEL2   Description
Left    Left    Left    3.3 V SPI or quad SPI

2. Write the Preloader Image by starting an Embedded Command Shell and running the following commands:

quartus_hps -c 1 -o PV Altera-SoCFPGA-HardwareLib-XIP-CV-GNU/preloader_xip/preloader-mkpimage.bin

Note: if flashing fails on the first try, run the command again. The failure may be caused by the HPS booting while the Flash Programmer tries to access the QSPI.

3. Flash the Bare-metal application image by running the following command from an Embedded Command Shell:

quartus_hps -c 1 -o P -a 0x100000 Altera-SoCFPGA-HardwareLib-XIP-CV-GNU/xip_demo-mkimage.bin

Note: if flashing fails on the first try, run the command again. The failure may be caused by the HPS booting while the Flash Programmer tries to access the QSPI.

4. Boot the Board. After the images have been flashed, connect a serial terminal to the board using 115,200-8-N-1 settings, then power-cycle or reset the HPS. The following is displayed on the console:

U-Boot SPL 2013.01.01 (May 02 2016 - 14:54:55)

BOARD : Altera SOCFPGA Cyclone V Board

CLOCK: EOSC1 clock 25000 KHz

CLOCK: EOSC2 clock 25000 KHz

CLOCK: F2S_SDR_REF clock 0 KHz

CLOCK: F2S_PER_REF clock 0 KHz

CLOCK: MPU clock 925 MHz

CLOCK: DDR clock 400 MHz

CLOCK: UART clock 100000 KHz

CLOCK: MMC clock 50000 KHz

CLOCK: QSPI clock 370000 KHz

RESET: COLD

SF: Read data capture delay calibrated to 3 (0 - 7)

SF: Detected N25Q512 with page size 65536, total: 67108864

Both SPI and serial NOR flash in XIP mode

Hello XIP World!

Running 256KB test_function_1() without caches enabled : 2625200 ticks.

Caches enabled

Preload 256KB test_function_1() in L2 cache : 2644716 ticks.

Running 256KB test_function_1() from preloaded cache : 18170 ticks.

Running 512KB test_function_2() : 5242806 ticks.

Running 512KB test_function_2() : 3511295 ticks.

Running 256KB test_function_1() from preloaded cache : 19656 ticks.

Global Timer Interrupt: 1 of 5

Global Timer Interrupt: 2 of 5

Global Timer Interrupt: 3 of 5

Global Timer Interrupt: 4 of 5

Global Timer Interrupt: 5 of 5

Done.

Re-compiling the demo

This section describes how to re-compile the demo if any changes are necessary.

Generating and Compiling the Preloader

The following steps are required in order to generate and compile the Preloader:

1. Start Embedded Command Shell

2. Use bsp-editor to generate a Preloader based on the Cyclone V GHRD from SoC EDS 15.1.1

3. In bsp-editor, perform the following changes to the default configuration parameters

  • Uncheck BOOT_FROM_SDMMC
  • Check BOOT_FROM_QSPI
  • Check SKIP_SDRAM
  • Uncheck WATCHDOG_ENABLE (sample application does not pet it)
  • Uncheck SDRAM_SCRUBBING
  • Uncheck SDRAM_SCRUB_REMAIN_REGION

4. Change current folder to the Generated Preloader

5. Edit the file generated/sdram/sdram_config.h to define the following macros to '0':

#define CONFIG_HPS_SDR_CTRLCFG_CTRLCFG_ECCEN (0)

#define CONFIG_HPS_SDR_CTRLCFG_CTRLCFG_ECCCORREN (0)

Note that this step is not necessary in case the hardware project does not enable SDRAM ECC.

6. Compile the Preloader using 'make' – this will bring in all source code

make

7. Clean the Preloader using the following command

make clean

8. Patch the Preloader source using the following command:

patch -p1 < <path_to_patch>/preloader_xip_15.1.1.patch

Note: do not save the patch in the Preloader folder, because it will interfere with the build process.

The output will be something similar to the following:

patching file uboot-socfpga/drivers/spi/cadence_qspi_apb.c

patching file uboot-socfpga/include/configs/socfpga_common.h

Hunk #1 succeeded at 667 (offset 5 lines).

9. Recompile the Preloader using ‘make’:

make


Note that the Patch contains the following changes:

  • File uboot-socfpga/include/configs/socfpga_common.h
    • Enabled XIP by defining CONFIG_SPL_SPI_XIP
    • Enabled QSPI remap by defining CONFIG_SPL_SPI_XIP_REMAPADDR
  • File uboot-socfpga/drivers/spi/cadence_qspi_apb.c
    • Used new macro CONFIG_SPL_SPI_XIP_REMAPADDR to remap QSPI address just before jumping to QSPI.

Recompiling the Baremetal Application

The following steps are required in order to rebuild the sample application:

  1. Start an Embedded Command Shell
  2. Start ARM DS-5 AE by running the command ‘eclipse &’
  3. Select a new workspace (or reuse an existing one)
  4. Go to File -> Import -> General -> Existing Projects into Workspace and click ‘Next’
  5. Choose the ‘Select archive file’ option and click the associated ‘Browse’ button
  6. Select the file ‘Altera-SoCFPGA-HardwareLib-XIP-CV-GNU.tar.gz’ and click ‘Open’
  7. Click ‘Finish’ to import the project.
  8. Go to Project->Build project. This will compile the project

Debugging the Demo

This section presents how to debug the demo using ARM DS-5 Altera Edition.

The following steps are necessary:

1. Import and compile the sample bare-metal application.

2. Flash the bare-metal application to the QSPI flash. This is required because DS-5 is not able to load the image directly into QSPI flash.

3. Go to Run -> Debug Configurations -> DS-5 Debugger and select the ‘Altera-SoCFPGA-HardwareLib-XIP-CV-GNU-Debug’ debug configuration

4. On the ‘Connection’ tab, click the Connection ‘Browse’ button and select the USB Blaster instance associated with the board to be debugged.

5. Click the ‘Debug’ button

6. The debugger will then do the following:

  • Reset the board
  • Download and run the Preloader executable
  • Load the bare-metal application image symbols
  • Run the bare-metal application up to the ‘main’ function

7. The debugger is now stopped at entry to the ‘main’ function, and regular debugging can be used. Note that all the breakpoints used will be hardware breakpoints, since the debugger cannot write to QSPI Flash. This is transparently done by the debugger, based on the following line in the debugger script:

# Disable SW Breakpoints for Flash Code Stepping

memory 0xFFA00000 +0x100000 nobp ro noverify

Demo Architecture

This section describes the demo architecture in more detail.

Preloader

The Preloader included in the 15.1.1 SoC EDS release natively supports the XIP mode. However, it required a small change to allow the whole 1MB of QSPI Flash Window to be used (by default the first 256KB are used by the four Preloader images). In order to do that, the Cadence QSPI controller was programmed to use the remap feature, to point the XIP window to the 2nd MB of QSPI Flash. The modification is minor (4 lines of code) and will be included as a standard feature in the future. The Preloader patch file was included, to enable Preloader demo recompilation.
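
The effect of the remap can be illustrated with a little address arithmetic. The sketch below is a host-side model only, not the Preloader patch itself: it assumes the remap value is simply added to the offset inside the 1MB window, and the names `xip_flash_offset`, `XIP_REMAP_APP` and `XIP_INVALID` are made up for this illustration. The window base comes from the memory table on this page and the remap value from the flash address used when programming the application image.

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>

#define XIP_WINDOW_BASE 0xFFA00000u  /* CPU address of the 1MB QSPI window */
#define XIP_WINDOW_SIZE 0x00100000u  /* 1MB */
#define XIP_REMAP_APP   0x00100000u  /* hypothetical name: remap to the 2nd MB of flash */
#define XIP_INVALID     0xFFFFFFFFu  /* hypothetical sentinel: address outside the window */

/* Sanity check of the constants above: the window base is 1MB aligned. */
static_assert((XIP_WINDOW_BASE % XIP_WINDOW_SIZE) == 0u, "window base is 1MB aligned");

/* Model of the remap: flash offset = remap value + offset inside the window.
 * Returns XIP_INVALID when cpu_addr does not fall inside the window. */
static uint32_t xip_flash_offset(uint32_t cpu_addr, uint32_t remap)
{
    if (cpu_addr < XIP_WINDOW_BASE ||
        cpu_addr - XIP_WINDOW_BASE >= XIP_WINDOW_SIZE) {
        return XIP_INVALID;
    }
    return remap + (cpu_addr - XIP_WINDOW_BASE);
}
```

With a remap value of 0 the start of the window reads the first byte of flash, where the Preloader images live; with `XIP_REMAP_APP` the same CPU address reads flash offset 0x100000, so the whole window is available to the application.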

Boot Sequence

The standard boot sequence is used, with the BootROM loading the Preloader, then the Preloader loading the bare-metal application:

[Figure: XIP boot flow (Xip-boot-flow.png)]

Memory Usage

The following table presents the linker sections that the bare-metal application uses. This setup was used for the following reasons:

  • QSPI image needs to start with the actual entrypoint (Preloader requirement)
  • MMU L1 table needs to be aligned to 16KB (ARM requirement)
  • MMU L2 table needs to be aligned to 1KB (ARM requirement)
  • MMU L1 table needs to refer to MMU L2 table address as a constant value (MMU tables in QSPI read only memory constraint – address needs to be known at compile time)

Section             Start        Size       Description
ram                 0xFFFF0000   64K - 4K   On-chip RAM. Minus 4KB for the PLL workaround.
qspi_rom_startup    0xFFA00000   16K - 64   QSPI: Startup code, needs to be at the beginning of the image. Minus 64 bytes for the mkimage header.
qspi_rom_mmu_ttb1   0xFFA04000   16K        QSPI: L1 Translation Table
qspi_rom_mmu_ttb2   0xFFA08000   1K         QSPI: L2 Translation Table
qspi_rom            0xFFA08400   1M - 33K   QSPI: Rest of it (code and constant data)
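
The constraints behind this layout can be checked with simple arithmetic. The following sketch is a stand-alone illustration in C, not the check_mmu_tables.pl script shipped with the demo; the `region` structure and helper names are invented here. It treats the startup slot as a full 16KB (section plus the 64 bytes reserved for the mkimage header), which is an assumption about how the two sizes add up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One slot of the 1MB QSPI window (hypothetical helper type). */
struct region {
    const char *name;
    uint32_t    start;
    uint32_t    size;   /* full slot size in bytes */
};

static const struct region qspi_layout[] = {
    { "qspi_rom_startup",  0xFFA00000u, 16u * 1024u },  /* 16K - 64, plus 64-byte header */
    { "qspi_rom_mmu_ttb1", 0xFFA04000u, 16u * 1024u },
    { "qspi_rom_mmu_ttb2", 0xFFA08000u,  1u * 1024u },
    { "qspi_rom",          0xFFA08400u, 1024u * 1024u - 33u * 1024u },
};
#define QSPI_LAYOUT_COUNT (sizeof qspi_layout / sizeof qspi_layout[0])

/* True when addr is a multiple of align; align must be a power of two. */
static bool is_aligned(uint32_t addr, uint32_t align)
{
    assert(align != 0u && (align & (align - 1u)) == 0u);
    return (addr & (align - 1u)) == 0u;
}

/* True when every slot starts exactly where the previous one ends. */
static bool layout_is_contiguous(const struct region *r, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (r[i].start != r[i - 1].start + r[i - 1].size)
            return false;
    }
    return true;
}

/* Sum of all slot sizes. */
static uint32_t layout_total_size(const struct region *r, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += r[i].size;
    return total;
}
```

Under these assumptions the four QSPI slots are contiguous, add up to exactly 1MB, and place the L1 table on a 16KB boundary and the L2 table on a 1KB boundary.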

Flash Layout

The demo uses only 2MB of the QSPI Flash:

  • 1st MB: Preloader Images
  • 2nd MB: Bare-metal Application Image

[Figure: QSPI flash layout (Xip-flash-layout.png)]

The rest of the QSPI can be used for other purposes, such as storing FPGA configuration images that can be used by the Preloader to configure the FPGA.

Cache Settings

The following table presents the cache settings that were used.

Area             L1 Cacheable   L2 Cacheable
1MB QSPI Flash   Yes            Yes
64KB OCRAM       Yes            No
Rest of Memory   No             No

Note that making OCRAM L2 cacheable as well did not improve the speed of the system, because OCRAM has a similar speed to the L2 cache. However, making OCRAM L1 cacheable did significantly improve execution speed.

MMU Translation Tables

The MMU tables are used to describe the cache settings for different memory areas. For this demo, the following Tables were used:

  • L1 Translation Table – describes memory like this:
    • 1MB cacheable section for QSPI window
    • Page table entry pointing to the L2 Translation Table for the last MB of address space
    • 1MB non-cacheable sections for the rest of address space
  • L2 Translation Table for last MB of address space:
    • Large page – 64KB OCRAM as L1 cacheable
    • Large pages – for the rest of the 1MB area

Notes:

  • The MMU tables were generated using the included script – generate_mmu_tables.pl. The tables are human readable and editable, so the script is not really required. It was included for completeness.
  • The absolute placement of the MMU tables is checked by the script check_mmu_tables.pl that is called by the Makefile.
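
To make the table structure above more concrete, here is a minimal sketch of how such entries could be encoded in the ARMv7-A short-descriptor format. It is not the content of mmu_tables.c or the output of generate_mmu_tables.pl. The helper names are invented, the attribute encodings assume TEX remap is disabled, and access permission, domain and shareability bits are left out for brevity, so the values are illustrative rather than directly usable.

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>

#define L1_INDEX(va) ((uint32_t)(va) >> 20)            /* one L1 entry per 1MB  */
#define L2_INDEX(va) (((uint32_t)(va) >> 12) & 0xFFu)  /* one L2 entry per 4KB  */

static_assert(4096u * sizeof(uint32_t) == 16u * 1024u, "L1 table occupies 16KB");
static_assert(256u * sizeof(uint32_t) == 1024u, "L2 table occupies 1KB");

/* Cache attribute bits shared by sections and large pages: B, C, TEX[14:12]. */
#define ATTR_B         (1u << 2)
#define ATTR_TEX(x)    ((uint32_t)(x) << 12)
#define ATTR_UNCACHED  0u                         /* no cache bits set                */
#define ATTR_L1_ONLY   (ATTR_TEX(4u) | ATTR_B)    /* inner write-back, outer uncached */
#define ATTR_L1_AND_L2 (ATTR_TEX(5u) | ATTR_B)    /* inner and outer write-back       */

#define OCRAM_BASE   0xFFFF0000u
#define LAST_MB_BASE 0xFFF00000u

/* L1 entry: 1MB section, type bits 0b10, base address in bits [31:20]. */
static uint32_t l1_section(uint32_t pa, uint32_t attrs)
{
    return (pa & 0xFFF00000u) | attrs | 0x2u;
}

/* L1 entry: pointer to an L2 table, type bits 0b01, table base in bits [31:10]. */
static uint32_t l1_page_table(uint32_t l2_table_pa)
{
    return (l2_table_pa & 0xFFFFFC00u) | 0x1u;
}

/* L2 entry: 64KB large page, type bits 0b01, base address in bits [31:16]. */
static uint32_t l2_large_page(uint32_t pa, uint32_t attrs)
{
    return (pa & 0xFFFF0000u) | attrs | 0x1u;
}

/* Fill the L2 table for the last MB: 16 large pages, each repeated in 16
 * consecutive entries as the format requires. Only OCRAM is made cacheable. */
static void build_last_mb_l2(uint32_t l2[256])
{
    for (uint32_t page = 0; page < 16u; page++) {
        uint32_t pa = LAST_MB_BASE + page * 0x10000u;
        uint32_t attrs = (pa == OCRAM_BASE) ? ATTR_L1_ONLY : ATTR_UNCACHED;
        for (uint32_t rep = 0; rep < 16u; rep++)
            l2[page * 16u + rep] = l2_large_page(pa, attrs);
    }
}
```

In this model the QSPI window uses L1 entry 0xFFA, the last MB uses L1 entry 0xFFF, and OCRAM occupies L2 entries 0xF0 through 0xFF. Because the tables live in read-only flash, a real build would emit these values as constants at compile time rather than compute them at run time.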

Lock Data in L2 Cache

The application preloads the desired piece of data into the L2 cache and locks it there to improve execution efficiency.

It uses the lockdown by line feature of the L2 cache controller. While lockdown by line is enabled, all newly allocated cache lines are marked as locked. The controller then treats them as locked and does not evict them during normal operation. Lockdown by line is enabled by setting bit [0] of the Lockdown by Line Enable Register.
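
The preload-and-lock sequence can be sketched as follows. This is a simplified illustration, not the code from xip_demo.c: it assumes the Lockdown by Line Enable Register sits at offset 0x950 from the L2 controller base and that cache lines are 32 bytes (both should be checked against the L2 controller documentation), it takes the register base as a parameter so that it can also be exercised against a plain array on a host, and it omits the barriers and cache maintenance a real implementation would need. The function names are invented.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>
#include <stdint.h>

#define L2_LINE_SIZE                  32u        /* assumed L2 line size in bytes */
#define L2_LOCKDOWN_BY_LINE_EN_OFFSET 0x950u     /* assumed register offset       */
#define L2_LOCKDOWN_BY_LINE_EN_BIT    (1u << 0)  /* bit [0] enables the feature   */

/* A 256KB function spans 8192 lines of 32 bytes. */
static_assert((256u * 1024u) / L2_LINE_SIZE == 8192u, "256KB is 8192 cache lines");

/* Set or clear bit [0] of the Lockdown by Line Enable Register. */
static void l2_lockdown_by_line(volatile uint32_t *l2_regs, int enable)
{
    volatile uint32_t *reg =
        l2_regs + L2_LOCKDOWN_BY_LINE_EN_OFFSET / sizeof(uint32_t);

    if (enable)
        *reg |= L2_LOCKDOWN_BY_LINE_EN_BIT;
    else
        *reg &= ~L2_LOCKDOWN_BY_LINE_EN_BIT;
}

/* Enable lockdown, read one byte from every line in [start, start + len) so
 * that each line is allocated while lockdown is active, then disable lockdown
 * again. Returns the number of lines touched. */
static size_t l2_preload_and_lock(volatile uint32_t *l2_regs,
                                  const void *start, size_t len)
{
    const volatile uint8_t *p = start;
    size_t lines = 0;

    l2_lockdown_by_line(l2_regs, 1);
    for (size_t off = 0; off < len; off += L2_LINE_SIZE) {
        (void)p[off];   /* volatile read forces the access */
        lines++;
    }
    l2_lockdown_by_line(l2_regs, 0);
    return lines;
}
```

On the target, `l2_regs` would point at the L2 cache controller register block and `start` at the code to be locked. Disabling lockdown after the preload matters, since otherwise every later allocation would be locked as well.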

Two test functions were developed to test the cache lock feature:

  • void test_function_1(void) : contains 64K "nop" instructions (which take up 256 KB of memory)
  • void test_function_2(void) : contains 128K "nop" instructions (which take up 512 KB of memory)

The flow of the sample application is the following:

  1. Measure the duration of test_function_1() before enabling the caches
  2. Enable caches
  3. Preload test_function_1() in L2 cache
  4. Measure the duration of test_function_1() again, to see the effect of cache preloading
  5. Run test_function_2() and measure duration - without preloading; if test_function_1() were not locked, this would evict it from the cache
  6. Run test_function_2() again and measure duration - it will be faster because part of it will be loaded in cache
  7. Run test_function_1() again and measure duration - it will be almost the same since it is preloaded
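
The measurements in this flow all follow the same pattern: read a timer, run the function, read the timer again. A minimal sketch of that pattern is shown below. It is not taken from xip_demo.c; the tick source is passed in as a function pointer because this page does not say which timer the demo reads, and the `fake_*` helpers are host-side stand-ins added only so the sketch can be run without hardware.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t (*tick_source_fn)(void);  /* returns the current tick count */
typedef void (*test_fn)(void);             /* function under measurement     */

/* Run fn once and return the number of ticks it took. */
static uint64_t measure_ticks(tick_source_fn now, test_fn fn)
{
    uint64_t start = now();
    fn();
    return now() - start;
}

/* Speedup of `after` relative to `before`, scaled by 10 and truncated,
 * so 1444 means roughly 144.4 times faster. */
static uint64_t speedup_x10(uint64_t before, uint64_t after)
{
    assert(after != 0u);
    return (before * 10u) / after;
}

/* Host-side stand-ins (hypothetical): a fake timer and a fake workload. */
static uint64_t fake_tick_counter;

static uint64_t fake_now(void)
{
    return fake_tick_counter;
}

static void fake_work(void)
{
    fake_tick_counter += 100u;  /* pretend the work took 100 ticks */
}
```

Applied to the console output shown earlier, `speedup_x10(2625200, 18170)` gives 1444, so running test_function_1() from the preloaded cache is about 144 times faster than running it without caches.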

Version history: revision 1 of 1, last updated 06-25-2019 04:27 PM