Hi,
I'm using a low speed grade (-C8) Cyclone V with HPS (single core ARM Cortex A9).
IC-5C SE B A4 U 19 C 8 S-TMP
The code development is done on bare-metal (FreeRTOS) with MPL as bootloader. All boot configuration is generated through Quartus.
When running a benchmark (CoreMark) to validate our configuration, we get an incredibly low score of 380.
When running the same test on a dev board which is fitted with a dual core running at 800 MHz (Linux), we get a score of 5000.
That works out to a single-core score of 2500.
Scaling down to our 600 MHz (75%) yields an expected value of 1875.
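For reference, the estimate above can be written as a small helper. This is only a sketch of the arithmetic in this post: it assumes the score scales linearly with both core count and clock frequency, and the function name `expected_score` is made up for illustration.

```c
/*
 * Expected single-core score at a target clock.
 * Assumption (from the post): the score scales linearly with the
 * number of cores and with the clock frequency.
 */
static double expected_score(double ref_score, unsigned ref_cores,
                             double ref_mhz, double target_mhz)
{
    double per_core = ref_score / (double)ref_cores; /* 5000 / 2 = 2500 */
    return per_core * (target_mhz / ref_mhz);        /* 2500 * 0.75     */
}
```

With the numbers above, `expected_score(5000.0, 2, 800.0, 600.0)` gives 1875, so the measured 380 is roughly a fifth of the estimate.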
We have reviewed the configuration and everything seems to be OK:
- NEON is on
- MMU is on (flat address space)
- branch prediction is on
- all caches are on
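One way to double-check part of this list at run time, rather than from the generated configuration, is to read SCTLR back and decode the enable bits. The sketch below is an assumption-laden illustration, not hwlib code: the names `sctlr_decode`, `sctlr_read` and the macro `TARGET_CORTEX_A9_BAREMETAL` are hypothetical, and the read must run in a privileged mode on the target. It only covers the L1 caches, MMU and branch prediction bits in SCTLR; NEON and the L2 cache are controlled elsewhere and are not checked here.

```c
#include <stdbool.h>
#include <stdint.h>

/* SCTLR enable bits on ARMv7-A (Cortex-A9). */
#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_C (1u << 2)   /* data cache enable */
#define SCTLR_Z (1u << 11)  /* branch prediction enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

typedef struct {
    bool mmu;
    bool dcache;
    bool branch_prediction;
    bool icache;
} sctlr_flags_t;

/* Pure decoding step; has no target dependency. */
static sctlr_flags_t sctlr_decode(uint32_t sctlr)
{
    sctlr_flags_t f;
    f.mmu               = (sctlr & SCTLR_M) != 0;
    f.dcache            = (sctlr & SCTLR_C) != 0;
    f.branch_prediction = (sctlr & SCTLR_Z) != 0;
    f.icache            = (sctlr & SCTLR_I) != 0;
    return f;
}

#if defined(TARGET_CORTEX_A9_BAREMETAL) /* hypothetical: define only in the target build */
/* Read SCTLR from CP15. "volatile" keeps the compiler from dropping or merging the read. */
static inline uint32_t sctlr_read(void)
{
    uint32_t v;
    __asm volatile ("MRC p15, 0, %0, c1, c0, 0" : "=r" (v));
    return v;
}
#endif
```

On the target, calling `sctlr_decode(sctlr_read())` just before starting the benchmark and printing the four flags shows what the core is actually running with.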
This particular benchmark is designed to test core performance (mostly arithmetic operations on small data). All code and data should therefore fit in cache, which rules out L3/L4 as bottlenecks. Even so, we have reviewed the clocking scheme, which looks like this:
MPU = 600 MHz
L3 MP = 150 MHz
L3 SP = 75 MHz
L4 MP/SP = 100 MHz
DDR = 400 MHz
I must be missing something very important here. Any ideas?
A quick update on this issue.
When compiled with gcc or armcc, the CoreMark benchmark yields the expected scores. Only the new armclang compiler gives the low score, so our problem is not related to hardware or clock configuration.
Hello,
It seems this is a compiler optimization issue.
You mentioned that with gcc or armcc the performance is as expected, right? Please correct me if I am wrong.
May I know which compiler you used?
Thanks
I have opened a support ticket through our FAI in the meantime. The compiler is armclang (ARM compiler 6).
Here is the compiler output when using the -v flag. Maybe you can spot something odd in the trace.
Target: arm-arm-none-eabi
"C:\\intelFPGA\\18.1\\embedded\\ds-5\\sw\\ARMCompiler6.10.1\\lib\\tt_default\\bin\\armclang.exe"
-cc1
--tool_variant=altera
-triple armv7-arm-none-eabi
-emit-obj
-disable-free
-disable-llvm-verifier
-discard-value-names
-main-file-name stubs.c
-mrelocation-model static
-mthread-model posix
-menable-no-infs
-menable-no-nans
-fmath-errno
-fno-signed-zeros
-fno-trapping-math
-fdenormal-fp-math=preserve-sign
-ffinite-math-only
-masm-verbose
-mconstructor-aliases
-target-cpu cortex-a9
-target-feature -crc
-target-feature +dsp
-target-feature -dotprod
-target-feature -ras
-target-feature -hwdiv-arm
-target-feature -hwdiv
-target-abi aapcs
-fno-math-builtin
-mfloat-abi hard
-fallow-half-arguments-and-returns
-dwarf-column-info
-debug-info-kind=limited
-dwarf-version=4
-debugger-tuning=gdb
-v
-ffunction-sections
-fdata-sections
-coverage-notes-file [...]
-resource-dir "C:\\intelFPGA\\18.1\\embedded\\ds-5\\sw\\ARMCompiler6.10.1\\lib\\tt_default\\lib\\clang\\7.0.0"
-dependency-file [...]
-MT[...]
-sys-header-deps
-I [...]
-D [...]
-internal-isystem "C:\\intelFPGA\\18.1\\embedded\\ds-5\\sw\\ARMCompiler6.10.1\\bin\\..\\include"
-nobuiltininc
-O2
-std=c99
-fdebug-compilation-dir [...]
-ferror-limit 19
-fmessage-length 0
-fvisibility hidden
-fno-signed-char
-fdeclspec
-fobjc-runtime=gcc
-fno-common
-fdiagnostics-show-option
-fsuppress-licensing
-vectorize-loops
-vectorize-slp
-aggressive-jump-threading -o [...]
-x c [...]
ARM Compiler 6.10.1 -cc1 default target aarch64-arm-none-eabi
#include "..." search starts here:
#include <...> search starts here:
[...]
End of search list.
Hi,
The problem is caused by a bug in the hwlib library. When building with armclang (-O2), the compiler drops some of the inline assembly statements, which results in a wrong cache/MMU configuration.
As an example, let's take a helper function from the "alt_cache.c" file:
static __inline __attribute__((always_inline)) void sctlr_write_helper(uint32_t sctlr)
{
#if defined(__ARMCOMPILER_VERSION)
    __asm("MCR p15, 0, %[sctlr], c1, c0, 0" : : [sctlr] "r" (sctlr));
#elif defined(__ARMCC_VERSION)
    __asm("MCR p15, 0, sctlr, c1, c0, 0");
#else
    __asm("MCR p15, 0, %0, c1, c0, 0" : : "r" (sctlr));
#endif
}
The armclang inline assembler requires the "volatile" keyword to be used like this:
static __inline __attribute__((always_inline)) void sctlr_write_helper(uint32_t sctlr)
{
#if defined(__ARMCOMPILER_VERSION)
    __asm volatile ("MCR p15, 0, %[sctlr], c1, c0, 0" : : [sctlr] "r" (sctlr));
#elif defined(__ARMCC_VERSION)
    __asm("MCR p15, 0, sctlr, c1, c0, 0");
#else
    __asm("MCR p15, 0, %0, c1, c0, 0" : : "r" (sctlr));
#endif
}
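A dropped CP15 write like this is silent, so it may be worth reading the register back after writing it. Below is a minimal sketch of that idea, not code from hwlib: the names `sctlr_read`, `sctlr_write`, `sctlr_set_bits_checked` and the macro `TARGET_CORTEX_A9_BAREMETAL` are hypothetical, and the "memory" clobber and ISB are my own additions. Without the macro, a plain variable stands in for the register so the logic can be compiled and tried on a PC. It does not replace the invalidate/enable sequences that hwlib performs.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(TARGET_CORTEX_A9_BAREMETAL) /* hypothetical: define only in the target build */
static inline uint32_t sctlr_read(void)
{
    uint32_t v;
    __asm volatile ("MRC p15, 0, %0, c1, c0, 0" : "=r" (v));
    return v;
}

static inline void sctlr_write(uint32_t v)
{
    /* "volatile" keeps the write from being optimized away. */
    __asm volatile ("MCR p15, 0, %0, c1, c0, 0" : : "r" (v) : "memory");
    __asm volatile ("ISB" : : : "memory");
}
#else
/* Host stand-in for the register, for trying out the logic off-target. */
static uint32_t fake_sctlr;
static inline uint32_t sctlr_read(void)    { return fake_sctlr; }
static inline void sctlr_write(uint32_t v) { fake_sctlr = v; }
#endif

/* Set the requested bits, then confirm by read-back that all of them stuck. */
static bool sctlr_set_bits_checked(uint32_t mask)
{
    sctlr_write(sctlr_read() | mask);
    return (sctlr_read() & mask) == mask;
}
```

If `sctlr_set_bits_checked()` returns false on the target, the write did not take effect, which is the symptom described above.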
Here is the link to the ARM documentation: