Beginner

Low CoreMark benchmark score on Cyclone V (ARM Cortex-A9)

Hi,

I'm using a low speed grade (-C8) Cyclone V with HPS (single-core ARM Cortex-A9):

IC-5C SE B A4 U 19 C 8 S-TMP

Development is done on bare metal (FreeRTOS) with MPL as the bootloader. All boot configuration is generated through Quartus.

When running a benchmark (CoreMark) to validate our configuration, we get an incredibly low score of 380.

When running the same test on a dev board fitted with a dual-core part running at 800 MHz (Linux), we get a score of 5000, which corresponds to a single-core score of about 2500.

Scaling down to our 600 MHz (75%) yields an expected value of 1875.
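The scaling above can be written out as a small calculation. This is only a sketch: the helper name expected_coremark is made up for illustration, and it assumes that CoreMark scales linearly with core count and clock frequency, which ignores differences in compiler, OS, and memory system.

```c
#include <stdint.h>

/* Hypothetical helper (not part of CoreMark or hwlib).
 * Scales a multi-core reference score down to a single core running
 * at a different clock, assuming perfectly linear scaling. */
static uint32_t expected_coremark(uint32_t ref_score, uint32_t ref_cores,
                                  uint32_t ref_mhz, uint32_t target_mhz)
{
    /* Per-core score on the reference board, e.g. 5000 / 2 = 2500. */
    uint64_t per_core = ref_score / ref_cores;

    /* Frequency scaling, e.g. 2500 * 600 / 800 = 1875. */
    return (uint32_t)(per_core * target_mhz / ref_mhz);
}
```

With the numbers from this post, expected_coremark(5000, 2, 800, 600) gives 1875, roughly five times the measured 380.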

We have reviewed the configuration and everything seems to be OK:

  • NEON is on
  • MMU is on (flat address space)
  • branch prediction is on
  • all caches are on
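One way to cross-check the list above at run time is to read the System Control Register (SCTLR) back and test the enable bits, rather than assuming the configuration calls took effect. The sketch below assumes the ARMv7-A SCTLR layout (M = bit 0, C = bit 2, Z = bit 11, I = bit 12). The names read_sctlr and sctlr_core_features_enabled are hypothetical, not hwlib functions, and the check does not cover NEON or the L2 cache controller, which are configured through other registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-A SCTLR enable bits. */
#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_C (1u << 2)   /* data/unified cache enable */
#define SCTLR_Z (1u << 11)  /* branch prediction enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

#if defined(__arm__)
/* Hypothetical helper: reads SCTLR (CP15 c1, c0, 0).
 * Must run in a privileged mode. The volatile qualifier keeps the
 * compiler from removing or reordering the register read. */
static inline uint32_t read_sctlr(void)
{
    uint32_t value;
    __asm volatile ("MRC p15, 0, %0, c1, c0, 0" : "=r" (value));
    return value;
}
#endif

/* Returns true only if MMU, D-cache, I-cache and branch prediction
 * are all enabled in the given SCTLR value. */
static bool sctlr_core_features_enabled(uint32_t sctlr)
{
    const uint32_t required = SCTLR_M | SCTLR_C | SCTLR_Z | SCTLR_I;
    return (sctlr & required) == required;
}
```

On the target, calling sctlr_core_features_enabled(read_sctlr()) just before starting the benchmark would show whether the settings are really active at that point.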

This particular benchmark is designed to test core performance (mostly arithmetic operations on small data). All code and data should therefore fit into cache memory, which rules out L3/L4 as bottlenecks. Despite this, we have reviewed the clocking scheme, which looks like this:

MPU = 600 MHz
L3 MP = 150 MHz
L3 SP = 75 MHz
L4 MP/SP = 100 MHz
DDR = 400 MHz

I must be missing something very important here. Any ideas?

4 Replies
Beginner

A quick update on this issue.

When compiled with gcc or armcc, the CoreMark benchmark yields the expected scores. Only the new Arm Clang compiler (armclang) gives the low score, so our problem is not related to the hardware or clock configuration.

Employee

Hello,

It seems this is a compiler optimization issue.

You mentioned that the performance is as expected when using gcc or armcc, right? Please correct me if I am wrong.

May I know which compiler you used?

Thanks

Beginner

In the meantime, I have opened a support ticket through our FAI. The compiler is armclang (ARM Compiler 6).

Here is the compiler output when using the -v flag. Maybe you can spot something odd in the trace.

Target: arm-arm-none-eabi

"C:\\intelFPGA\\18.1\\embedded\\ds-5\\sw\\ARMCompiler6.10.1\\lib\\tt_default\\bin\\armclang.exe"
-cc1
--tool_variant=altera
-triple armv7-arm-none-eabi
-emit-obj
-disable-free
-disable-llvm-verifier
-discard-value-names
-main-file-name stubs.c
-mrelocation-model static
-mthread-model posix
-menable-no-infs
-menable-no-nans
-fmath-errno
-fno-signed-zeros
-fno-trapping-math
-fdenormal-fp-math=preserve-sign
-ffinite-math-only
-masm-verbose
-mconstructor-aliases

-target-cpu cortex-a9
-target-feature -crc
-target-feature +dsp
-target-feature -dotprod
-target-feature -ras
-target-feature -hwdiv-arm
-target-feature -hwdiv
-target-abi aapcs

-fno-math-builtin
-mfloat-abi hard
-fallow-half-arguments-and-returns
-dwarf-column-info
-debug-info-kind=limited
-dwarf-version=4
-debugger-tuning=gdb
-v
-ffunction-sections
-fdata-sections
-coverage-notes-file [...]
-resource-dir "C:\\intelFPGA\\18.1\\embedded\\ds-5\\sw\\ARMCompiler6.10.1\\lib\\tt_default\\lib\\clang\\7.0.0"
-dependency-file [...]
-MT[...]
-sys-header-deps
-I [...]
-D [...]
-internal-isystem "C:\\intelFPGA\\18.1\\embedded\\ds-5\\sw\\ARMCompiler6.10.1\\bin\\..\\include"
-nobuiltininc
-O2
-std=c99
-fdebug-compilation-dir [...]
-ferror-limit 19
-fmessage-length 0
-fvisibility hidden
-fno-signed-char
-fdeclspec
-fobjc-runtime=gcc
-fno-common
-fdiagnostics-show-option
-fsuppress-licensing
-vectorize-loops
-vectorize-slp
-aggressive-jump-threading -o [...]
-x c [...]

ARM Compiler 6.10.1 -cc1 default target aarch64-arm-none-eabi
#include "..." search starts here:
#include <...> search starts here:
 [...]
End of search list.

Beginner

Hi,

The problem is caused by a bug in the hwlib library. When building with armclang (-O2), some inline assembly statements are omitted from the generated code, which results in a wrong cache/MMU configuration.

https://forums.intel.com/s/question/0D50P00004ABHFoSAP/missing-volatile-keyword-in-hwlib-inline-asse...

As an example, let's take a helper function from the "alt_cache.c" file:

static __inline __attribute__((always_inline)) void sctlr_write_helper(uint32_t sctlr)
{
#if defined(__ARMCOMPILER_VERSION)
    __asm("MCR p15, 0, %[sctlr], c1, c0, 0" : : [sctlr] "r" (sctlr));
#elif defined(__ARMCC_VERSION)
    __asm("MCR p15, 0, sctlr, c1, c0, 0");
#else
    __asm("MCR p15, 0, %0, c1, c0, 0" : : "r" (sctlr));
#endif
}

With armclang, the inline assembly statement needs the "volatile" keyword so that the optimizer does not remove it, like this:

static __inline __attribute__((always_inline)) void sctlr_write_helper(uint32_t sctlr)
{
#if defined(__ARMCOMPILER_VERSION)
    /* volatile keeps armclang from optimizing this statement away */
    __asm volatile ("MCR p15, 0, %[sctlr], c1, c0, 0" : : [sctlr] "r" (sctlr));
#elif defined(__ARMCC_VERSION)
    __asm("MCR p15, 0, sctlr, c1, c0, 0");
#else
    __asm("MCR p15, 0, %0, c1, c0, 0" : : "r" (sctlr));
#endif
}

Here is the link to the ARM documentation:

https://developer.arm.com/docs/100067/latest/armclang-inline-assembler/inline-assembly-statements-wi...
