Profiling events on Xeon using perf

Iris_L_ · ‎03-17-2016

Hi,

I am running some experiments to compare a Xeon with an AMD machine, using perf on both. My concern is that the event counts on the Xeon are up to a thousand times higher than those on the AMD, yet the runtime on the Xeon is much better. I am measuring cache events, instructions, and cpu-clock on both machines.

My application is a matrix multiplication of size 1000x1000, and I am running it sequentially, not in parallel yet.
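For a rough sense of scale, here is a minimal sketch that assumes mm1 uses the textbook triple loop (the actual source is not shown in this post, so that is only an assumption). The function name `matmul_naive` and the small test size are illustrative. It counts inner-loop iterations, which grow as n^3, so n = 1000 would mean about 10^9 multiply-adds.

```python
def matmul_naive(a, b):
    """Textbook triple-loop product of two square matrices (lists of lists).

    Returns the product and the number of inner-loop iterations executed.
    Assumption: mm1 may be structured differently; this is only a model.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    iterations = 0
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[i][k] * b[k][j]
                iterations += 1
            c[i][j] = acc
    return c, iterations


# Small size so the pure-Python loops finish quickly.
n = 20
a = [[float(i + j) for j in range(n)] for i in range(n)]
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

product, iterations = matmul_naive(a, identity)
print(iterations)   # 8000 for n = 20, i.e. n**3
print(1000 ** 3)    # inner iterations for the 1000x1000 case
```

Under that assumption, event counts in the hundreds of thousands would be far too small for a full 1000x1000 run, which is a useful yardstick when reading the perf numbers.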

Can you explain why there are such large differences (up to a thousand times) between the machines? For example, cache-references is 713,432 on the AMD and 127,708,365 on the Xeon.

This is the Xeon configuration:

Model: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
CPU MHz: 1200.000
CPU cores per Processor: 8
Host Physical Memory: 65933 MB
Architecture: x86_64
L1 dcache: 32K
L1 icache: 32K
L2 cache: 256K
L3 cache: 20480K
cache_alignment: 64

This is the AMD configuration:

AMD Opteron 2427

Instruction set: x86-64
Speed: 2.2 GHz
L1 instruction cache: 6 x 64 KB
L1 data cache: 6 x 64 KB
L2 cache: 6 x 512 KB
L3 cache: 6 MB

Take a look at this example:

=== results from AMD ======

perf stat -e cache-references,cache-misses,branch-instructions,cpu-clock bpsh 15 ./mm1 1000 1

Program runs in 17.52 seconds

Performance counter stats for 'bpsh 15 ./mm1 1000 1':

713,432 cache-references

35,538 cache-misses # 4.981 % of all cache refs

411,916 branch-instructions

2.701875 cpu-clock (msec)

17.560428347 seconds time elapsed

=== results from Xeon ======

Now, I compiled mm1 on the Xeon as offload, but there is no #pragma offload directive, so the code runs entirely on the Xeon (the processor).

perf stat -e cache-references,cache-misses,branch-instructions,cpu-clock ./mm1 1000 1

Program runs in 2.69 seconds

Performance counter stats for './mm1 1000 1':

127,708,365 cache-references

477,245 cache-misses # 0.374 % of all cache refs

507,201,088 branch-instructions

2701.183114 cpu-clock (msec)

2.701594439 seconds time elapsed
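As a sanity check on the two runs above, this short sketch computes the Xeon/AMD ratio for each counter and divides the reported cpu-clock by the elapsed time. The numbers are copied from the perf output quoted above; the dictionary keys and the helper `cpu_utilization` are names made up for illustration.

```python
# Values transcribed from the two perf stat outputs in this post.
amd = {
    "cache-references": 713_432,
    "cache-misses": 35_538,
    "branch-instructions": 411_916,
    "cpu-clock-msec": 2.701875,
    "elapsed-sec": 17.560428347,
}
xeon = {
    "cache-references": 127_708_365,
    "cache-misses": 477_245,
    "branch-instructions": 507_201_088,
    "cpu-clock-msec": 2701.183114,
    "elapsed-sec": 2.701594439,
}


def cpu_utilization(stats):
    """Fraction of wall-clock time that perf attributed to the measured process."""
    return (stats["cpu-clock-msec"] / 1000.0) / stats["elapsed-sec"]


counters = ("cache-references", "cache-misses",
            "branch-instructions", "cpu-clock-msec")
ratios = {name: xeon[name] / amd[name] for name in counters}

for name, ratio in ratios.items():
    print(f"{name}: Xeon/AMD = {ratio:.1f}")

print(f"AMD  cpu-clock / elapsed = {cpu_utilization(amd):.6f}")
print(f"Xeon cpu-clock / elapsed = {cpu_utilization(xeon):.6f}")
```

If the numbers are transcribed correctly, the measured process on the AMD machine accounts for well under 1% of the elapsed time as CPU time, while on the Xeon it is close to 100%. One possible reading, not verified here, is that perf on the AMD side counted only the bpsh launcher and not mm1 itself.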

Do you have any idea why the results are so different?

Thanks,