Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Why does the simplest possible code cause high cache-references and mem-stores in perf stat?

liang__zhang

Hi Everyone,

I wrote the simplest possible test program, an infinite loop, and ran it on a Linux server with a Xeon 6261 CPU, with the process pinned to a specific core.

Test code:

int main(int ac, char **av)
{

loop:
    goto loop;   /* spin forever; the return below is never reached */

    return 0;
}

Then I used perf stat to observe the performance, but got a surprising result:

    15.006456113      3,180,914,057      cycles                                                        (39.99%)
    15.006456113      3,171,163,384      instructions              #    1.00  insn per cycle           (49.99%)
    15.006456113              7,911      cache-misses              #   23.217 % of all cache refs      (49.99%)
    15.006456113             33,128      cache-references                                              (50.00%)
    15.006456113              1,411      LLC-load-misses           #   14.92% of all LL-cache hits     (50.01%)
    15.006456113              9,730      LLC-loads                                                     (50.01%)
    15.006456113                225      LLC-store-misses                                              (20.00%)
    15.006456113              1,573      LLC-store                                                     (20.00%)
    15.006456113                  0      mem-loads                                                     (29.99%)
    15.006456113            678,645      mem-stores                                                    (39.99%)

I am very confused by the large values for the cache, LLC, and mem events, especially the 678,645 mem-stores.

 

Thanks and best regards.

 

 

McCalpinJohn

It is very important to be precise when asking questions in these forums.

  • I don't know what a "Xeon 6261 CPU" is. Can you get the correct model number from the output of either "lscpu" or "cat /proc/cpuinfo"?
  • The specific command used to launch "perf stat" is critical -- please include both the command used to invoke "perf stat" and the full output.
    • It is always a good idea to avoid counter multiplexing when you are getting started with performance counters.  You can usually collect 4 core performance counters without multiplexing.
  • Some features of "perf stat" vary from one OS release to another.  The output of "uname -a" is usually sufficient to identify the release.
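For reference on what multiplexing does to the numbers: when more events are requested than there are hardware counters, perf rotates the events through the counters and extrapolates. The percentage in parentheses at the end of each line of the perf stat output in the question is the fraction of the time that event was actually counted (time_running / time_enabled), and the printed value is the raw count scaled by time_enabled / time_running. A minimal sketch of that scaling, assuming time_enabled and time_running are taken from the kernel's perf read format; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extrapolate a multiplexed counter value the way perf stat reports it:
 * the event was only counted for time_running out of time_enabled
 * nanoseconds, so the raw count is scaled up by enabled / running.
 * Returns 0 if the event was never scheduled onto a counter.
 */
double scale_count(uint64_t raw, uint64_t time_enabled, uint64_t time_running)
{
    assert(time_running <= time_enabled);
    if (time_running == 0)
        return 0.0;
    return (double)raw * (double)time_enabled / (double)time_running;
}

/* Fraction of the enabled time the event was actually counted
   (what perf stat prints as "(xx.xx%)"). */
double counted_fraction(uint64_t time_enabled, uint64_t time_running)
{
    return time_enabled ? (double)time_running / (double)time_enabled : 0.0;
}
```

For example, the cycles line in the question shows (39.99%), so roughly 60% of that 3.18 billion figure is extrapolated rather than measured.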

It is also important to understand that your operating system is doing a lot of "stuff" in the background, and some of that "stuff" leaks into the counts for the user program.   In your output, cache-misses, cache-references, LLC-load-misses, LLC-loads, LLC-store-misses, and LLC-store are all extremely small -- the largest is about 100,000x smaller than the cycle or instruction counts.   Operating system interference at this level is (for practical purposes) unavoidable and negligible.   The value for mem-stores is rather high, but without precise details of what you are counting and how you are counting it, it would be a waste of time trying to understand it.
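The arithmetic behind that ratio, using the largest of those counts (cache-references) and the cycle count from the question, plus the same comparison for mem-stores against instructions:

```latex
\frac{\text{cycles}}{\text{cache-references}}
  = \frac{3{,}180{,}914{,}057}{33{,}128} \approx 9.6 \times 10^{4} \approx 10^{5}
\qquad
\frac{\text{instructions}}{\text{mem-stores}}
  = \frac{3{,}171{,}163{,}384}{678{,}645} \approx 4.7 \times 10^{3}
```

So mem-stores works out to roughly one store per 4,700 instructions: much larger than the cache events, but still a tiny fraction of the instruction count. Both values are multiplexed estimates, so neither ratio is exact.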


HadiBrais

What parameters did you pass to "perf stat"?

The cache-related events could be caused by code executed before and/or after the infinite loop that *appears* in the source code. This may include the C/C++ runtime initialization code.

You can remove the infinite loop and see how the event counts change. It may be useful to inspect the generated assembly code either way.
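One way to keep C runtime start-up and exit code out of the counts is to count only around the region of interest from inside the program, using the Linux perf_event_open system call (glibc has no wrapper, so it is invoked through syscall()). This is a sketch under several assumptions: Linux with /proc/sys/kernel/perf_event_paranoid low enough to allow self-monitoring of user-mode events; a bounded loop replacing the infinite one (the names spin and count_spin and the iteration count are hypothetical); and four generic hardware events placed in one group so they are scheduled together and, on a core with at least four free counters, should not be multiplexed. It returns -1 when the counters cannot be opened, which is common inside containers and VMs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#define N_EVENTS 4

/* Bounded spin loop: the empty asm (with a memory clobber) keeps the compiler
   from deleting the loop or moving it across the ioctl() calls. The counter is
   a register-only local, so the loop body itself does no loads or stores. */
static uint64_t spin(uint64_t n)
{
    uint64_t i;
    for (i = 0; i < n; i++)
        __asm__ __volatile__("" ::: "memory");
    return i;
}

/* Open one hardware event for the calling thread on any CPU, user mode only. */
static int open_event(uint64_t config, int group_fd)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = config;
    attr.disabled = (group_fd == -1);   /* only the group leader starts disabled */
    attr.exclude_kernel = 1;            /* ignore kernel-mode activity */
    attr.exclude_hv = 1;
    attr.read_format = PERF_FORMAT_GROUP;
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, group_fd, 0UL);
}

/* Count cycles, instructions, cache-references and cache-misses around spin(n).
   Fills counts[0..3] and returns 0, or returns -1 if the counters are unavailable. */
int count_spin(uint64_t n, uint64_t counts[N_EVENTS])
{
    static const uint64_t configs[N_EVENTS] = {
        PERF_COUNT_HW_CPU_CYCLES, PERF_COUNT_HW_INSTRUCTIONS,
        PERF_COUNT_HW_CACHE_REFERENCES, PERF_COUNT_HW_CACHE_MISSES,
    };
    int fds[N_EVENTS];
    int opened = 0, rc = -1;
    struct { uint64_t nr; uint64_t values[N_EVENTS]; } buf;  /* PERF_FORMAT_GROUP layout */

    for (int k = 0; k < N_EVENTS; k++) {
        fds[k] = open_event(configs[k], k == 0 ? -1 : fds[0]);
        if (fds[k] < 0)
            goto out;
        opened++;
    }

    ioctl(fds[0], PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP);
    ioctl(fds[0], PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
    spin(n);                                   /* region of interest only */
    ioctl(fds[0], PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);

    if (read(fds[0], &buf, sizeof buf) == (ssize_t)sizeof buf && buf.nr == N_EVENTS) {
        memcpy(counts, buf.values, sizeof buf.values);
        rc = 0;
    }
out:
    for (int k = 0; k < opened; k++)
        close(fds[k]);
    return rc;
}
```

Comparing the counts for a few values of n (including n = 0) separates the fixed measurement overhead from what the loop itself contributes. From the command line, perf stat's ":u" event modifier (for example cycles:u) similarly restricts counting to user mode, though it still includes the runtime start-up code.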

Note that the standard deviation is so high that the event counts are mostly useless. This is probably because the events are counted over a (short) period of time with highly nondeterministic dynamic behavior.
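perf stat can repeat a measurement with -r N and report the run-to-run variation; the exact statistic perf prints is not reproduced here. As a simpler stand-in, the sketch below summarizes one event's counts over several runs by their mean and relative range, (max - min) / mean. The struct and function names are hypothetical. A large relative range means the noise is comparable to the counts themselves, which is the situation described above.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of one event's counts over several repeated runs. */
struct spread {
    double mean;
    double min, max;
    double rel_range_pct;   /* (max - min) / mean, in percent */
};

struct spread summarize_runs(const double *counts, size_t n)
{
    assert(n > 0);
    struct spread s = { 0.0, counts[0], counts[0], 0.0 };
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += counts[i];
        if (counts[i] < s.min) s.min = counts[i];
        if (counts[i] > s.max) s.max = counts[i];
    }
    s.mean = sum / (double)n;
    if (s.mean != 0.0)
        s.rel_range_pct = 100.0 * (s.max - s.min) / s.mean;
    return s;
}
```

A range is used here instead of a standard deviation only to keep the sketch free of libm; either measure makes the same point about short, noisy measurement windows.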
