Hi Everyone,
I wrote a minimal test program that contains only an infinite loop, then ran it pinned to a specific core on a Linux server with a Xeon 6261 CPU.
Test code:
int main(int ac, char **av)
{
loop:
    goto loop;
    return 0;
}
Then I used perf stat to observe its performance, but the results were surprising:
15.006456113 3,180,914,057 cycles (39.99%)
15.006456113 3,171,163,384 instructions # 1.00 insn per cycle (49.99%)
15.006456113 7,911 cache-misses # 23.217 % of all cache refs (49.99%)
15.006456113 33,128 cache-references (50.00%)
15.006456113 1,411 LLC-load-misses # 14.92% of all LL-cache hits (50.01%)
15.006456113 9,730 LLC-loads (50.01%)
15.006456113 225 LLC-store-misses (20.00%)
15.006456113 1,573 LLC-store (20.00%)
15.006456113 0 mem-loads (29.99%)
15.006456113 678,645 mem-stores (39.99%)
I am very confused by the large values for the cache, LLC, and mem events, especially the 678,645 mem-stores.
Thanks and best regards.
It is very important to be precise when asking questions in these forums....
- I don't know what a "Xeon 6261 CPU" is.... Can you get the correct model number from the output of either "lscpu" or "cat /proc/cpuinfo"?
- The specific command used to launch "perf stat" is critical -- please include both the command used to invoke "perf stat" and the full output.
- It is always a good idea to avoid counter multiplexing when you are getting started with performance counters. You can usually collect 4 core performance counters without multiplexing.
- Some features of "perf stat" vary from one OS release to another. The output of "uname -a" is usually sufficient.
It is also important to understand that your operating system is doing a lot of "stuff" in the background. Some of that "stuff" leaks into the counts for the user program. In your output, cache-misses, cache-references, LLC-load-misses, LLC-loads, LLC-store-misses, and LLC-store are all extremely small -- the largest is roughly 100,000x smaller than the cycle or instruction counts. Operating system interference at this level is (for practical purposes) unavoidable and negligible. The value for mem-stores is rather high, but without precise details of what you are counting and how you are counting it, trying to understand it would just be a waste of time....
What parameters did you pass to "perf stat"?
The cache-related events could be caused by code executed before and/or after the infinite loop that *appears* in the source code. This may include the C/C++ runtime initialization code.
You can remove the infinite loop and see how the event counts change. It may be useful to inspect the generated assembly code either way.
Note that the standard deviation is so high that the event counts are mostly useless. This is probably because the events are counted over a (short) period of time with highly nondeterministic dynamic behavior.