I have a Java function that I measured with VTune in single-threaded and multi-threaded mode.
The instruction count increases linearly with the number of threads, but clockticks do not: they increase by much more than the number of threads. Is this a problem?
Thanks
Andrew
Andrew,
How many cores do you have? Ideally, the number of clockticks per thread should not increase as long as each thread has a dedicated core. If you have a sufficient number of cores, you should look into where the clockticks are going. Clockticks are a measure of the time needed to execute the code. If the ratio "clockticks per instruction" (CPI) increases, you are using the cores less efficiently.
Kind regards
Thomas
Obviously we have fewer cores (2?) than threads (10). In that case we should see clockticks increase by 5 times, but we see 20 times. CPI also increases from 4 to 9.
Does this mean we are not using the cores efficiently? But then why does the instruction count look OK? It increases by only 10 times.
I am still confused about clockticks and instructions. Could you point me to somewhere I can learn more?
Regards
Andrew
Andrew,
The VTune-Book might be what you are looking for:
VTune Performance Analyzer Essentials
Measurement and Tuning Techniques for Software Developers
by James Reinders
Put simply, "instructions retired" is a measure of the amount of work the CPU is doing, and "clockticks" tells you how much time it needed. As current Intel CPUs can execute up to 4 instructions in parallel, the lower limit of "clockticks per instruction" (CPI) is 0.25, but a well-tuned program normally has a CPI in the range of 0.5 to 1. There are various reasons why the core does not execute as many instructions as it theoretically could. The most prominent are cache misses and mispredicted branches.
If you double the number of instructions in your code by doing the same thing twice in 2 threads, the number of clockticks would exactly double under perfect circumstances. However, there might be issues such as writing to the same cache line from 2 different threads (by writing to the same memory or by "false sharing"). In this case, the cache line bounces between the cores and slows the whole program down. Another reason might be that you are saturating the bus. And so on.
I suggest that you look into your program and identify the code line and data where the CPI increases most. (The true location might be a nearby line of the source code.)
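To make the ratio concrete, here is a minimal Java sketch of the CPI arithmetic using the factors reported earlier in this thread (10 times the instructions, 20 times the clockticks, single-thread CPI of 4). The absolute counter values and the class and method names are made up for illustration; only the ratios come from the thread.

```java
public class CpiCheck {
    /** Clockticks per instruction: clockticks divided by instructions retired. */
    static double cpi(double clockticks, double instructionsRetired) {
        return clockticks / instructionsRetired;
    }

    /** Factor by which CPI changes when both counters grow by the given factors. */
    static double cpiGrowth(double clocktickFactor, double instructionFactor) {
        return clocktickFactor / instructionFactor;
    }

    public static void main(String[] args) {
        // Hypothetical single-thread counts, chosen only so that CPI = 4.
        double instructions = 1.0e9;
        double clockticks = 4.0e9;

        double single = cpi(clockticks, instructions);
        // Factors from the thread: 20x clockticks, 10x instructions.
        double multi = cpi(clockticks * 20, instructions * 10);

        System.out.println("single-thread CPI = " + single);
        System.out.println("multi-thread CPI  = " + multi);
        System.out.println("CPI growth factor = " + cpiGrowth(20, 10));
        // Theoretical floor for a core that retires up to 4 instructions per clocktick.
        System.out.println("4-wide lower bound = " + 1.0 / 4);
    }
}
```

With these factors the CPI doubles, from 4 to about 8, which is roughly in line with the reported 9. The instruction count can scale perfectly while each instruction still takes about twice as long to retire.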
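The false-sharing effect can be reproduced with a small experiment. The sketch below is not taken from the original code in question; it assumes 64-byte cache lines and uses hypothetical names. Two threads each increment their own counter in a shared `AtomicLongArray`, first in adjacent slots (likely on the same cache line), then in slots 128 bytes apart.

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class FalseSharingSketch {
    /**
     * Starts two threads that each increment their own slot 'iterations' times
     * and returns the elapsed wall-clock time in nanoseconds.
     */
    static long run(AtomicLongArray counters, int slotA, int slotB, int iterations) {
        Thread a = new Thread(() -> {
            for (int i = 0; i < iterations; i++) counters.incrementAndGet(slotA);
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < iterations; i++) counters.incrementAndGet(slotB);
        });
        long start = System.nanoTime();
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int iterations = 5_000_000;
        // Slots 0 and 1 are 8 bytes apart: most likely the same cache line.
        long adjacent = run(new AtomicLongArray(32), 0, 1, iterations);
        // Slots 0 and 16 are 128 bytes apart: different lines if lines are 64 bytes (assumption).
        long padded = run(new AtomicLongArray(32), 0, 16, iterations);
        System.out.println("adjacent slots: " + adjacent / 1_000_000 + " ms");
        System.out.println("padded slots:   " + padded / 1_000_000 + " ms");
    }
}
```

Both runs execute the same number of increments, so any extra time in the adjacent-slot run corresponds to extra clockticks for roughly the same instruction count. The actual timings depend on the machine, the JVM, and warm-up, so treat the numbers as indicative only.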
Kind regards
Thomas