Hi, I am profiling some Java code with VTune.
I want to check whether certain functions are CPU-intensive or data-intensive.
For example, suppose a function loads some data from an array and performs arithmetic on it, like this:
result = (array[index + offset] ^ (variable >>>= 8)) & 0xff;
In this case, I think the elapsed time can be divided into two parts: CPU-operation time (add, exclusive-or, shift, etc.) and data-operation time (the array[index+offset] load).
Can I get this kind of information from VTune without modifying the original source code?
You can apply the <Advanced Hotspots> analysis type to your binary to find the hottest (CPU-intensive) functions, then drill down to the Source View to see how thread active time is distributed over your code at the high-level language or assembly level.
Hi Alexey, thanks for your comment. I have more questions.
Here, CPU time is equal to the function's elapsed time (or the sum of its elapsed times),
and VTune says the most time-consuming function (the function with the largest CPU time) is the top hotspot.
But in your comment, you said "hottest (CPU intensive) functions".
Did you mean that the hottest function is CPU-intensive?
I'm not sure the hottest function is always CPU-intensive, because there are many reasons a function can slow down.
My second question is related to the CPU usage graph.
The graph above shows my analysis results.
Does that mean:
1. The sum of the time excluding idle time (in this case, 46.671s) is consumed by CPU operations?
2. The idle time (92.109s) is consumed by other causes (e.g. reading data from disk)?
Although it seems a bit counter-intuitive, a processor is considered "busy" when it is stalled waiting for loads to complete.
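One way to see this distinction from inside the JVM is to compare wall-clock time with per-thread CPU time. The following is only a minimal sketch and not VTune functionality: the class BusyVersusIdle and its methods are hypothetical names, the array size and stride are arbitrary choices, and Thread.sleep merely stands in for a blocking wait such as disk I/O. It relies on the standard ThreadMXBean.getCurrentThreadCpuTime() call, which may be unsupported or disabled on some JVMs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class BusyVersusIdle {

    // Runs the task on the current thread and returns {wall-clock ns, thread CPU ns}.
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long cpuStart = threads.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        task.run();
        long wallNanos = System.nanoTime() - wallStart;
        long cpuNanos = threads.getCurrentThreadCpuTime() - cpuStart;
        return new long[] {wallNanos, cpuNanos};
    }

    // Sums every element once per pass, visiting them in a strided order
    // that is intended to be unfriendly to the caches (an assumption, not a guarantee).
    static long stridedSum(int[] data, int stride, int passes) {
        long sum = 0;
        for (int p = 0; p < passes; p++) {
            for (int start = 0; start < stride; start++) {
                for (int i = start; i < data.length; i += stride) {
                    sum += data[i];
                }
            }
        }
        return sum;
    }

    // Stand-in for a blocking wait (e.g. I/O): the thread is descheduled while it sleeps.
    static void blockFor(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1 << 22];   // 4 Mi ints = 16 MiB, arbitrary size
        long[] sink = new long[1];       // keeps the result live so the loop is not discarded
        long[] loads = measure(() -> sink[0] = stridedSum(data, 4096, 4));
        long[] blocked = measure(() -> blockFor(200));
        System.out.println("load loop: wall=" + loads[0] + " ns, cpu=" + loads[1] + " ns, sum=" + sink[0]);
        System.out.println("blocked:   wall=" + blocked[0] + " ns, cpu=" + blocked[1] + " ns");
    }
}
```

I would expect the load loop to report CPU time close to its wall time even when many cycles are spent waiting on memory, while the blocked call reports almost no CPU time. A sampling profiler sees the first as a hot spot and the second as idle time.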
"Hot spots" in this context are locations in the code that are frequently sampled. This may be because the code is executed a great many times or it may be because the execution of the code is quite slow (typically due to stalls on loads).
"idle" refers to time when there is no process scheduled on a core, not time when the core is waiting for memory references. Unlike memory references, IO transactions are typically asynchronous, so a slow IO transaction will often lead to "idle time", rather than to a "hot spot".
To determine whether a piece of code is "compute-intensive" or "data-intensive", one typically looks at Instructions Per Cycle or Cache Miss rates (particularly L2 and LLC cache miss rates).
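To make these two metrics concrete, here is a minimal sketch of how they are derived from raw event counts. The class and method names are hypothetical and the counts in main are made up for illustration; in practice the numbers would come from the profiler's hardware event data. No thresholds are hard-coded, since what counts as a "low" IPC or a "high" miss rate depends on the processor.

```java
final class PerfRatios {

    // Instructions Per Cycle: retired instructions divided by unhalted core cycles.
    static double ipc(long instructionsRetired, long cpuCycles) {
        if (cpuCycles <= 0) {
            throw new IllegalArgumentException("cpuCycles must be positive");
        }
        return (double) instructionsRetired / cpuCycles;
    }

    // Miss rate for one cache level: misses divided by references to that level.
    static double missRate(long misses, long references) {
        if (references <= 0) {
            throw new IllegalArgumentException("references must be positive");
        }
        return (double) misses / references;
    }

    public static void main(String[] args) {
        // Made-up counts for a single function, for illustration only.
        long instructions = 1_200_000_000L;
        long cycles = 2_400_000_000L;
        long llcMisses = 30_000_000L;
        long llcReferences = 120_000_000L;

        System.out.println("IPC           = " + ipc(instructions, cycles));
        System.out.println("LLC miss rate = " + missRate(llcMisses, llcReferences));
    }
}
```

A function with a low IPC together with a high L2 or LLC miss rate is a candidate for being data-intensive; a function with a high IPC and few misses is more likely compute-intensive.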
Regarding the <CPU usage Histogram> on the <Summary Page>: the answer to both questions is yes. CPU usage characterizes your workload as a whole by breaking down elapsed time (the wall-clock time spent executing your program) according to how many of your logical CPUs were utilized. <Idle> is the part of the time your program spent waiting for something, i.e. not executing at all, e.g. waiting for I/O completion or for a signal on a synchronization object such as a semaphore. <Poor> is the amount of time your code ran on less than 50% of the compute resources; e.g. if you have four logical CPUs and your program is single-threaded, the three other CPUs are not used by your application and utilization is around 25%. <Ok> is for 51-85% compute resource utilization and <Ideal> is for 86-100%. <Poor> doesn't necessarily mean there is a problem in your code. It just means your program could complete faster if you manage to parallelize your algorithm and utilize all four CPUs instead of only one. Elapsed time may be shortened significantly in that case.
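The bucket boundaries described above can be sketched as a small classifier. This is only an illustration of the description in this post, not VTune's actual implementation: the class and method names are hypothetical, and treating exactly 50% as <Poor> and anything above 85% as <Ideal> is my assumption, since the stated ranges leave those edges open.

```java
final class CpuUsageBucket {

    // Maps the number of simultaneously busy logical CPUs to a histogram bucket name.
    static String classify(double busyLogicalCpus, int totalLogicalCpus) {
        if (totalLogicalCpus <= 0) {
            throw new IllegalArgumentException("totalLogicalCpus must be positive");
        }
        if (busyLogicalCpus <= 0.0) {
            return "Idle";                    // nothing from the program is executing
        }
        double utilization = 100.0 * busyLogicalCpus / totalLogicalCpus;
        if (utilization <= 50.0) {
            return "Poor";                    // assumption: exactly 50% counts as Poor
        }
        if (utilization <= 85.0) {
            return "Ok";
        }
        return "Ideal";
    }

    public static void main(String[] args) {
        int logicalCpus = 4;
        for (int busy = 0; busy <= logicalCpus; busy++) {
            System.out.println(busy + " of " + logicalCpus + " busy: " + classify(busy, logicalCpus));
        }
    }
}
```

With four logical CPUs, a single-threaded program sits at 25% utilization and lands in the Poor bucket whenever it is running, which matches the example above.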
Regarding hottest functions: yes, the hottest functions are always CPU-intensive in the sense that the CPU spends most of its time executing their code. A separate matter is that the CPU may not execute that code efficiently because of inefficiencies in the code itself, e.g. cache misses, branch mispredictions, pipeline stalls, etc. (microarchitectural issues). To see whether your code has microarchitectural issues, you can use the <General Exploration> analysis.
Moreover, during hotspot analysis you should pay attention to Front-End and Back-End pipeline stalls. For example, code that is starved of instructions (e.g. by instruction-cache misses) will be Front-End bound, while code that is waiting for a memory load to complete, or that saturates the SIMD FPU (FADD or FMUL units), will be Back-End bound.