I'm using VTune Amplifier XE to profile a video decoding application. I used Hotspots analysis to find the functions with the heaviest CPU time and was surprised when 'Hotspots by CPU Usage' showed Idle usage (gray blocks) as part of the functions' CPU time:
At first I thought the Idle CPU usage came from multithreading/synchronization issues, where a thread is in the Ready state but doesn't receive CPU time. But digging deeper, I found that the Idle time comes from the edges where the periodically called functions become active:
Does the Idle time result from measurement granularity (the OS scheduler tick period) or some other measurement inaccuracy, or does the thread really not receive CPU time for a certain duration after it becomes ready?
How does Amplifier determine the CPU state (Idle or Running) when it takes a sample? (By means of NtQuerySystemInformation, or some NtQueryInformationXXX?)
First of all, please use the latest product - VTune Amplifier XE 2011 Update 6.
Hotspots analysis is user-mode data collection; it uses OS timer ticks to profile.
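To make the granularity question concrete, here is a minimal sketch of how tick-based sampling can misattribute time. It is a pure simulation under stated assumptions, not VTune's actual algorithm: the names `SampleResult` and `simulate_sampling`, and the idea that a whole tick is charged to the state seen at the sample instant, are introduced here for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a thread repeats a cycle of `busy_us` microseconds of
// work followed by `sleep_us` microseconds of waiting. A sampler fires every
// `tick_us` microseconds and charges the whole tick to whatever state the
// thread is in at that instant. This is NOT VTune's real algorithm.
struct SampleResult {
    std::uint64_t true_busy_us;     // time the thread really spent working
    std::uint64_t sampled_busy_us;  // time the sampler attributes to work
};

SampleResult simulate_sampling(std::uint64_t busy_us, std::uint64_t sleep_us,
                               std::uint64_t tick_us, std::uint64_t total_us) {
    assert(tick_us > 0 && busy_us + sleep_us > 0);
    const std::uint64_t period = busy_us + sleep_us;
    SampleResult r{0, 0};

    // Ground truth: full cycles plus the busy part of the last partial cycle.
    const std::uint64_t full = total_us / period;
    const std::uint64_t rest = total_us % period;
    r.true_busy_us = full * busy_us + (rest < busy_us ? rest : busy_us);

    // Sampler view: look at the thread state once per tick.
    for (std::uint64_t t = 0; t < total_us; t += tick_us) {
        const bool busy = (t % period) < busy_us;
        if (busy) r.sampled_busy_us += tick_us;
    }
    return r;
}
```

In this model, `simulate_sampling(5, 5, 10, 100)` attributes 100 µs to work although only 50 µs was really spent working, because every sample happens to land in the busy part of the cycle. With a tick of 1 µs the two figures agree. The sketch only shows that a coarse tick can distort per-function time; it does not claim to explain the Idle bars.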
1. In your screenshot, I guess that the program has many threads but parallelism is not good ("Red bar": parallel-working-threads / cores < 50%)
2. I don't know why the two "Hotspots by CPU Usage" reports show different results for the same app.
3. You can use the "Function/Thread/Call Stack" grouping to verify CPU usage in each thread of a hot function
4. As for the "Idle" bar, the reasons could be:
4.a. The hot function is active, but its data is not ready (e.g. it is reading from disk)
4.b. The hot function is active (and the thread is in the Ready state), but many other threads are running, so this thread isn't granted CPU time.
4.c. CPU time may be spent in other system DLLs or 3rd-party libraries (light work in the decode function, maybe API calls only?)
4.d. Some other situation?
5. I think you can change "Call Stack Mode:" to "User/system functions"; it will display more info
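Point 1 above reads the red bar as the ratio of simultaneously working threads to cores being under 50%. A small sketch of that kind of classification follows. The function `classify_usage`, the category names it returns, and every threshold other than the 50% mentioned in point 1 are hypothetical; VTune's real thresholds may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical classifier, not VTune's implementation. `running_threads` is
// the number of threads executing at the moment of a sample, `cores` is the
// number of logical CPUs.
std::string classify_usage(unsigned running_threads, unsigned cores) {
    assert(cores > 0);
    if (running_threads == 0) return "Idle";  // nothing was executing
    // Integer form of running_threads / cores < 0.5, avoiding floating point.
    if (2 * running_threads < cores) return "Poor";  // the "red bar" case
    if (running_threads < cores) return "OK";
    return "Ideal";  // every core has a running thread (or more)
}
```

Under these assumptions, a sample taken while no thread of the process is executing is classified as Idle, regardless of which function the thread was last seen in.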
If you still can't explain the results, please attach a zip file of the result directory - I would like to look into it.
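do {
    unsigned sum = 0;
    for (unsigned i = 0; i < 0xfffff; i++)
        sum += i;
    Sleep(20);  // periodic wait referred to in the text below; exact position in the original loop is assumed
} while (true);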
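One way to check how the loop body compares with the scheduler tick is to time it directly. The sketch below is a portable variant that uses `std::chrono` rather than the Windows API; `sum_loop` and `time_sum_loop_us` are names introduced here for illustration and are not part of the original test.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Same arithmetic as the test loop: add 0 .. 0xfffff-1 into a 32-bit sum.
// `volatile` keeps the compiler from folding the loop into a constant.
std::uint32_t sum_loop() {
    volatile std::uint32_t sum = 0;
    for (std::uint32_t i = 0; i < 0xfffff; i++)
        sum = sum + i;
    return sum;
}

// Returns the wall-clock duration of one call to sum_loop in microseconds.
std::int64_t time_sum_loop_us() {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    (void)sum_loop();
    const auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
        .count();
}
```

Calling `time_sum_loop_us()` a few times and comparing the result with the 15 ms tick period gives a rough idea of how much of each cycle the thread spends computing versus waiting.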
I am willing to concede that the sum loop completes in less than 15 ms (the OS scheduler tick period), so its reported CPU time would be more than it really is, but why do we see Idle usage? I don't believe the Idle time results from ring 0 overhead when the scheduler redispatches threads due to Sleep(20), with my thread sitting in the Ready state without receiving CPU time, because that would be visible as kernel-mode CPU load in Task Manager.
Could you explain this? Does it result from the specific character of user-mode sampling and tracing?
I'm using VTune Amplifier XE 2011 Update 5. I couldn't find Update 6 on the Intel site.