explore_zjx
Beginner
90 Views

About CPU_CLK_UNHALTED.CORE event

Hello, everyone!

I want to profile the CPU cycles of the main thread at various compiler optimization levels. The platform is a Q6600 and the OS is FC9. However, I'm confused by the VTune profiling results, which are as follows.

Benchmark    O-level  Total time (s)  Main thread cycles
mcf-improve  O0       521.058         774384
mcf-improve  O1       400.873         595959
mcf-improve  O2       387.922         406291
mcf-improve  O3       381.387         566987

The execution time at O2 is 387.922 s and the CPU_CLK_UNHALTED.CORE count is 406291; the sample ratio is 1/1600000. For the same benchmark at O3, the execution time is 381.387 s, so O3 improved the execution time. In theory, the CPU_CLK_UNHALTED.CORE count at O3 should be lower than at O2, but the result is not what I expected. My question is: what is the reason for this result?

And I have already executed the program several times, but the CPU_CLK_UNHALTED.core is not stable.
The following result is another example.

Benchmark  O-level  Total time (s)  Main thread cycles
mcf-ori0   O0       659.156         981545
mcf-ori0   O1       483.982         37158
mcf-ori0   O2       477.62          710823
mcf-ori0   O3       469.906         699293

Please look at the second result: the CPU_CLK_UNHALTED.CORE count is abnormally small.

I want to know how to explain these results.

By the way, the experiment platform is not shared with others. In other words, only the benchmark runs on the platform. The program was also bound to a specified core using the Linux affinity interface.

What is your advice?

Thanks a lot!

Jason
2 Replies
Eric_M_Intel2
Employee
90 Views

Hello,

I need to validate some of your vocabulary here.
When you say "main thread cycles"...
do you mean the number of samples VTune took multiplied by the Sample After value, or do you mean the number of samples? Judging by the precision and size of the numbers you reported, it appears you mean the number of samples.

When you say the "sample ratio is 1/1600000", do you mean the Sample After value is 1600000? If so, why did you choose that number? Or did you indicate the correct CPU? The Q6600 appears to be a 2.4 GHz CPU, so a Sample After value of 2400000 would make it a lot easier to convert samples to seconds. That being said...

I may have some educated guesses as to why your numbers above are what they are, depending on whether I guessed correctly as to what you meant.

Given the current numbers (and assuming you meant samples rather than cycles), the total cycles for O3 account for about 378 s of your reported time of 381 s (calculated via Seconds = Samples * Sample After value / CPU frequency in Hz).

Given the current numbers (and assuming you meant samples rather than cycles), the total cycles for O2 account for about 271 s of your reported time of 388 s.

That would indicate to me that some of the time for that CPU is either in a halted state, or in the Idle loop of the OS, or running a different thread...
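
The conversion used above can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in this reply: the reported numbers are sample counts, the Sample After value is 1600000, and the core runs at a nominal 2.4 GHz. The function and variable names are made up for the example and are not part of VTune.

```python
# Assumptions from the thread (not verified): the reported counts are samples,
# the Sample After value is 1,600,000, and the Q6600 runs at a nominal 2.4 GHz.
CPU_HZ = 2.4e9
SAMPLE_AFTER = 1_600_000


def samples_to_seconds(samples, sample_after=SAMPLE_AFTER, cpu_hz=CPU_HZ):
    """Convert CPU_CLK_UNHALTED.CORE samples to seconds of unhalted time."""
    return samples * sample_after / cpu_hz


# (wall-clock seconds, samples) for the O2 and O3 rows of the first table
runs = {"O2": (387.922, 406291), "O3": (381.387, 566987)}

for level, (wall_s, samples) in runs.items():
    busy_s = samples_to_seconds(samples)
    print(f"{level}: {busy_s:.1f} s unhalted of {wall_s:.1f} s wall "
          f"({busy_s / wall_s:.0%} accounted for)")
```

With a Sample After value of 2400000 on the same 2.4 GHz clock, each sample would correspond to exactly 1 ms, which is why that value makes the conversion easier.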

Can you show the total number of cycles during the runtime of the application for each CPU, across all threads?

If this is a multithreaded application, I would run the Locks and Waits analysis and/or the Concurrency analysis of Intel VTune Amplifier XE 2011. Or you can use Intel VTune 9.1 for Windows (and use the Thread Profiler command line to collect data on Linux) to determine whether this thread is waiting on other threads to complete.

I do not currently have an educated guess about your O1 number above from the separate run. If I think of something, I will add it to this thread.

Regards,
Eric M
sun__lei
Beginner
90 Views

Eric W Moore (Intel) wrote:

Hi, I have posted a new question, "confusing result", in this forum. Could you help me?
