Hello, everyone!
I want to profile the CPU cycles of the main thread at various compiler optimization levels. The platform is a Q6600 running FC9. However, I'm confused by the VTune profiling results, which are as follows:
| Benchmark | O-level | Total time (s) | Main-thread cycles |
|---|---|---|---|
| mcf-improve | O0 | 521.058 | 774384 |
| mcf-improve | O1 | 400.873 | 595959 |
| mcf-improve | O2 | 387.922 | 406291 |
| mcf-improve | O3 | 381.387 | 566987 |
At O2 the execution time is 387.922 s and CPU_CLK_UNHALTED.CORE is 406291, with a sample ratio of 1/1600000. For the same benchmark at O3 the execution time is 381.387 s, so O3 improved the execution time. In theory, CPU_CLK_UNHALTED.CORE at O3 should therefore be lower than at O2, but the measured value (566987) is higher. What is the reason for this result?
I have already run the program several times, but CPU_CLK_UNHALTED.CORE is not stable from run to run.
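To put a number on "not stable", one option is to record the count from each repeated run at the same optimization level and compute the coefficient of variation (standard deviation divided by mean). The sketch below uses Python's standard `statistics` module; the function name `run_to_run_variation` and the idea of feeding it a plain list of per-run counts are illustrative assumptions, not a VTune feature.

```python
import statistics

def run_to_run_variation(counts):
    """Return (mean, sample standard deviation, coefficient of variation)
    for event counts collected from repeated runs of the same binary."""
    if len(counts) < 2:
        # A standard deviation needs at least two observations.
        raise ValueError("need at least two runs to estimate variation")
    mean = statistics.mean(counts)
    stdev = statistics.stdev(counts)  # sample (n - 1) standard deviation
    return mean, stdev, stdev / mean
```

Computing the same figure for the wall-clock times of those runs shows whether the event counts vary much more than the run time itself, which would point at the measurement rather than the compiler.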
The following result is another example.
| Benchmark | O-level | Total time (s) | Main-thread cycles |
|---|---|---|---|
| mcf-ori0 | O0 | 659.156 | 981545 |
| mcf-ori0 | O1 | 483.982 | 37158 |
| mcf-ori0 | O2 | 477.62 | 710823 |
| mcf-ori0 | O3 | 469.906 | 699293 |
In the second row (O1), CPU_CLK_UNHALTED.CORE is abnormally small (37158).
How can I explain these results?
By the way, the test machine is not shared: only the benchmark runs on it, and the program was bound to a specific core using the Linux affinity interface.
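For readers who want to reproduce the setup, pinning a process to one core through the Linux affinity interface can be sketched in Python with `os.sched_setaffinity` (Linux only). This is an illustration, not necessarily how the benchmark above was pinned; the helper name `pin_to_core` and the default of choosing the lowest-numbered allowed CPU are assumptions.

```python
import os

def pin_to_core(core=None):
    """Restrict the caller (pid 0) to a single CPU core and return the new mask.

    Threads created after this call inherit the mask. If no core is given,
    the lowest-numbered CPU the caller is currently allowed to use is chosen
    (an illustrative default).
    """
    allowed = os.sched_getaffinity(0)
    if core is None:
        core = min(allowed)
    if core not in allowed:
        raise ValueError(f"core {core} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    print("now restricted to CPUs:", pin_to_core())
```

From a shell, `taskset -c 0 ./benchmark` has the same effect (`./benchmark` is a placeholder). Pinning only restricts where this process runs; it does not stop other processes or kernel threads from being scheduled on the same core.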
Any advice?
Thanks a lot!
Jason
Hello,
I need to check some of your terminology first.
When you say "main thread cycles", do you mean the number of samples VTune took multiplied by the Sample After value, or just the number of samples? Judging by the precision and size of the numbers you reported, you seem to mean the number of samples.
When you say the "sample ratio is 1/1600000", do you mean the Sample After value is 1600000? If so, why did you choose that value, and did you specify the correct CPU? The Q6600 is a 2.4 GHz CPU, so a Sample After value of 2400000 would make it a lot easier to convert samples to seconds (each sample would then represent 1 ms). That being said...
I may have some educated guesses about why your numbers are what they are, depending on whether I guessed correctly what you meant.
Given the current numbers (and assuming you meant samples rather than cycles), the total samples for O3 account for about 378 s of your reported 381 s (calculated as seconds = samples * Sample After value / clock frequency in Hz).
Under the same assumptions, the total samples for O2 account for only about 271 s of your reported 388 s.
That suggests to me that for part of the run, that CPU is halted, sitting in the OS idle loop, or running a different thread.
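The arithmetic above can be checked with a short sketch. It assumes, as the reply does, that the reported numbers are sample counts, that the Sample After value is 1,600,000, and that the core runs at a fixed 2.4 GHz; the names `samples_to_seconds` and `RUNS` are hypothetical. Under these assumptions each sample represents 1,600,000 / 2.4e9 s, about 0.667 ms.

```python
# Assumptions (from the thread, not verified): the "cycles" column is a
# sample count, Sample After = 1,600,000, and the Q6600 core runs at a
# fixed 2.4 GHz.
CLOCK_HZ = 2.4e9
SAMPLE_AFTER = 1_600_000

def samples_to_seconds(samples, sample_after=SAMPLE_AFTER, clock_hz=CLOCK_HZ):
    """Convert a CPU_CLK_UNHALTED.CORE sample count to seconds of unhalted time."""
    return samples * sample_after / clock_hz

# (O-level, wall-clock seconds, samples) copied from the two tables in the question.
RUNS = {
    "mcf-improve": [("O0", 521.058, 774384), ("O1", 400.873, 595959),
                    ("O2", 387.922, 406291), ("O3", 381.387, 566987)],
    "mcf-ori0": [("O0", 659.156, 981545), ("O1", 483.982, 37158),
                 ("O2", 477.62, 710823), ("O3", 469.906, 699293)],
}

if __name__ == "__main__":
    for bench, rows in RUNS.items():
        for level, wall, samples in rows:
            busy = samples_to_seconds(samples)
            # Share of the wall-clock time that the sampled cycles account for.
            print(f"{bench} {level}: {busy:7.1f} s unhalted of {wall:7.1f} s "
                  f"({100 * busy / wall:5.1f}%)")
```

Under these assumptions, six of the eight rows come out at roughly 99% of wall-clock time, while mcf-improve O2 (about 70%) and mcf-ori0 O1 (about 5%) stand out as the anomalous measurements.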
Can you show the total number of cycles on each CPU, for all threads, during the application's run time?
If this is a multithreaded application, I would run the Locks and Waits analysis and/or the Concurrency analysis in Intel VTune Amplifier XE 2011, or use Intel VTune 9.1 for Windows (with the Thread Profiler command line to collect data on Linux), to determine whether this thread is waiting on other threads to complete.
I don't currently have an educated guess about the O1 number from your separate run. If I think of something, I will add to this thread.
Regards,
Eric M
Eric W Moore (Intel) wrote:
Hi, I have posted a new question, "confusing result", in this forum. Could you help me?
