Hello, everyone!
I want to profile the CPU cycles of the main thread at various compiler optimization levels. The platform is a Q6600 and the OS is FC9. However, I'm confused by the VTune profiling results, which are as follows.
Benchmark   | O-level | Total time (s) | Main thread cycles |
mcf-improve | O0      | 521.058        | 774384             |
mcf-improve | O1      | 400.873        | 595959             |
mcf-improve | O2      | 387.922        | 406291             |
mcf-improve | O3      | 381.387        | 566987             |
The execution time at O2 is 387.922 s, and CPU_CLK_UNHALTED.CORE is 406291; the sampling ratio is 1/1600000. For the same benchmark at O3, the execution time is 381.387 s, so O3 improved the execution time. In theory, CPU_CLK_UNHALTED.CORE at O3 should therefore be lower than at O2, but it is not (566987 at O3 versus 406291 at O2). My question is: what is the reason for this result?
I have already run the program several times, but CPU_CLK_UNHALTED.CORE is not stable.
The following result is another example.
Benchmark | O-level | Total time (s) | Main thread cycles |
mcf-ori0  | O0      | 659.156        | 981545             |
mcf-ori0  | O1      | 483.982        | 37158              |
mcf-ori0  | O2      | 477.62         | 710823             |
mcf-ori0  | O3      | 469.906        | 699293             |
Please look at the second result: CPU_CLK_UNHALTED.CORE at O1 is abnormally small.
I want to know how to explain these results.
By the way, the experiment platform is not shared with others; in other words, only the benchmark runs on it. The program was bound to a specified core using the Linux affinity interface.
What would you advise?
Thanks a lot!
Jason
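A quick sanity check on the two tables, as a rough sketch only. It assumes the "main thread cycles" column holds sample counts taken once every 1,600,000 CPU_CLK_UNHALTED.CORE events (the 1/1600000 ratio mentioned above), and that the Q6600 runs at its nominal 2.4 GHz with no frequency scaling. The names `SAMPLE_AFTER`, `NOMINAL_HZ`, `RUNS`, and `estimate` are made up for this illustration.

```python
SAMPLE_AFTER = 1_600_000   # events per sample, from the 1/1600000 ratio in the post
NOMINAL_HZ = 2.4e9         # assumption: Q6600 at its nominal 2.4 GHz clock

# (wall-clock seconds, main-thread sample count), copied from the two tables
RUNS = {
    "mcf-improve": {
        "O0": (521.058, 774384),
        "O1": (400.873, 595959),
        "O2": (387.922, 406291),
        "O3": (381.387, 566987),
    },
    "mcf-ori0": {
        "O0": (659.156, 981545),
        "O1": (483.982, 37158),
        "O2": (477.62, 710823),
        "O3": (469.906, 699293),
    },
}


def estimate(samples, wall_seconds):
    """Return (cycles, busy_seconds, fraction_of_wall_time) for one run."""
    cycles = samples * SAMPLE_AFTER          # samples -> estimated unhalted cycles
    busy_seconds = cycles / NOMINAL_HZ       # cycles -> seconds at the assumed clock
    return cycles, busy_seconds, busy_seconds / wall_seconds


for bench, levels in RUNS.items():
    for level, (wall, samples) in levels.items():
        cycles, busy, frac = estimate(samples, wall)
        print(f"{bench} {level}: {cycles:.3e} cycles, "
              f"about {busy:.1f} s, {frac:.0%} of wall time")
```

Under these assumptions, every row comes out at roughly 99% of the wall-clock time except mcf-improve at O2 (about 70%) and mcf-ori0 at O1 (about 5%). That would suggest the two low counts are the outliers worth re-checking, not the O3 figure. This is an inference from the arithmetic only, not something confirmed in the thread.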
3 Replies
Hi Jason,
You have to view the result by opening the processes report first, selecting the process of interest, and then opening the threads report.
Compare the CPU_CLK_UNHALTED.CORE counts for "O2" and "O3" at the process level first. Sometimes the compiler may move work from the main thread to other threads.
I don't have your .tb5 or .tb6 result file. If you need more help investigating, please attach it or submit an issue to https://premier.intel.com with the result files.
Regards, Peter
P.S. I don't know whether you used CPU affinity or thread affinity in your code.
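To illustrate the process-level comparison suggested above, here is a minimal sketch. It assumes the per-thread sample counts have already been read into a dictionary. The function name `process_view`, the thread labels, and the numbers in the usage lines are all hypothetical, not values from Jason's runs.

```python
def process_view(samples_by_thread, main_thread):
    """Return (process_total, main_thread_share) from per-thread sample counts."""
    total = sum(samples_by_thread.values())
    if total == 0:
        return 0, 0.0
    return total, samples_by_thread[main_thread] / total


# Hypothetical counts, purely to show the call: the process totals match,
# but the main thread's share differs between the two builds.
build_a = {"main": 600, "other": 400}
build_b = {"main": 900, "other": 100}

for name, counts in (("build_a", build_a), ("build_b", build_b)):
    total, share = process_view(counts, "main")
    print(f"{name}: process total {total} samples, main thread share {share:.0%}")
```

If the process totals track the wall-clock time but the main-thread share moves around, the difference lies in how samples are attributed to threads, not in how many cycles the program used overall.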
Hi, Peter
Sorry for the slow response. In the source code, I use the CPU affinity interface to pin the thread to a specified core.
However, in my experiment I used the VTune command-line interface. The commands were as follows. I got the thread-out.csv file using the vtl view -threads -cpu command,
and the hot-function data came from the vtl view -hf -mn command.
vtl activity -d 1000 -c sampling -o "-ec en= 'MEM_LOAD_RETIRED.L2_MISS':sa=100000 en='CPU_CLK_UNHALTED.CORE' :sa=1600000 en='INST_RETIRED.ANY':sa=1600000 en='BUS_TRANS_BURST.SELF':sa=100000 -sterm yes -cal no" -app ./$TESTAPP," $PARA >$reportdir/result-$myapp.txt" run
vtl view -threads -cpu -sea 11 -sum -cd ',' > $reportdir/thread-${myapp}.csv
vtl view -hf -mn $myapp -cpu -sea 11 -sum -cd ',' > $reportdir/hotspots-${myapp}.csv
-------------
read the result file into a csv
vtl -delete -all -f
I now have the source data in thread-out.csv and hotspots-out.csv, but I don't have the .tb5 or .tb6 file. Is it OK if I send those two files?
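For readers unfamiliar with the Linux affinity interface mentioned here, the following is a rough Python equivalent. Jason's actual source code is not shown in the thread, so this is an assumed illustration and not his implementation. `pin_to_core` is a hypothetical helper; `os.sched_getaffinity` and `os.sched_setaffinity` are the standard-library wrappers around the Linux calls and are available on Linux only.

```python
import os


def pin_to_core(core=None):
    """Pin the caller to a single core and return the core number used."""
    allowed = sorted(os.sched_getaffinity(0))   # cores we may currently run on
    if core is None:
        core = allowed[0]                       # default: first permitted core
    if core not in allowed:
        raise ValueError(f"core {core} is not in the allowed set {allowed}")
    os.sched_setaffinity(0, {core})             # pid 0 means the caller
    return core


pinned = pin_to_core()
print(f"now restricted to core(s): {sorted(os.sched_getaffinity(0))}")
```

On Linux the underlying call applies to the calling thread, and threads created afterwards inherit the mask. That bears on the CPU affinity versus thread affinity question raised earlier: a thread that already exists when the call is made keeps its old mask unless it is pinned separately.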