explore_zjx
Beginner
80 Views

About CPU_CLK_UNHALTED.CORE event

Hello, everyone!

I want to profile the CPU cycles of the main thread at various compiler optimization levels. The platform is a Q6600 and the OS is FC9. However, I'm confused by the VTune profiling results, which are as follows.
BenchMark    O-level  Total time  main thread cycles
mcf-improve  O0       521.058     774384
mcf-improve  O1       400.873     595959
mcf-improve  O2       387.922     406291
mcf-improve  O3       381.387     566987

The execution time at O2 is 387.922 s and CPU_CLK_UNHALTED.CORE is 406291; the sampling ratio is 1/1600000. For the same benchmark at O3, the execution time is 381.387 s, so O3 improved the execution time. In theory, CPU_CLK_UNHALTED.CORE at O3 should be lower than at O2, but the result is not what I expected. My question is: what is the reason for this experimental result?
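One way to sanity-check figures like these is to convert a sample count into an estimated busy time and compare it with the wall-clock time. The sketch below rests on two assumptions that the post does not confirm: that the "main thread cycles" column holds sample counts (one sample per 1,600,000 events) rather than raw event counts, and that the Q6600 runs at its nominal 2.4 GHz for the whole run. The names `estimated_cycles` and `busy_seconds` are made up for illustration.

```python
# Assumptions (not confirmed by the post):
#   - the reported numbers are sample counts, one sample per SAMPLE_AFTER events
#   - the core runs at the Q6600's nominal 2.4 GHz throughout
NOMINAL_HZ = 2.4e9
SAMPLE_AFTER = 1_600_000


def estimated_cycles(samples, sample_after=SAMPLE_AFTER):
    """Estimate the raw event count from a sample count."""
    return samples * sample_after


def busy_seconds(samples, hz=NOMINAL_HZ, sample_after=SAMPLE_AFTER):
    """Estimate how long the thread was unhalted, in seconds."""
    return estimated_cycles(samples, sample_after) / hz


# (wall-clock seconds, reported main-thread count) from the first table
runs = {
    "O0": (521.058, 774384),
    "O1": (400.873, 595959),
    "O2": (387.922, 406291),
    "O3": (381.387, 566987),
}

for level, (wall, samples) in runs.items():
    busy = busy_seconds(samples)
    print(f"{level}: wall={wall:.1f}s  estimated busy={busy:.1f}s  ratio={busy / wall:.2f}")
```

If the assumptions hold, a ratio well below 1 for a CPU-bound, pinned thread would point at missing samples for that run rather than at the compiler's output; if they do not hold, the ratios are meaningless and only the relative comparison between runs is of interest.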

I have already run the program several times, but CPU_CLK_UNHALTED.CORE is not stable.
The following result is another example.

BenchMark  O-level  Total time  main cycles
mcf-ori0   O0       659.156     981545
mcf-ori0   O1       483.982     37158
mcf-ori0   O2       477.627     10823
mcf-ori0   O3       469.906     699293

Please look at the second result: CPU_CLK_UNHALTED.CORE is abnormally small.

I want to know how to explain these experimental results.

By the way, the experiment platform is not shared with others; in other words, only the benchmark runs on it. The program was bound to a specific core using the Linux affinity interface.

What would you advise?

Thanks a lot!

Jason
3 Replies
Peter_W_Intel
Employee

Hi Jason,

You have to view the results by opening the processes report first, selecting the process of interest, and then opening the threads report.

You have to compare the CPU_CLK_UNHALTED.CORE counts for "O2" and "O3" at the process level first. Sometimes the compiler may move work from the main thread to other threads.
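As a rough illustration of the process-level comparison, per-thread counts can be summed per process before looking at any single thread's share. This is only a sketch: the `(process, thread, samples)` row layout is hypothetical and is not the actual column layout of the vtl CSV output, and the numbers are made up.

```python
from collections import defaultdict


def process_totals(rows):
    """Sum per-thread sample counts into per-process totals."""
    totals = defaultdict(int)
    for process, _thread, samples in rows:
        totals[process] += samples
    return dict(totals)


def thread_share(rows, process, thread):
    """Fraction of a process's samples that belong to one thread."""
    total = process_totals(rows).get(process, 0)
    own = sum(s for p, t, s in rows if p == process and t == thread)
    return own / total if total else 0.0


# Made-up rows purely for illustration; not taken from this thread's results.
rows = [
    ("mcf", "main", 400),
    ("mcf", "worker", 100),
]

print(process_totals(rows))
print(thread_share(rows, "mcf", "main"))
```

If the process totals for two optimization levels are close while the main thread's share differs, that would be consistent with work having moved between threads.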

I don't have your .tb5 or .tb6 result file. If you need more help investigating, please attach them or submit an issue to https://premier.intel.com with the result files.

Regards, Peter
P.S. I don't know whether you used CPU affinity or thread affinity in your code.
explore_zjx
Beginner

Hi, Peter
Sorry for the slow response. In the source code, I use the CPU affinity interface to pin the thread to a specific core.
However, in my experiment I used the VTune command-line interface. The commands were as follows. I got the thread-out.csv file using the vtl view -threads -cpu command,
and the hot-function data came from the vtl view -hf -mn command.

vtl activity -d 1000 -c sampling -o "-ec en= 'MEM_LOAD_RETIRED.L2_MISS':sa=100000 en='CPU_CLK_UNHALTED.CORE' :sa=1600000 en='INST_RETIRED.ANY':sa=1600000 en='BUS_TRANS_BURST.SELF':sa=100000 -sterm yes -cal no" -app ./$TESTAPP," $PARA >$reportdir/result-$myapp.txt" run

vtl view -threads -cpu -sea 11 -sum -cd ',' > $reportdir/thread-${myapp}.csv

vtl view -hf -mn $myapp -cpu -sea 11 -sum -cd ',' > $reportdir/hotspots-${myapp}.csv
-------------
read the result file into a csv
vtl -delete -all -f

Now I have the source data from thread-out.csv and hotspots-out.csv, but I don't have the .tb5 or .tb6 file. Is it OK if I send these two files?
explore_zjx
Beginner

I have submitted an issue at https://premier.intel.com. I hope somebody can give me some suggestions!