Using the multi-core control feature of Fedora Core 6 (64-bit), I can enable or disable individual cores and take measurements on different core configurations. The problem is that when I run the same benchmark, the INST_RETIRED.ANY count measured under different configurations varies a lot, although it should be almost the same.
Benchmark CoreConfig INST_RETIRED.ANY
splash-ocean 1core 3.1E+10
For the 2core and 4core measurements, the measured numbers were mostly less than the 1core result. The situation is the same with other benchmarks. Is there any special setting I need to be aware of to make VTune work well with a multi-core processor?
I used the following commands to enable or disable specific cores:
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 1 > /sys/devices/system/cpu/cpu3/online
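Before starting a measurement it may help to confirm which cores the kernel actually considers online after the commands above. The following is a minimal sketch, not part of the original post: the function name cpu_states is made up for illustration, and it assumes that a core without an 'online' file (often the boot CPU) cannot be unplugged and is therefore online.

```shell
#!/bin/sh
# Print the hotplug state of each logical CPU.
# $1: sysfs CPU directory (defaults to the standard location).
# cpu_states is a hypothetical helper name, not a system command.
cpu_states() {
    base="${1:-/sys/devices/system/cpu}"
    for dir in "$base"/cpu[0-9]*; do
        [ -d "$dir" ] || continue
        name="${dir##*/}"
        if [ -r "$dir/online" ]; then
            state=$(cat "$dir/online")
        else
            # Assumption: no 'online' file means the core cannot be
            # unplugged (typical for the boot CPU), so treat it as online.
            state=1
        fi
        if [ "$state" = "1" ]; then
            echo "$name online"
        else
            echo "$name offline"
        fi
    done
}

cpu_states
```

Entries are listed in shell glob order, so cpu10 sorts before cpu2 on machines with more than ten logical CPUs.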
Any help is greatly appreciated.
What kind of benchmark do you run? Is it a multi-threaded version? Does it do a fixed amount of work every time? Is it CPU bound or rather IO bound? Do you use affinity to bind it to a processor?
VTune can show information about collected samples across all CPUs: in the GUI you push the CPU button, and there is a switch in the CLI version for this. Could you please provide the data it shows for your experiments with 1, 2, 4 and 8 cores?
Could you also check that both processors are running at the same frequency?
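One rough way to do the frequency check on Linux is to compare the "cpu MHz" lines in /proc/cpuinfo. This is only a sketch under assumptions: distinct_freqs is a made-up helper name, the "cpu MHz" field is what x86 kernels report, and with frequency scaling enabled the value is a momentary reading, so a brief mismatch does not necessarily mean the cores are configured differently.

```shell
#!/bin/sh
# List the distinct "cpu MHz" values reported for the online cores.
# $1: cpuinfo-style file (defaults to /proc/cpuinfo).
# distinct_freqs is a hypothetical helper name.
distinct_freqs() {
    awk -F: '/^cpu MHz/ { gsub(/[ \t]/, "", $2); print $2 }' \
        "${1:-/proc/cpuinfo}" | sort -u
}

distinct_freqs
```

A single line of output means every online core reported the same value at the moment of the read.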
We are running OCEAN from the SPLASH-2 suite. It is a multi-threaded program. All runs on different numbers of cores use the same input size and should therefore have similar INST_RETIRED counts. We didn't set thread affinity to bind threads to processors; we assume the OS spreads the threads evenly.
All the cores are running at the same frequency.
I also read vtune.c, and it seems that it can only measure SMP. If the cores are spread across two sockets, e.g. 3 cores in socket 0 and 1 core in socket 1, it may measure only 2 cores in total. Is this right?
Test results with ocean, showing the execution information on different cores:
Config      Inst_Retired_Total  Inst_Retired_Different_Processor
1core       3.1E+10             Processor0: 3.1E+10
2core-01    3.1446E+10          Processor0: 1.8E+7
2core-02    5.594E+9            Processor0: 5.594E+9
4core-0124  2.336E+10           Processor0: 7.32E+8
According to the test, it seems that only some of the cores are sampled under the different configurations.
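To put numbers on the discrepancy, the totals reported above can be expressed as a fraction of the 1core baseline. This is just arithmetic on the figures already posted; the helper name ratios is made up for illustration.

```shell
#!/bin/sh
# Ratio of each configuration's Inst_Retired_Total to the first line,
# which is taken as the baseline. Input lines: "<config> <total>".
# ratios is a hypothetical helper name.
ratios() {
    awk 'NR == 1 { base = $2 } { printf "%s %.2f\n", $1, $2 / base }'
}

printf '%s\n' \
    '1core 3.1E+10' \
    '2core-01 3.1446E+10' \
    '2core-02 5.594E+9' \
    '4core-0124 2.336E+10' | ratios
# Output:
# 1core 1.00
# 2core-01 1.01
# 2core-02 0.18
# 4core-0124 0.75
```

Only the 2core-01 run is close to the baseline; the 2core-02 run reports less than a fifth of the expected instructions.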
You can use a simple workaround for this problem: when creating the activity, you can specify a CPU mask using the '-cm' option. For example:
vtl activity -c sampling -o "-cm=0,1,3" -app ....
This will force VTune to collect data only from cores 0, 1 and 3.
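If the set of enabled cores changes between runs, the core list for '-cm' could be derived from the kernel instead of typed by hand. The sketch below is an assumption-laden illustration: expand_cpu_list is a made-up helper, it relies on the /sys/devices/system/cpu/online file (which may be missing on older kernels) and its range format such as "0-1,3", and it only prints an option string in the comma-separated form shown in the example above. It does not run vtl.

```shell
#!/bin/sh
# Expand a kernel CPU list such as "0-1,3" into "0,1,3".
# expand_cpu_list is a hypothetical helper name.
expand_cpu_list() {
    echo "$1" | awk -F, '{
        out = ""
        for (i = 1; i <= NF; i++) {
            # Each field is either a single core "3" or a range "0-1".
            n = split($i, r, "-")
            lo = r[1]
            hi = (n == 2) ? r[2] : r[1]
            for (c = lo + 0; c <= hi + 0; c++)
                out = out (out == "" ? "" : ",") c
        }
        print out
    }'
}

# Assumption: this file exists; on kernels without it nothing is printed.
online=$(cat /sys/devices/system/cpu/online 2>/dev/null)
if [ -n "$online" ]; then
    echo "-cm=$(expand_cpu_list "$online")"
fi
```

With cores 0, 1 and 3 online the kernel reports "0-1,3", which expands to "-cm=0,1,3", the same mask as in the example above.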