
Issue with multi-core measurements

I have been using VTune 8.0.3 (Linux version) to take measurements on an Intel Xeon 5355, and encountered the following issue:

Using the multi-core control feature of Fedora Core 6 (64-bit), I can enable or disable individual cores and take measurements with different core configurations. The problem is that, when running the same benchmark, the INST_RETIRED.ANY count varies a lot between configurations, although it should be almost the same.

For example:

Benchmark      CoreConfig       INST_RETIRED.ANY
splash-ocean   1core            3.1E+10
               8core            3.1E+10
               2core:0+1        3.1E+10
               2core:0+2        1.6E+10
               4core:0+1+3+5    2.4E+10
               4core:0+2+4+6    1.6E+10

For the 2-core and 4-core measurements, the measured counts were mostly lower than the 1-core result, and the situation is the same with other benchmarks. Is there any special setting I need in order to make VTune work well with a multi-core processor?
I used the following commands to enable or disable specific cores:
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 1 > /sys/devices/system/cpu/cpu3/online
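
A small wrapper can make these toggles less error-prone and lets you confirm which cores are online before each run. This is only a sketch, assuming the standard Linux sysfs CPU hotplug layout; the function names `set_cpu_online` and `list_online_cpus` and the `CPU_SYSFS` override are illustrative and are not part of VTune or Fedora.

```shell
# Hypothetical helpers around the sysfs hotplug files used above.
# CPU_SYSFS can be pointed at a scratch directory for a dry run;
# writing to the real files requires root.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# set_cpu_online <cpu-number> <0|1>
set_cpu_online() {
    cpu_file="$CPU_SYSFS/cpu$1/online"
    if [ "$2" != "0" ] && [ "$2" != "1" ]; then
        echo "state must be 0 or 1" >&2
        return 2
    fi
    if [ ! -e "$cpu_file" ]; then
        # cpu0 often has no hotplug control file
        echo "no hotplug control file for cpu$1" >&2
        return 1
    fi
    echo "$2" > "$cpu_file"
}

# list_online_cpus: print the kernel's list of online CPUs, e.g. "0-1,3"
list_online_cpus() {
    cat "$CPU_SYSFS/online"
}

# Example (as root):
#   set_cpu_online 2 0 && list_online_cpus
```

Checking the output of `list_online_cpus` before each measurement is a cheap way to make sure the configuration is the one you intended.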

Any help is greatly appreciated.

Thanks, Grace
3 Replies
Hi - this is interesting.

What kind of benchmark do you run? Is it a multi-threaded version? Does it do a fixed amount of work every time? Is it CPU bound or rather I/O bound? Do you use affinity to bind it to a processor?

VTune can show information about the collected samples across all CPUs: in the GUI, push the CPU button; the CLI version has a switch for this. Could you please provide the data it shows for your experiments with 1, 2, 4 and 8 cores?

Could you also check that both processors are running at the same frequency?

regards, Andrei


We are running OCEAN from the SPLASH-2 suite. It is a multi-threaded program. All runs, regardless of the number of cores, use the same input size and should therefore have similar INST_RETIRED counts. We did not set any thread affinity; we assume that the OS spreads the threads evenly.
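
To check the affinity question directly, you can read the CPU list that the scheduler is allowed to use for a process. This is a rough sketch, assuming a kernel that exposes the `Cpus_allowed_list` field in `/proc/<pid>/status` (older kernels may not); the helper name `allowed_cpus` is made up for illustration.

```shell
# allowed_cpus [status-file]
# Print the CPU list a process may run on. Defaults to the calling
# process; pass /proc/<pid>/status to inspect another process.
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ { print $2 }' "${1:-/proc/self/status}"
}

# Example:
#   allowed_cpus                      # affinity inherited by this shell
#   allowed_cpus /proc/1234/status    # 1234 is a placeholder PID
```

If you want to pin a run explicitly, `taskset -c 0,2 <command>` from util-linux starts a command restricted to the listed CPUs.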

All the cores are running with the same frequency.

I also read vtune.c, and it seems that it can only measure SMP configurations. If the online cores are split unevenly across the two sockets, e.g. 3 cores in socket 0 and 1 core in socket 1, it may measure only 2 cores in total. Is this right?

Test results with OCEAN, showing the instructions retired on each core:

Config       Inst_Retired_Total   Inst_Retired_Different_Processor
1core        3.1E+10              Processor0: 3.1E+10
2core-01     3.1446E+10           Processor0: 1.8E+7
                                  Processor1: 3.1428E+10
2core-02     5.594E+9             Processor0: 5.594E+9
4core-0124   2.336E+10            Processor0: 7.32E+8
                                  Processor1: 1.4632E+10
                                  Processor2: 7.998E+09

According to the test, it seems that only some of the cores are sampled in certain configurations.
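
A quick arithmetic check supports this: the reported total is just the sum of the per-processor counts that were actually collected, so any core that was not sampled is simply missing from the total. The helper below is a minimal sketch using awk; the name `sum_counts` is illustrative.

```shell
# sum_counts <count>...: add the counts and print the total in E notation
sum_counts() {
    printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.4E\n", total }'
}

# 4core-0124 row: three processors reported
sum_counts 7.32E+8 1.4632E+10 7.998E+09    # prints 2.3362E+10

# 2core-01 row: two processors reported
sum_counts 1.8E+7 3.1428E+10               # prints 3.1446E+10
```

For the 4core-0124 configuration the sum matches the reported total of 2.336E+10, yet only three processors appear in the breakdown.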




You can use a simple workaround for this problem: when creating the activity, specify a CPU mask with the '-cm' option. For example:

vtl activity -c sampling -o "-cm=0,1,3" -app ....

This forces VTune to collect data only from cores 0, 1 and 3.
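
If you change the core configuration often, the mask can be derived from the kernel's own list of online CPUs instead of being typed by hand. This is a sketch under assumptions: the kernel reports the list in range form (e.g. "0-1,3"), and since it is not clear from this thread whether '-cm' accepts ranges, the hypothetical helper `expand_cpu_list` expands them into the explicit comma-separated form shown above.

```shell
# expand_cpu_list "0-1,3" prints "0,1,3"
# The body runs in a subshell so the IFS change does not leak to the caller.
expand_cpu_list() (
    IFS=','
    out=""
    for part in $1; do
        lo="${part%-*}"     # text before the dash (or the whole item)
        hi="${part#*-}"     # text after the dash (or the whole item)
        i="$lo"
        while [ "$i" -le "$hi" ]; do
            out="${out:+$out,}$i"
            i=$((i + 1))
        done
    done
    echo "$out"
)

# Example:
#   mask="$(expand_cpu_list "$(cat /sys/devices/system/cpu/online)")"
#   vtl activity -c sampling -o "-cm=$mask" -app ....
```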


