I have been using the Linux version of VTune 8.0.3 to take measurements on an Intel Xeon 5355, and I have run into the following issue:
Using the multi-core control feature of 64-bit Fedora Core 6, I can enable or disable individual cores and take measurements on different core configurations. The problem is that, when running the same benchmark, the INST_RETIRED.ANY count varies widely between configurations, although it should be almost the same.
For example:
Benchmark     CoreConfig      INST_RETIRED.ANY
splash-ocean  1core           3.1E+10
splash-ocean  8core           3.1E+10
splash-ocean  2core:0+1       3.1E+10
splash-ocean  2core:0+2       1.6E+10
splash-ocean  4core:0+1+3+5   2.4E+10
splash-ocean  4core:0+2+4+6   1.6E+10
For the 2-core and 4-core measurements, the counts were mostly lower than the 1-core result, and the situation is the same with other benchmarks. Is there any special setting I need in order for VTune to work correctly with a multi-core processor?
I used the following commands to disable or enable specific cores:
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 1 > /sys/devices/system/cpu/cpu3/online
Any help is greatly appreciated.
Thanks, Grace
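Before each run it may help to confirm which cores the kernel actually reports as online after the `echo` commands above. The following is a minimal Python sketch, assuming the standard Linux sysfs file `/sys/devices/system/cpu/online`, which holds a list such as `0-1,3,5`. The helper names `parse_cpu_list` and `online_cpus` are hypothetical and are not part of VTune.

```python
from pathlib import Path
from typing import List

# Standard Linux sysfs file listing the online logical CPUs, e.g. "0-1,3,5".
ONLINE_FILE = Path("/sys/devices/system/cpu/online")


def parse_cpu_list(text: str) -> List[int]:
    """Expand a kernel CPU list such as "0-1,3,5" into [0, 1, 3, 5]."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list or a trailing comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def online_cpus(path: Path = ONLINE_FILE) -> List[int]:
    """Return the logical CPUs the kernel currently reports as online."""
    return parse_cpu_list(path.read_text())
```

Calling `online_cpus()` on the test machine after toggling cores gives the list to compare against the intended configuration, for example cores 0 and 2 for the `2core:0+2` run.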
Hi, this is interesting.
What kind of benchmark are you running? Is it the multi-threaded version? Does it do a fixed amount of work on every run? Is it CPU bound or I/O bound? Do you use affinity to bind threads to a processor?
VTune can show the collected samples broken down by CPU: in the GUI, press the CPU button; the CLI version has a switch for this. Could you please provide the data it shows for your experiments with 1, 2, 4, and 8 cores?
Could you also check that both processors are running at the same frequency?
regards, Andrei
Hi,
We are running OCEAN from the SPLASH-2 suite. It is a multi-threaded program. The runs on different numbers of cores all use the same input size, so they should have similar INST_RETIRED counts. We did not set any thread affinity; we assume the OS will spread the threads evenly across the cores.
All the cores are running at the same frequency.
I also read vtune.c, and it seems that it can only measure SMP. If the enabled cores are split unevenly across the two sockets, e.g. 3 cores in socket 0 and 1 core in socket 1, it may measure only 2 cores in total. Is this right?
Test results for ocean, showing the execution information per core:
Config      Inst_Retired_Total  Inst_Retired_Different_Processor
1core       3.1E+10             Processor0: 3.1E+10
2core-01    3.1446E+10          Processor0: 1.8E+7; Processor1: 3.1428E+10
2core-02    5.594E+9            Processor0: 5.594E+9
4core-0124  2.336E+10           Processor0: 7.32E+8; Processor1: 1.4632E+10; Processor2: 7.998E+09
According to this test, it seems that only some of the enabled cores are sampled in certain configurations.
Regards,
Grace
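To make the shortfall in the table above easier to see, here is a small Python sketch that sums the per-processor counts for each configuration, compares the total with the 1-core baseline, and flags configurations that report fewer processors than were enabled. The numbers are copied from the post; the enabled-core counts are inferred from the configuration names, and the names `PER_PROCESSOR`, `ENABLED_CORES`, and `coverage` are hypothetical.

```python
BASELINE = 3.1e10  # 1-core INST_RETIRED count from the post

# Per-processor counts exactly as reported in the post.
PER_PROCESSOR = {
    "2core-01": [1.8e7, 3.1428e10],
    "2core-02": [5.594e9],
    "4core-0124": [7.32e8, 1.4632e10, 7.998e9],
}

# Number of enabled cores, inferred from the configuration names.
ENABLED_CORES = {"2core-01": 2, "2core-02": 2, "4core-0124": 4}


def coverage(counts, baseline=BASELINE):
    """Return (total, total / baseline) for one configuration."""
    total = sum(counts)
    return total, total / baseline


for config, counts in PER_PROCESSOR.items():
    total, ratio = coverage(counts)
    missing = ENABLED_CORES[config] - len(counts)
    print(f"{config:<11} total={total:.4e} ratio={ratio:.2f} "
          f"processors_not_reported={missing}")
```

In this data, only the configurations whose totals fall well below the baseline also have a processor missing from the report, which is consistent with the suspicion that some enabled cores are not being sampled.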
Hi,
You can use a simple workaround for this problem: when creating the activity, specify a CPU mask with the '-cm' option. For example:
vtl activity -c sampling -o "-cm=0,1,3" -app ....
This forces VTune to collect data only from cores 0, 1, and 3.
regards,
Valeriy
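When the set of enabled cores changes between runs, the mask string can be generated rather than typed by hand. The sketch below is a hypothetical Python helper, `cpu_mask_option`, which formats a list of core numbers in the comma-separated form shown in the example above. It assumes only that form (`-cm=0,1,3`); whether VTune accepts other mask syntaxes, such as ranges, is not covered here.

```python
def cpu_mask_option(cpus):
    """Format core numbers as a "-cm=..." option string, e.g. "-cm=0,1,3".

    Only the comma-separated form quoted in the reply is produced.
    """
    unique = sorted({int(c) for c in cpus})
    if not unique:
        raise ValueError("at least one CPU number is required")
    if unique[0] < 0:
        raise ValueError("CPU numbers must be non-negative")
    return "-cm=" + ",".join(str(c) for c in unique)


# Example: cores 0, 1 and 3 enabled, as in the reply above.
print(cpu_mask_option([3, 0, 1]))
```

The resulting string is what goes inside the quotes after `-o` in the `vtl activity` command.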