Zheng L. posted:
Hello everyone, I am trying to get some data from an Intel Xeon E5-2687W using PCM. For project reasons, we are mainly interested in how multiple threads reading from a PCI card over QPI may affect the system. However, the incoming QPI data traffic always reads 0, and so does the outgoing traffic. I also get some other strange readings. Is it possible that the readings are wrong?
Also, I have two screenshots of this, but I find that the forum will not let me upload the pictures. Is there any way I can upload them so someone can help me analyze the output?
Thank you very much.
Might it be that you have a second instance of PCM running, or perhaps one that was not shut down cleanly?
A screenshot of the complete output would be very helpful. Alternatively, copy and paste it into the comment text.
--
Roman
If you are running a recent version of Linux and using the "perf" interface to the QPI counters, there is an error in the definition of the predefined QPI events for cacheable and non-cacheable data blocks transferred. In both cases, the standard distributions fail to set the "extra bit" that is needed for those events. Fortunately, it can be set manually using the "perf" interface.
The reference for the Linux kernel patch is https://lkml.org/lkml/2013/8/2/482
To set the bit manually, note that the predefined event programmed with the command:
# perf stat -e "uncore_qpi_0/event=drs_data/"
is the same as
# perf stat -e "uncore_qpi_0/event=0x02,umask=0x08/"
but it should be
# perf stat -e "uncore_qpi_0/event=0x102,umask=0x08/"
This last command returns the expected number of data cache lines transferred when I run the STREAM benchmark in a cross-socket configuration. The same change to the event number causes the "ncb_data" event to return non-zero values as well, but I don't have a test case for that event.
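Since the corrected event reports a raw count, a small helper can turn a `perf stat` count into an approximate bandwidth figure. This is a hedged sketch, not something from the thread: the 8-bytes-per-counted-unit scaling is an assumption about what this event counts and should be validated against a known workload such as STREAM, and the function name `qpi_bandwidth_gbs` is invented here.

```shell
# Hypothetical helper: convert a raw uncore QPI count from "perf stat"
# into an approximate bandwidth in GB/s. Assumes each counted unit is
# an 8-byte QPI flit; validate the scaling against a known workload.
qpi_bandwidth_gbs() {
  local count=$1 seconds=$2
  # bytes = count * 8; GB/s = bytes / seconds / 1e9
  awk -v c="$count" -v s="$seconds" 'BEGIN { printf "%.3f\n", c * 8 / s / 1e9 }'
}

# Collect the corrected event system-wide for 10 seconds, then feed
# the reported count into the helper:
#   perf stat -a -e "uncore_qpi_0/event=0x102,umask=0x08/" sleep 10
#   qpi_bandwidth_gbs <count> 10
qpi_bandwidth_gbs 1250000000 10   # -> 1.000
```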
Roman Dementiev (Intel) wrote:
A screenshot of the complete output would be very helpful. Alternatively, copy and paste it into the comment text.
--
Roman
Hello Roman,
I have already uploaded the screenshot; I hope you can help me with my problem.
Thomas Willhalm (Intel) wrote:
Zheng L. posted:
Hello everyone, I am trying to get some data from an Intel Xeon E5-2687W using PCM. For project reasons, we are mainly interested in how multiple threads reading from a PCI card over QPI may affect the system. However, the incoming QPI data traffic always reads 0, and so does the outgoing traffic. I also get some other strange readings. Is it possible that the readings are wrong?
Also, I have two screenshots of this, but I find that the forum will not let me upload the pictures. Is there any way I can upload them so someone can help me analyze the output?
Thank you very much.
Might it be that you have a second instance of PCM running, or perhaps one that was not shut down cleanly?
Hello, I have already uploaded the screenshot; I hope that helps. I am sure that only one instance of the program was running when I took the screenshot.
John D. McCalpin wrote:
If you are running a recent version of Linux and using the "perf" interface to the QPI counters, there is an error in the definition of the predefined QPI events for cacheable and non-cacheable data blocks transferred. In both cases, the standard distributions fail to set the "extra bit" that is needed for those events. Fortunately, it can be set manually using the "perf" interface.
The reference for the Linux kernel patch is https://lkml.org/lkml/2013/8/2/482
To set the bit manually, note that the predefined event programmed with the command:
# perf stat -e "uncore_qpi_0/event=drs_data/"
is the same as
# perf stat -e "uncore_qpi_0/event=0x02,umask=0x08/"
but it should be
# perf stat -e "uncore_qpi_0/event=0x102,umask=0x08/"
This last command returns the expected number of data cache lines transferred when I run the STREAM benchmark in a cross-socket configuration. The same change to the event number causes the "ncb_data" event to return non-zero values as well, but I don't have a test case for that event.
Hello John,
I have not tried perf yet; I have only tried Intel's PCM. I will try perf to get some results as well.
Could you please post the whole output, including all the messages printed at program invocation? There should be a couple of diagnostic messages that help explain the issue.
Thanks,
Roman
Roman Dementiev (Intel) wrote:
Could you please post the whole output, including all the messages printed at program invocation? There should be a couple of diagnostic messages that help explain the issue.
Thanks,
Roman
[root@cybermech IntelPerformanceCounterMonitorV2.5.1]# ./pcm.x 10
Intel(r) Performance Counter Monitor V2.5.1 (2013-06-25 13:44:03 +0200 ID=76b6d1f)
Copyright (c) 2009-2012 Intel Corporation
Num logical cores: 16
Num sockets: 2
Threads per core: 1
Core PMU (perfmon) version: 3
Number of core PMU generic (programmable) counters: 8
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 3100000000 Hz
Package thermal spec power: 150 Watt; Package minimum power: 65 Watt; Package maximum power: 230 Watt;
ERROR: Requested bus number 64 is larger than the max bus number 63
Can not access SNB-EP (Jaketown) PCI configuration space. Access to uncore counters (memory and QPI bandwidth) is disabled.
You must be root to access these SNB-EP counters in PCM.
Number of PCM instances: 2
Detected Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz "Intel(r) microarchitecture codename Sandy Bridge-EP/Jaketown"
EXEC : instructions per nominal CPU cycle
IPC : instructions per CPU cycle
FREQ : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state' (includes Intel Turbo Boost)
L3MISS: L3 cache misses
L2MISS: L2 cache misses (including other core's L2 cache *hits*)
L3HIT : L3 cache hit ratio (0.00-1.00)
L2HIT : L2 cache hit ratio (0.00-1.00)
L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in some cases could be >1.0 due to a higher memory latency
L2CLK : ratio of CPU cycles lost due to missing L2 cache but still hitting L3 cache (0.00-1.00)
READ : bytes read from memory controller (in GBytes)
WRITE : bytes written to memory controller (in GBytes)
TEMP : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature
Core (SKT) | EXEC | IPC | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK | READ | WRITE | TEMP
0 0 0.12 1.43 0.08 1.00 110 K 4508 K 0.98 0.47 0.01 0.07 N/A N/A 36
1 0 0.06 1.29 0.05 1.00 96 K 3233 K 0.97 0.42 0.01 0.08 N/A N/A 33
2 0 0.02 1.20 0.01 1.00 22 K 894 K 0.97 0.46 0.01 0.08 N/A N/A 34
3 0 0.13 1.78 0.07 1.00 46 K 1806 K 0.97 0.68 0.00 0.03 N/A N/A 32
4 0 0.00 0.77 0.00 1.00 4006 55 K 0.93 0.64 0.04 0.10 N/A N/A 34
5 0 0.00 1.18 0.00 1.00 3611 68 K 0.95 0.50 0.03 0.10 N/A N/A 32
6 0 0.00 1.22 0.00 1.00 2468 54 K 0.95 0.41 0.02 0.08 N/A N/A 31
7 0 0.00 0.79 0.00 1.00 16 K 208 K 0.92 0.48 0.05 0.13 N/A N/A 34
8 1 0.00 1.18 0.00 1.00 48 K 230 K 0.79 0.47 0.10 0.08 N/A N/A 22
9 1 0.01 1.17 0.01 1.00 78 K 453 K 0.83 0.38 0.06 0.06 N/A N/A 23
10 1 0.00 1.39 0.00 1.00 7367 45 K 0.84 0.54 0.05 0.05 N/A N/A 23
11 1 0.00 0.74 0.00 1.00 1211 7967 0.85 0.34 0.09 0.13 N/A N/A 22
12 1 0.00 0.88 0.00 1.00 1002 5663 0.82 0.33 0.11 0.12 N/A N/A 22
13 1 0.00 0.96 0.00 1.00 818 4268 0.81 0.35 0.11 0.11 N/A N/A 22
14 1 0.00 0.96 0.00 1.00 779 3867 0.80 0.31 0.11 0.11 N/A N/A 23
15 1 0.00 1.19 0.00 1.00 7827 25 K 0.69 0.35 0.09 0.05 N/A N/A 21
-------------------------------------------------------------------------------------------------------------------
SKT 0 0.04 1.49 0.03 1.00 302 K 10 M 0.97 0.51 0.01 0.06 0.00 0.00 31
SKT 1 0.00 1.18 0.00 1.00 145 K 776 K 0.81 0.42 0.07 0.06 0.00 0.00 21
-------------------------------------------------------------------------------------------------------------------
TOTAL * 0.02 1.47 0.01 1.00 448 K 11 M 0.96 0.50 0.01 0.06 0.00 0.00 N/A
Instructions retired: 10 G ; Active cycles: 7274 M ; Time (TSC): 30 Gticks ; C0 (active,non-halted) core residency: 1.47 %
C1 core residency: 98.53 %; C3 core residency: 0.00 %; C6 core residency: 0.00 %; C7 core residency: 0.00 %
C2 package residency: 0.00 %; C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %
PHYSICAL CORE IPC : 1.47 => corresponds to 36.86 % utilization for cores in active state
Instructions per nominal CPU cycle: 0.02 => corresponds to 0.54 % core utilization over time interval
Intel(r) QPI data traffic estimation in bytes (data traffic coming to CPU/socket through QPI links):
QPI0 QPI1 | QPI0 QPI1
----------------------------------------------------------------------------------------------
SKT 0 0 0 | -2147483648% -2147483648%
SKT 1 0 0 | -2147483648% -2147483648%
----------------------------------------------------------------------------------------------
Total QPI incoming data traffic: 0 QPI data traffic/Memory controller traffic: -nan
Intel(r) QPI traffic estimation in bytes (data and non-data traffic outgoing from CPU/socket through QPI links):
QPI0 QPI1 | QPI0 QPI1
----------------------------------------------------------------------------------------------
SKT 0 9223372 T 9223372 T | -2147483648% -2147483648%
SKT 1 9223372 T 9223372 T | -2147483648% -2147483648%
----------------------------------------------------------------------------------------------
Total QPI outgoing data and non-data traffic: 0
----------------------------------------------------------------------------------------------
SKT 0 package consumed 549.47 Joules
SKT 1 package consumed 560.42 Joules
----------------------------------------------------------------------------------------------
TOTAL: 1109.89 Joules
----------------------------------------------------------------------------------------------
SKT 0 DIMMs consumed 0.00 Joules
SKT 1 DIMMs consumed 0.00 Joules
----------------------------------------------------------------------------------------------
TOTAL: 0.00 Joules
Roman Dementiev (Intel) wrote:
Could you please post the whole output, including all the messages printed at program invocation? There should be a couple of diagnostic messages that help explain the issue.
Thanks,
Roman
Hi Roman,
I have copied the complete output above; I hope that is more helpful.
Thanks, this output is helpful. Could you please send the output of the lspci command (run as root) for further diagnosis?
--
Roman
Also, could you specify the vendor of the system (Dell/HP/etc.) and the BIOS vendor/version (shown in the output of the Linux dmidecode command)?
Thanks,
Roman
Roman Dementiev (Intel) wrote:
Also, could you specify the vendor of the system (Dell/HP/etc.) and the BIOS vendor/version (shown in the output of the Linux dmidecode command)?
Thanks,
Roman
Hi Roman,
The system is a Dell, and I have uploaded the output of the dmidecode command as an attachment, dmidecode.txt.
Thanks a lot for the detailed output. I have developed a patch that will allow you to see memory bandwidth from the memory controller (attached). Apply it using "patch < bus_dell_patch.txt".
According to the lspci output, your BIOS hides the QPI performance-monitoring devices, so the QPI statistics cannot be available (you should see a message similar to the one described in this article). Could you please try to find a newer BIOS for your system, install it, and try PCM again? It could be that you need to enable a QPI/perfmon/PCM option (or similar) in the BIOS to unhide the QPI performance-monitoring devices. If you can't find the option, you might ask the vendor to provide one.
Best regards,
Roman
Roman Dementiev (Intel) wrote:
Thanks a lot for the detailed output. I have developed a patch that will allow you to see memory bandwidth from the memory controller (attached). Apply it using "patch < bus_dell_patch.txt".
According to the lspci output, your BIOS hides the QPI performance-monitoring devices, so the QPI statistics cannot be available (you should see a message similar to the one described in this article). Could you please try to find a newer BIOS for your system, install it, and try PCM again? It could be that you need to enable a QPI/perfmon/PCM option (or similar) in the BIOS to unhide the QPI performance-monitoring devices. If you can't find the option, you might ask the vendor to provide one.
Best regards,
Roman
Hello Roman,
Thank you very much for the help; I really appreciate it. I updated the BIOS of the Dell T7600 from A5 to A9. Please see the attached dmidecode_after_BIOS_update.txt. I compared it with the previous output, and there do not seem to be many changes; diff_result.txt shows the difference between the previous dmidecode output and the output after the BIOS update.
I patched PCM using your patch, thank you. When I run PCM, the QPI readings still have the problem. I have attached the new readings too (PCM_output.txt).
I checked the settings in the BIOS and did not find anything related to QPI. In our BIOS settings, we did disable Hyper-Threading and Intel Turbo Boost to make things easier for our project; I assume that will not affect the QPI results from PCM. Am I right?
You said that if I can't find the option I might ask the vendor to provide one, and I will try that. I am really grateful for your help.
Zheng Luo
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This version of the BIOS seems to be much better. Your BIOS setting changes should not impact the visibility of the QPI perfmon devices. Could you please send me the lspci output?
I noticed the line "Number of PCM instances: 2" in your PCM output. It may be that you are running two instances of PCM, or that some time ago a PCM instance was killed unexpectedly so that PCM's instance counting became broken (this can also break the QPI statistics display in this version of PCM; to be fixed in the next release). To reset the instance counting, please do the following:
1. stop all pcm instances
2. as root: rm -rf /dev/shm/sem.*Intel*
3. start pcm again
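The steps above can be sketched as a small shell function. This is a hedged sketch: the semaphore file pattern comes from step 2 above, while the function name and the overridable directory argument are inventions here so the cleanup logic can be exercised outside /dev/shm.

```shell
# Remove stale PCM semaphore files left in shared memory after an
# unclean shutdown. Run as root with all pcm instances stopped first.
# The directory argument defaults to /dev/shm but can be overridden.
cleanup_pcm_sems() {
  local shm_dir="${1:-/dev/shm}"
  rm -rf "$shm_dir"/sem.*Intel* 2>/dev/null
  return 0
}

# Usage, following the numbered steps:
#   (stop all pcm instances first)
#   cleanup_pcm_sems          # as root
#   ./pcm.x 1                 # start pcm again
```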
Please share the output of pcm started after you have done these operations.
Thanks for your cooperation,
Roman
When we got the BIOS update from Dell that supported QPI performance counter access, the default setting was to not enable access. We had to select an option to enable QPI performance counter access and then reboot. Fortunately this only needed to be done once.
John D. McCalpin wrote:
When we got the BIOS update from Dell that supported QPI performance counter access, the default setting was to not enable access. We had to select an option to enable QPI performance counter access and then reboot. Fortunately this only needed to be done once.
Hello John,
Are you using the same computer as me? I am using a Dell T7600; what model are you using? Even after the update, I still cannot see a QPI option. If you are using the same model as me, can you tell me where in the BIOS I can find the option to enable QPI monitoring? Thank you very much.
Zheng Luo
Roman Dementiev (Intel) wrote:
This version of the BIOS seems to be much better. Your BIOS setting changes should not impact the visibility of the QPI perfmon devices. Could you please send me the lspci output?
I noticed the line "Number of PCM instances: 2" in your PCM output. It may be that you are running two instances of PCM, or that some time ago a PCM instance was killed unexpectedly so that PCM's instance counting became broken (this can also break the QPI statistics display in this version of PCM; to be fixed in the next release). To reset the instance counting, please do the following:
1. stop all pcm instances
2. as root: rm -rf /dev/shm/sem.*Intel*
3. start pcm again
Please share the output of pcm started after you have done these operations.
Thanks for your cooperation,
Roman
Thank you very much, Roman.
PCM seems to be working now; at least there are no longer arbitrary values in the QPI readings. However, I tried the commands mentioned in http://software.intel.com/en-us/forums/topic/280235#comment-1755207 to generate some QPI traffic. The commands I used are numactl --cpunodebind=0 --membind=0 ./lat_mem_rd -t 1024 and numactl --cpunodebind=0 --membind=1 ./lat_mem_rd -t 1024. The strange thing is that all the data in the QPI fields is zero, and I don't know why. Is it because those commands do not generate any QPI traffic? If so, how can I generate traffic over QPI?
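One hedged explanation for the zero readings: lat_mem_rd is a latency benchmark built around dependent pointer-chasing loads, so it may generate too little QPI traffic to register over a short measurement interval. A bandwidth-bound workload such as STREAM, bound to a remote memory node, should drive clearly visible cross-socket traffic. The tiny helper below just assembles such a command line; the `cross_socket_cmd` name and the `./stream` workload path are illustrative assumptions, not taken from this thread.

```shell
# Build a cross-socket run command: compute pinned to one NUMA node,
# memory allocated on the other, so every cache miss travels over QPI.
# The workload path (./stream) is illustrative, not from this thread.
cross_socket_cmd() {
  local cpu_node=$1 mem_node=$2; shift 2
  echo "numactl --cpunodebind=$cpu_node --membind=$mem_node $*"
}

cross_socket_cmd 0 1 ./stream
# -> numactl --cpunodebind=0 --membind=1 ./stream
```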
By the way, how do you close a PCM instance? I just use Ctrl+C to close the program...
Zheng Luo
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
We are running Dell DCS8000 systems, so the BIOS is likely different from yours. I don't know what command was used to enable the PCI configuration-space areas for the QPI counters, and I was not able to find it in the list of options that I checked. It is probably best to follow up with Dell.
