Hi All,
Is it possible to log the following per-core data for the Intel Xeon Phi 7210, either using sysfs or by reading a specific MSR?
- Per Core Power
- Per Core Temperature
- Per Core Utilization
Thanks.
- Per core Power -- no
- Per core Temperature -- almost
- The MSR that normally provides this information at core scope is IA32_THERM_STATUS (19Ch).
- The MSR tables in the Intel Architecture SW Developer's manual say that this has "module" scope on Xeon Phi x200, where "module" refers to the pair of cores sharing an L2 cache. In most documentation this would be referred to as "tile" scope.
- Per Core Utilization -- yes
- It should be possible to measure this in three different ways, but the third one looks broken on Xeon Phi x200.
- All three require that you read the Time-Stamp Counter at the beginning and end of the measurement interval.
- In addition to the TSC, you need to read any of these three registers at the beginning and end of the measurement interval:
- Fixed-function counter 2 (IA32_FIXED_CTR2, MSR 30Bh).
- One of the programmable counters measuring the CPU_CLK_UNHALTED.REF (Event 3Ch, Umask 01h).
- The IA32_MPERF MSR (E7h).
- For the first two options, you can get the overall utilization (any thread active) by setting the AnyThread bit in the appropriate counter register -- then you only need to read the TSC and performance counter register using one thread on any physical core.
- For Fixed-function counter 2, you need to set bit 10 of IA32_FIXED_CTR_CTRL (38Dh) (in addition to bits 8-9). Bit 34 of IA32_PERF_GLOBAL_CTRL (38Fh) must also be set.
- For a programmable counter, you need to set bit 21 of the IA32_PERFEVTSEL register you are using (either 186h or 187h).
- For the third option (MPERF), the only option is thread scope, so you would need to read the MPERF register separately for each logical processor in the system.
- You still only need to read the TSC once per physical core.
- Unfortunately, the APERF and MPERF registers appear to be broken on KNL (Xeon Phi x200).
- On every other processor I have tested, MPERF increments at the same rate as TSC on a 100% busy core.
- On Xeon Phi x200, both MPERF and APERF increment at slightly less than 1/1000th of the rate that I expect.
- All of the tests I have done show consistent results -- the counting appears correct, but with the wrong scaling factor.
- The ratio of delta(APERF)/delta(MPERF) matches the observed Turbo boost.
- Since the scaling factor is not documented, you can't rely on a computation that involves subtracting a scaled MPERF value from the delta TSC value.
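A minimal sketch of the first option (fixed-function counter 2 with AnyThread), assuming a Linux system with the `msr` kernel module loaded and root access so that `/dev/cpu/<n>/msr` is readable. The helper names are illustrative. The USR, OS and EN bit positions in the programmable-counter value are the standard architectural ones rather than something stated above, and counter wraparound is ignored for brevity.

```python
import os
import struct
import time

IA32_TSC = 0x10                 # Time-Stamp Counter
IA32_FIXED_CTR2 = 0x30B         # reference cycles not halted
IA32_FIXED_CTR_CTRL = 0x38D
IA32_PERF_GLOBAL_CTRL = 0x38F

# Fixed counter 2: bits 8-9 enable OS+USR counting, bit 10 is AnyThread.
FIXED_CTR2_ANYTHREAD = (0b11 << 8) | (1 << 10)
# Global enable bit for fixed counter 2.
GLOBAL_ENABLE_FIXED2 = 1 << 34
# Programmable alternative: CPU_CLK_UNHALTED.REF (event 3Ch, umask 01h)
# with USR (16), OS (17), AnyThread (21) and EN (22) set.
PERFEVTSEL_REF_ANYTHREAD = (
    0x3C | (0x01 << 8) | (1 << 16) | (1 << 17) | (1 << 21) | (1 << 22)
)


def read_msr(cpu, reg):
    """Read one 64-bit MSR on logical CPU `cpu` through the Linux msr driver."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def utilization(tsc0, ref0, tsc1, ref1):
    """Fraction of the interval in which the unhalted counter was running."""
    return (ref1 - ref0) / (tsc1 - tsc0)


def sample_core(cpu, interval_s=1.0):
    """Utilization of the physical core that owns logical CPU `cpu`.

    Assumes the counter has already been programmed with the control
    values defined above.
    """
    tsc0, ref0 = read_msr(cpu, IA32_TSC), read_msr(cpu, IA32_FIXED_CTR2)
    time.sleep(interval_s)
    tsc1, ref1 = read_msr(cpu, IA32_TSC), read_msr(cpu, IA32_FIXED_CTR2)
    return utilization(tsc0, ref0, tsc1, ref1)
```

With AnyThread set, calling `sample_core` on one logical CPU per physical core should be enough. Note that the control registers are shared with other tools (for example `perf`), so writing them blindly can disturb whatever else is using the counters.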
Hi John,
Thanks for the detailed information. If not per core, I guess per-package power is possible?
- Below is the log I get from running turbostat. Based on it, PkgWatt and RAMWatt seem to come from specific MSRs read by turbostat. Is my understanding correct?
- Are you aware of any turbostat user guide that explains what each column header means? I can guess that Bzy_MHz is the frequency, but I am not sure what CPU%c1 stands for.
Thanks.
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz SMI CPU%c1 CPU%c6 CoreTmp PkgTmp Pkg%pc3 Pkg%pc6 PkgWatt RAMWatt PKG_% RAM_%
- 4 0.27 1286 1324 0 0.24 99.49 24 31 0.39 77.82 49.36 2.47 0.00 0.00
0 12 0.89 1289 1358 0 0.25 98.86 24 31 0.38 75.88 49.36 2.47 0.00 0.00
64 2 0.15 1240 1358 0 0.99
128 3 0.25 1273 1358 0 0.89
192 3 0.26 1274 1359 0 0.89
1 2 0.19 1210 1359 0 0.21 99.60 24
65 2 0.15 1221 1358 0 0.24
129 3 0.26 1254 1358 0 0.14
193 4 0.27 1252 1358 0 0.12
2 2 0.19 1239 1358 0 1.04 98.77 21
66 13 0.94 1289 1358 0 0.29
130 2 0.17 1252 1359 0 1.07
194 5 0.34 1276 1359 0 0.90
3 3 0.24 1220 1357 0 0.25 99.51 21
67 2 0.17 1214 1357 0 0.32
131 4 0.30 1253 1357 0 0.19
195 4 0.29 1256 1357 0 0.19
4 4 0.27 1246 1357 0 0.14 99.59 21
68 2 0.15 1244 1357 0 0.26
132 2 0.17 1261 1357 0 0.25
196 3 0.24 1274 1357 0 0.17
5 4 0.27 1254 1357 0 0.16 99.57 20
69 2 0.16 1228 1357 0 0.27
133 2 0.16 1239 1357 0 0.27
197 4 0.27 1262 1357 0 0.17
6 3 0.24 1259 1357 0 0.29 99.47 22
70 2 0.16 1242 1357 0 0.37
134 3 0.24 1269 1356 0 0.18
198 3 0.25 1270 1356 0 0.17
7 4 0.28 1251 1356 0 0.19 99.52 22
71 2 0.18 1231 1356 0 0.30
135 4 0.31 1267 1356 0 0.16
199 4 0.33 1267 1356 0 0.14
8 3 0.26 1268 1354 0 0.23 99.50 22
72 3 0.26 1268 1354 0 0.24
136 3 0.25 1276 1352 0 0.13
200 3 0.25 1276 1352 0 0.14
9 2 0.16 1223 1352 0 0.30 99.53 22
73 3 0.22 1254 1352 0 0.24
137 3 0.26 1264 1352 0 0.21
201 3 0.23 1259 1351 0 0.13
10 2 0.16 1245 1351 0 0.27 99.57 23
74 2 0.14 1247 1351 0 0.29
138 3 0.25 1276 1351 0 0.18
202 2 0.16 1258 1351 0 0.27
11 5 0.39 1267 1349 0 0.13 99.48 23
75 4 0.28 1266 1349 0 0.24
139 5 0.36 1272 1349 0 0.16
203 2 0.15 1240 1349 0 0.37
12 3 0.23 1262 1346 0 0.36 99.41 19
76 3 0.25 1272 1345 0 0.26
140 2 0.15 1258 1343 0 0.24
204 3 0.21 1270 1343 0 0.18
13 2 0.17 1220 1343 0 0.32 99.51 19
77 4 0.28 1264 1343 0 0.21
141 2 0.17 1247 1343 0 0.32
205 3 0.24 1258 1342 0 0.13
14 2 0.15 1243 1342 0 0.63 99.22 22
78 2 0.14 1249 1340 0 0.55
142 4 0.34 1281 1340 0 0.35
206 3 0.25 1274 1339 0 0.36
15 3 0.25 1250 1338 0 0.29 99.46 22
79 4 0.28 1262 1337 0 0.20
143 4 0.31 1266 1337 0 0.17
207 4 0.31 1271 1337 0 0.17
16 2 0.14 1241 1337 0 0.30 99.55 22
80 3 0.24 1273 1337 0 0.21
144 3 0.14 2417 1337 0 0.31
208 3 0.22 1271 1336 0 0.11
17 4 0.29 1258 1336 0 0.21 99.49 22
81 3 0.21 1255 1336 0 0.30
145 4 0.28 1271 1335 0 0.15
209 4 0.28 1262 1335 0 0.15
18 3 0.24 1264 1335 0 0.36 99.40 21
82 3 0.26 1276 1334 0 0.26
146 2 0.15 1259 1334 0 0.37
210 3 0.23 1273 1332 0 0.17
19 2 0.19 1234 1332 0 0.32 99.49 22
83 3 0.24 1256 1332 0 0.27
147 4 0.19 2005 1332 0 0.32
211 3 0.24 1261 1331 0 0.16
20 4 0.33 1264 1331 0 0.18 99.49 21
84 5 0.36 1277 1331 0 0.14
148 4 0.34 1274 1331 0 0.17
212 5 0.17 2891 1331 0 0.34
21 3 0.22 1216 1328 0 0.19 99.59 21
85 3 0.26 1242 1328 0 0.15
149 3 0.24 1241 1328 0 0.17
213 3 0.21 1191 1328 0 0.20
22 2 0.17 1247 1327 0 0.23 99.61 21
86 3 0.26 1274 1327 0 0.13
150 2 0.14 1249 1327 0 0.25
214 3 0.26 1275 1327 0 0.14
23 2 0.16 1224 1327 0 0.27 99.57 21
87 3 0.27 1266 1327 0 0.16
151 4 0.32 1269 1327 0 0.11
215 4 0.31 1271 1327 0 0.12
24 3 0.16 1992 1327 0 0.30 99.54 22
88 2 0.14 1256 1326 0 0.23
152 3 0.26 1275 1325 0 0.10
216 3 0.25 1274 1326 0 0.11
25 2 0.15 1215 1326 0 0.23 99.62 22
89 2 0.14 1231 1326 0 0.24
153 3 0.26 1262 1326 0 0.12
217 3 0.26 1261 1326 0 0.12
26 3 0.15 2002 1326 0 0.48 99.36 21
90 2 0.14 1255 1324 0 0.41
154 3 0.21 1269 1323 0 0.24
218 3 0.20 1266 1322 0 0.17
27 4 0.32 1260 1322 0 0.18 99.49 21
91 2 0.14 1267 1322 0 0.36
155 4 0.31 1269 1322 0 0.19
219 4 0.32 1271 1322 0 0.18
28 6 0.46 1255 1322 0 0.25 99.29 21
92 5 0.37 1278 1322 0 0.34
156 5 0.38 1280 1322 0 0.34
220 5 0.37 1277 1322 0 0.35
29 3 0.21 1226 1322 0 0.26 99.53 21
93 2 0.17 1235 1322 0 0.30
157 4 0.28 1260 1322 0 0.19
221 3 0.27 1256 1322 0 0.20
30 2 0.16 1247 1322 0 0.36 99.48 23
94 5 0.35 1281 1322 0 0.17
158 4 0.34 1282 1322 0 0.18
222 5 0.15 3099 1322 0 0.38
31 2 0.18 1231 1320 0 0.22 99.59 22
95 2 0.17 1246 1319 0 0.24
159 4 0.27 1265 1319 0 0.13
223 4 0.28 1266 1319 0 0.13
32 2 0.18 1249 1319 0 0.35 99.48 22
96 4 0.34 1281 1319 0 0.18
160 4 0.34 1282 1319 0 0.18
224 5 0.35 1281 1319 0 0.17
33 5 0.40 1269 1319 0 0.13 99.47 21
97 5 0.39 1275 1319 0 0.14
161 5 0.39 1275 1319 0 0.14
225 5 0.39 1274 1319 0 0.14
34 2 0.17 1249 1320 0 0.33 99.50 21
98 5 0.36 1284 1320 0 0.14
162 5 0.36 1281 1320 0 0.15
226 5 0.36 1283 1320 0 0.15
35 5 0.38 1268 1320 0 0.11 99.51 21
99 2 0.17 1241 1320 0 0.32
163 5 0.37 1275 1320 0 0.13
227 5 0.37 1273 1320 0 0.13
36 5 0.42 1280 1317 0 0.14 99.44 21
100 5 0.39 1284 1317 0 0.17
164 5 0.39 1285 1317 0 0.17
228 5 0.38 1284 1317 0 0.18
37 4 0.30 1256 1317 0 0.19 99.50 21
101 2 0.16 1237 1316 0 0.25
165 4 0.28 1265 1316 0 0.13
229 4 0.28 1265 1316 0 0.13
38 5 0.41 1279 1316 0 0.14 99.44 21
102 5 0.39 1285 1316 0 0.16
166 5 0.38 1284 1316 0 0.17
230 5 0.40 1263 1316 0 0.16
39 3 0.27 1254 1316 0 0.35 99.39 21
103 2 0.16 1239 1315 0 0.37
167 2 0.17 1246 1315 0 0.35
231 4 0.29 1266 1313 0 0.15
40 2 0.19 1253 1312 0 0.32 99.49 21
104 5 0.35 1283 1312 0 0.15
168 5 0.36 1282 1312 0 0.15
232 5 0.35 1281 1312 0 0.16
41 2 0.19 1237 1312 0 0.24 99.57 21
105 2 0.17 1246 1312 0 0.26
169 4 0.28 1262 1312 0 0.15
233 3 0.27 1267 1312 0 0.16
42 2 0.17 1244 1311 0 0.30 99.53 23
106 2 0.14 1308 1311 0 0.33
170 3 0.26 1273 1311 0 0.20
234 3 0.27 1274 1311 0 0.19
43 5 0.38 1269 1311 0 0.13 99.49 23
107 2 0.15 1240 1311 0 0.35
171 5 0.37 1275 1311 0 0.13
235 5 0.37 1277 1311 0 0.14
44 2 0.19 1255 1311 0 0.34 99.47 21
108 5 0.35 1282 1311 0 0.15
172 5 0.36 1281 1311 0 0.15
236 5 0.35 1283 1311 0 0.16
45 4 0.28 1257 1311 0 0.21 99.51 21
109 2 0.17 1242 1310 0 0.23
173 4 0.28 1268 1310 0 0.12
237 4 0.29 1268 1310 0 0.11
46 2 0.17 1244 1310 0 0.33 99.51 21
110 5 0.35 1281 1310 0 0.14
174 4 0.35 1280 1310 0 0.15
238 4 0.34 1282 1310 0 0.15
47 2 0.17 1227 1310 0 0.28 99.55 21
111 2 0.16 1242 1310 0 0.28
175 4 0.28 1267 1310 0 0.16
239 3 0.26 1266 1310 0 0.18
48 3 0.23 1262 1310 0 0.33 99.44 23
112 4 0.34 1278 1310 0 0.22
176 4 0.33 1281 1310 0 0.23
240 4 0.34 1281 1310 0 0.22
49 5 0.36 1263 1307 0 0.13 99.51 23
113 2 0.16 1238 1307 0 0.33
177 5 0.35 1272 1307 0 0.14
241 4 0.35 1272 1307 0 0.14
50 4 0.29 1259 1307 0 0.33 99.38 20
114 5 0.36 1278 1307 0 0.25
178 5 0.36 1278 1307 0 0.25
242 5 0.36 1275 1307 0 0.25
51 5 0.40 1254 1307 0 0.13 99.47 20
115 2 0.19 1225 1307 0 0.34
179 5 0.39 1264 1307 0 0.14
243 5 0.38 1264 1307 0 0.14
52 2 0.20 1240 1307 0 0.54 99.26 24
116 7 0.58 1285 1307 0 0.16
180 5 0.37 1277 1307 0 0.37
244 5 0.37 1278 1307 0 0.37
53 3 0.21 1221 1307 0 0.31 99.49 24
117 5 0.39 1270 1307 0 0.13
181 5 0.39 1271 1307 0 0.12
245 5 0.19 2625 1307 0 0.32
54 2 0.19 1246 1304 0 0.27 99.54 21
118 4 0.33 1276 1305 0 0.14
182 4 0.29 1277 1305 0 0.17
246 4 0.33 1277 1305 0 0.14
55 3 0.26 1248 1305 0 0.28 99.46 22
119 3 0.27 1259 1304 0 0.20
183 4 0.33 1269 1304 0 0.13
247 4 0.32 1267 1304 0 0.14
56 3 0.23 1252 1304 0 0.32 99.45 24
120 4 0.29 1273 1304 0 0.26
184 4 0.28 1271 1304 0 0.27
248 2 0.20 1260 1304 0 0.35
57 2 0.20 1217 1304 0 0.27 99.53 24
121 2 0.20 1236 1303 0 0.27
185 4 0.31 1256 1303 0 0.16
249 2 0.18 1230 1302 0 0.16
58 2 0.17 1237 1302 0 0.24 99.59 22
122 2 0.15 1250 1302 0 0.26
186 3 0.26 1273 1302 0 0.14
250 3 0.25 1274 1302 0 0.15
59 4 0.32 1255 1302 0 0.17 99.52 22
123 2 0.17 1241 1302 0 0.32
187 4 0.33 1268 1302 0 0.16
251 4 0.34 1270 1302 0 0.15
60 3 0.21 1260 1302 0 0.31 99.48 22
124 4 0.32 1276 1302 0 0.19
188 4 0.29 1272 1302 0 0.23
252 4 0.32 1273 1302 0 0.20
61 2 0.18 1211 1302 0 0.25 99.57 22
125 2 0.17 1231 1302 0 0.26
189 4 0.28 1259 1302 0 0.15
253 4 0.28 1254 1302 0 0.15
62 3 0.21 1240 1302 0 0.31 99.47 23
126 4 0.35 1275 1302 0 0.18
190 4 0.35 1275 1302 0 0.18
254 4 0.35 1274 1302 0 0.17
63 3 0.26 1223 1302 0 0.33 99.41 23
127 2 0.17 1226 1302 0 0.41
191 4 0.31 1257 1302 0 0.27
255 6 0.33 1856 1302 0 0.26
I have not looked at the source code for the "turbostat" utility, but most of the columns are straightforward.
The %Busy is usually computed by looking at the elapsed "Reference Cycles Not Halted" counter divided by the elapsed TSC cycles.
Adding in the "Actual Cycles Not Halted" counter enables the computation of the average frequency while not halted.
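As a rough illustration of the frequency computation just described: the function and argument names below are hypothetical, and the deltas are assumed to come from counters sampled at the start and end of the same interval. This is one plausible derivation of the Bzy_MHz and Avg_MHz columns, not a copy of what turbostat does internally.

```python
def busy_mhz(d_core_unhalted, d_ref_unhalted, tsc_mhz):
    """Average frequency while not halted.

    The ratio of actual to reference unhalted cycles scales the
    (constant) TSC frequency up or down.
    """
    return tsc_mhz * d_core_unhalted / d_ref_unhalted


def avg_mhz(d_core_unhalted, interval_s):
    """Average frequency over the whole interval, idle time included."""
    return d_core_unhalted / interval_s / 1e6


# Made-up deltas for a 1-second interval on a 1300 MHz TSC:
# the core was unhalted for 1% of the time and ran slightly above nominal.
print(busy_mhz(14_000_000, 13_000_000, 1300))
print(avg_mhz(14_000_000, 1.0))
```

The two results are related by the busy fraction: the whole-interval average is the busy frequency multiplied by the fraction of time the core was not halted.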
SMI is "System Management Interrupts". This is the mechanism that a BIOS uses to monitor a processor, but many machines don't use it after the processor is booted.
CPU%c1 is the fraction of time that this core is in the C1 idle state. There is some discussion of these states in Volume 3 of the Intel Architecture Software Developer's Manual (document 325384), particularly in Section 8.10 "Management of Block and Idle Conditions", and in Chapter 14 "Power and Thermal Management" (especially Section 14.6).
CPU%c6 is analogous to CPU%c1, but refers to a "deeper" idle state -- lower power consumption, but longer wake-up latency.
Pkg%pc3 and Pkg%pc6 report the amount of time in two of the "Package C-states". These are reduced-power states that can only be entered when no cores are active.
PkgWatt and RAMWatt come from the RAPL system, described in Section 14.9 of Volume 3 of the Software Developer's Manual.
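A small sketch of how a package power figure can be derived from the RAPL registers, assuming the register layout given in the SDM's RAPL section: MSR_RAPL_POWER_UNIT at 606h with the energy status units in bits 12:8, and a 32-bit energy counter in MSR_PKG_ENERGY_STATUS at 611h. The function names are illustrative, and reading the raw values is left out (it needs root and the `msr` driver).

```python
MSR_RAPL_POWER_UNIT = 0x606
MSR_PKG_ENERGY_STATUS = 0x611


def energy_unit_joules(power_unit_msr):
    """Decode the energy unit: bits 12:8 hold ESU, the unit is 1 / 2**ESU J."""
    esu = (power_unit_msr >> 8) & 0x1F
    return 1.0 / (1 << esu)


def average_watts(raw0, raw1, unit_j, interval_s):
    """Average power between two raw 32-bit energy readings.

    The mask handles a single wraparound of the 32-bit counter.
    """
    delta = (raw1 - raw0) & 0xFFFFFFFF
    return delta * unit_j / interval_s


# Illustrative raw values only, not taken from a real KNL system.
unit = energy_unit_joules(0x000A0E03)       # ESU = 14
print(unit, average_watts(0xFFFFFF00, 0x00000100, unit, 1.0))
```

The DRAM domain may use a different energy unit on some server parts, so the same decoding should not be assumed to apply to RAMWatt without checking.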
Hi John,
Thank you.
Regarding RAPL: PAPI has a powercap feature, and a few papers have validated RAPL data against PAPI's powercap. Do these two systems collect data from the same registers?
If yes, what is the novelty in using PAPI's powercap if RAPL gives the same data?
Thanks.
McCalpin, John wrote:
- Per Core Utilization -- yes
- It should be possible to measure this in three different ways, but the third one looks broken on Xeon Phi x200.
- All three require that you read the Time-Stamp Counter at the beginning and end of the measurement interval.
- In addition to the TSC, you need to read any of these three registers at the beginning and end of the measurement interval:
- Fixed-function counter 2 (IA32_FIXED_CTR2, MSR 30Bh).
- One of the programmable counters measuring the CPU_CLK_UNHALTED.REF (Event 3Ch, Umask 01h).
- The IA32_MPERF MSR (E7h).
After looking into the turbostat code [1], it seems it uses different MSRs for utilization. Following is the snippet of the utilization code. I am also not sure why only the c1, c3, c6 and c7 C-states are present. I thought c0 is the default state, where the core runs with the maximum resources possible. Any suggestions?
Can you also please point me to any other reference code that reads core utilization using MSRs?
[1] Turbostat source: https://github.com/torvalds/linux/blob/master/tools/power/x86/turbostat/turbostat.c
if (DO_BIC(BIC_CPU_c1) && use_c1_residency_msr) {
    if (get_msr(cpu, MSR_CORE_C1_RES, &t->c1))
        return -6;
}
if (DO_BIC(BIC_CPU_c3) && !do_slm_cstates && !do_knl_cstates) {
    if (get_msr(cpu, MSR_CORE_C3_RESIDENCY, &c->c3))
        return -6;
}
if (DO_BIC(BIC_CPU_c6) && !do_knl_cstates) {
    if (get_msr(cpu, MSR_CORE_C6_RESIDENCY, &c->c6))
        return -7;
} else if (do_knl_cstates) {
    if (get_msr(cpu, MSR_KNL_CORE_C6_RESIDENCY, &c->c6))
        return -7;
}
if (DO_BIC(BIC_CPU_c7))
    if (get_msr(cpu, MSR_CORE_C7_RESIDENCY, &c->c7))
        return -8;
The code quoted above is for splitting up the idle (non-C0) time into different categories. The definition of "utilization" that I have been assuming is C0 (active) time / wall time -- without looking at the details of the various idle categories.
I don't know if there is any guarantee that the MSR_CORE_C*_RESIDENCY registers capture *all* of the cycles that are not in C0, so it seems safer to measure C0 time by "reference cycles not halted" rather than by subtracting the sum of the other C-states from the elapsed time.
The different C-states are primarily important if you are looking at power consumption on systems that are mostly idle, or if you are concerned about core "wake-up" times -- higher-numbered C-states use less power, but take longer to "wake up" (shift to C0/Active).
Hi John,
McCalpin, John wrote:
The code quoted above is for splitting up the idle (non-C0) time into different categories. The definition of "utilization" that I have been assuming is C0 (active) time / wall time -- without looking at the details of the various idle categories.
Yes, this is also what I am looking for: C0, when the CPU is fully utilised. If I am correct, ARM architectures also use the same formula you described above when the ondemand governor is running. I am stuck on how to capture this C0 utilization correctly.
Maybe this is where turbostat captures C0 as %Busy: https://github.com/torvalds/linux/blob/master/tools/power/x86/turbostat/turbostat.c#L834
if (DO_BIC(BIC_Busy))
    outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), 100.0 * t->mperf/tsc);
Thanks.
The "mperf/tsc" term is the same as my "third option" in note #2 above. This should be a correct measure of the fraction of the time that the core is active (C0), but the mperf values on my KNL processors are about 1000x smaller than I expect them to be, so I don't use this.
More precisely, on my KNL processors, the ratio of APERF to MPERF appears correct for computing the average frequency, but MPERF increments at 1/1024th of the rate of the TSC and APERF increments at 1/1024th of the rate of the CPU_CYCLES_UNHALTED.CORE counter.
Given this discrepancy, computing "fraction of time active" as "delta mperf / delta tsc" will be completely wrong -- with a maximum value of 1/1024 if the processor is actually active 100% of the time.
Hi John,
If the mperf values are smaller than expected, then I am not sure why the %Busy values in my analysis match what I expect for the type of workload I am running. I also validated them against how I map the workload, varying the number of threads with the KMP_HW_SUBSET environment variable.
Thanks.
Turbostat includes a magic factor of 1024 for the MPERF and APERF counts in KNL. Look for "get_aperf_mperf_multiplier" in the turbostat source code (line 3822). That code returns "1" unless the processor is a KNL, in which case it returns 1024, and all MPERF and APERF deltas are multiplied by this value -- e.g., lines 1585-1586.
Intel's documentation makes it clear that MPERF increments at a rate *proportional to* the TSC, and that the constant of proportionality is neither guaranteed nor meaningful.
I have no idea where the Linux folks found documentation of the factor of 1024 -- I have a lot of KNL documents and I don't see it anywhere obvious in these....
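The effect of that scale factor can be sketched as follows. This is an illustrative restatement of the behaviour described above (multiplier of 1024 on KNL, 1 elsewhere), not turbostat's actual code, and the function names and sample deltas are made up.

```python
def aperf_mperf_multiplier(is_knl):
    """Scale factor applied to APERF/MPERF deltas: 1024 on KNL, else 1."""
    return 1024 if is_knl else 1


def busy_percent(d_mperf, d_tsc, is_knl):
    """Percentage of the interval spent in C0, from scaled MPERF and TSC deltas."""
    return 100.0 * d_mperf * aperf_mperf_multiplier(is_knl) / d_tsc


# A fully busy KNL core: MPERF advances at 1/1024th of the TSC rate.
d_tsc, d_mperf = 1_024_000, 1_000
print(busy_percent(d_mperf, d_tsc, is_knl=True))    # scaled
print(busy_percent(d_mperf, d_tsc, is_knl=False))   # unscaled
```

Without the multiplier the same deltas give a value just under 0.1%, which is the "maximum of 1/1024" problem noted earlier. The APERF/MPERF ratio is unaffected because the factor cancels.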
Hi John,
McCalpin, John wrote:
I have no idea where the Linux folks found documentation of the factor of 1024 -- I have a lot of KNL documents and I don't see it anywhere obvious in these....
That's because the code is written and maintained by someone at Intel, who can surely get more accurate architecture details internally than the public KNL documents provide. Hence, I am considering this code a reliable reference.
Thanks.
Hi John,
Can I not log per-core CPU utilization with perf, similar to how the EDC and MC counters are read, as I shared in other threads on this forum?
I will log basic counters and do post-processing to get per-core utilization. In Volume 1 of the Intel PMU guide for Xeon Phi, I have not found counters related to MPERF and APERF yet.
Thanks.
APERF and MPERF are not considered "performance monitoring" facilities, so they are not documented in the PMU guide for Xeon Phi. They are documented along with the other MSRs in Volume 4 of the Intel Architectures SW Developer's Manual (document 335592). The entry for Xeon Phi makes no comment about the unusual scaling factor, but if you only compute ratios of APERF to MPERF that factor disappears.
I wanted to check a couple of related questions.
On my Xeon Phi 7210, I've looked at the values in
/sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
/sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:0/energy_uj
I noted two things:
(a) both files have monotonically increasing values (until they reach the integer limit and reset) IRRESPECTIVE of workload - this is not what I would expect
(b) I don't know what they are measuring. I had thought socket and maybe "just the cores but nothing more", but given the integer wraparound it's not clear whether one is higher than the other
Advice welcome.
I do not have 'root'/'sudo' access to try other metrics.
yours, michael
example output of
#!/bin/bash +x
echo 'node info'
grep 'model name' /proc/cpuinfo | sort |uniq -c
POWERCAPDIR=/sys/class/powercap
FILE=energy_uj
SOCKET=${POWERCAPDIR}/intel-rapl/intel-rapl\:0/${FILE}
SUBSOCK=${POWERCAPDIR}/intel-rapl/intel-rapl\:0/intel-rapl\:0\:0/${FILE}
ls -l ${SOCKET}; cat ${SOCKET}
ls -l ${SUBSOCK}; cat ${SUBSOCK}
SEC=10
echo reporting energy consumed to date for next $SEC seconds
for k in `seq 1 $SEC`; do
D=`date +%s.%N`
SOCK=`cat $SOCKET`
SUB=`cat $SUBSOCK`
echo $D $SOCK $SUB
sleep 1
done
is:
node info
256 model name : Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
-rw-r--r-- 1 root root 4096 Jul 24 14:56 /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
98731464531
-rw-r--r-- 1 root root 4096 Jul 24 14:56 /sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:0/energy_uj
35078540020
reporting energy consumed to date for next 10 seconds
1532440709.313969521 98733719409 35078650562
1532440710.333325570 98815099449 35083995372
1532440711.352328266 98896390074 35089267064
1532440712.370166839 98977634250 35094539153
1532440713.388462036 99058875070 35099809345
1532440714.406850755 99140118331 35105081205
1532440715.425177592 99221438619 35110366468
1532440716.443695284 99302658382 35115625659
1532440717.462236434 99383915925 35120916139
1532440718.480648183 99465147163 35126175973
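For reading numbers like these: the powercap `energy_uj` files are cumulative energy counters in microjoules, so a steady increase is expected and power is the rate of change between two samples. A minimal sketch using the first two samples from the output above follows. The function name is illustrative, and the optional wrap handling assumes the zone's `max_energy_range_uj` value is passed in.

```python
def average_watts(t0, e0_uj, t1, e1_uj, max_range_uj=None):
    """Average power in watts between two (timestamp, energy_uj) samples."""
    delta_uj = e1_uj - e0_uj
    if delta_uj < 0 and max_range_uj is not None:
        delta_uj += max_range_uj  # counter wrapped once between samples
    return delta_uj / 1e6 / (t1 - t0)


# First two samples from the output above: (timestamp, socket, sub-zone)
s0 = (1532440709.313969521, 98733719409, 35078650562)
s1 = (1532440710.333325570, 98815099449, 35083995372)

pkg_w = average_watts(s0[0], s0[1], s1[0], s1[1])
sub_w = average_watts(s0[0], s0[2], s1[0], s1[2])
print(f"{pkg_w:.1f} W top-level zone, {sub_w:.1f} W sub-zone")
```

On these samples the top-level zone works out to roughly 80 W and the sub-zone to roughly 5 W. What each zone covers can be checked without root by reading the `name` file in the same directory as each `energy_uj` file.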