Hi,
Does anyone know whether the memory load latency information reported by Precision Event Based Sampling (PEBS) on Haswell is affected when DVFS or any other power control is used? More specifically, which frequency does PEBS use to calculate latency: the TSC frequency, the ratio of APERF to MPERF, or something else?
As I understand it, the TSC frequency is not affected by DVFS or any other power control, while the ratio of APERF to MPERF is.
Thanks,
Sridutt
Which specific PMC events are you referring to when you say "memory load latency information"?
About APERF and MPERF, at least MPERF should not be affected. It ticks at TSC frequency for most chips, with the exceptions being Skylake and beyond (where it ticks at "nominal CPU frequency", which is subtly different from TSC frequency on those chips), and one of the Phi-type chips, where it counts only every 1,000 clocks or so.
APERF, of course, counts at the actual CPU frequency.
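To make the APERF/MPERF relationship concrete, here is a minimal sketch of how the average core frequency over an interval is commonly estimated from two snapshots of the counters. The function name and arguments are hypothetical, the snapshots are assumed to have been read elsewhere (for example via an MSR interface), and `mperf_hz` is assumed to be the rate at which MPERF ticks on the chip in question (TSC frequency on most chips, per the note above). Both counters advance only while the core is in C0, so the result is an average over active time, not wall-clock time.

```python
def effective_frequency_hz(aperf_start, aperf_end,
                           mperf_start, mperf_end,
                           mperf_hz):
    """Estimate the average core frequency between two counter snapshots.

    All names here are illustrative. mperf_hz is the (assumed known) rate
    at which MPERF ticks, e.g. the TSC frequency on most chips.
    Counter wraparound is not handled in this sketch.
    """
    delta_aperf = aperf_end - aperf_start
    delta_mperf = mperf_end - mperf_start
    if delta_mperf <= 0:
        raise ValueError("MPERF did not advance over the interval")
    # APERF counts actual core cycles, MPERF counts reference cycles,
    # so their ratio scales the reference frequency to the actual one.
    return mperf_hz * delta_aperf / delta_mperf


if __name__ == "__main__":
    # Made-up snapshot values: APERF advanced half as fast as MPERF.
    f = effective_frequency_hz(0, 1_500_000, 0, 3_000_000, 3.0e9)
    print(f"{f / 1e9:.2f} GHz")
```

If the frequency changes inside the interval, this yields only the average, so shorter intervals give a better approximation of the frequency at any given moment.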
Travis D. wrote:
Which specific PMC events are you referring to when you say "memory load latency information"?
About APERF and MPERF, at least MPERF should not be affected. It ticks at TSC frequency for most chips, with the exceptions being Skylake and beyond (where it ticks at "nominal CPU frequency", which is subtly different from TSC frequency on those chips), and one of the Phi-type chips, where it counts only every 1,000 clocks or so.
APERF, of course, counts at the actual CPU frequency.
When I mentioned the ratio of APERF to MPERF, I meant that APERF changes, which causes the ratio to change.
By "Memory Load Latency Information", I refer to 18.8.1.2 Load Latency Performance Monitoring Facility in Intel Software Developer Manual Vol 3B Page 18-40 (September 2016). It characterizes the average load latency to different levels of cache/memory hierarchy.
What I want to know is this: if the number of accesses that take, say, 'x' cycles to complete increases or decreases when the frequency is changed using DVFS or Dynamic Duty Cycle Modulation (T-states), is that because of the change in frequency itself (as the cycle length changes), or because of an improvement or worsening in cache/memory access behavior (hit rate, latency, etc.)?
Section 18.8.1.2 says that the Load Latency Performance Monitoring Facility counts in core cycles. To convert to seconds, you will need to know what the frequency was at the time that the load occurred. This may be inconvenient.
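As a rough illustration of that conversion, here is a small sketch. The function names are hypothetical, and the 80 ns figure is a made-up latency used only to show the effect: if a memory access takes a fixed amount of wall-clock time (a simplifying assumption, since uncore and DRAM behavior may also vary), the same access is reported as fewer core cycles at a lower core frequency.

```python
def cycles_to_ns(latency_cycles, core_hz):
    """Convert a latency in core cycles to nanoseconds at a given core frequency."""
    if core_hz <= 0:
        raise ValueError("core frequency must be positive")
    return latency_cycles * 1e9 / core_hz


def ns_to_cycles(latency_ns, core_hz):
    """Convert a wall-clock latency in nanoseconds to core cycles."""
    if core_hz <= 0:
        raise ValueError("core frequency must be positive")
    return latency_ns * core_hz / 1e9


if __name__ == "__main__":
    fixed_latency_ns = 80.0  # hypothetical, for illustration only
    for core_hz in (3.0e9, 1.5e9):
        cycles = ns_to_cycles(fixed_latency_ns, core_hz)
        print(f"{core_hz / 1e9:.1f} GHz: {cycles:.0f} core cycles")
```

So a shift in the cycle-count histogram after a frequency change does not by itself indicate a change in memory behavior; converting each sample to time with the frequency in effect when it was taken is what allows the two effects to be separated.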
On the plus side, it is extremely unlikely that the processor could allow the frequency to change *during* the execution of a load. The first paragraph of Section 6.6 of Volume 3 of the Intel Architecture SW Developer's Manual notes:
"All interrupts are guaranteed to be taken on an instruction boundary."
If the core stall required for a frequency change is implemented by the same mechanism that is used for interrupts, then this is enough to ensure that the frequency cannot change while the load is executing -- the interrupt associated with the frequency change must occur either before or after the load.
Some caveats, of course....
Some more notes: