The same subject was discussed here 1.5 years ago:
However, I would like to ask a few more questions, and since that thread is more than a year old, I feel it is better to open a new one.
Here are my questions:
1. Is the TSC per core or per CPU? Take, for example, an Intel Xeon Gold 6146 processor.
2. In the post above, John mentioned that there is a "100 MHz reference clock" outside the CPU, and that changes in it will change the frequency of the TSC. Is there a document describing the conditions under which the "reference clock" will change frequency? And how can such a change be detected?
3. In the post above, it sounds like the TSC is inside the CPU. If the CPU temperature increases, will that affect the TSC because of the physical properties of the material the TSC is made of? Or is the technology now so advanced that the TSC can keep its predefined frequency even at very high temperatures?
The "system reference clock" signal is delivered to the processor from an external source. For Xeon E5 v1 (Sandy Bridge) through Xeon Scalable (Skylake Xeon), the clock is expected to run at 100 MHz. An example document that discusses this, the datasheet for the Xeon E5 v3 (Haswell), is at https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-2400-v3-datasheet-vo... Such documents assume a lot of familiarity with the material, but Section 3.6 explicitly states that the typical input clock is 100 MHz. For Skylake Xeon, Volume 1 of the processor datasheet (document 336062) also says that the "system reference clock" is expected to be 100 MHz.
The TSC is derived from this external reference clock, but the details of the implementation are left somewhat unclear. To maintain the "constant_tsc" property (i.e., the TSC rate does not change with core frequency), the TSC must be maintained outside of the cores. To maintain the "nonstop_tsc" property (i.e., the TSC continues to increment even in deep package C states), the TSC must be implemented in a part of the uncore that is not de-clocked or powered down when the package is in deep C states. Plausible locations for such an implementation include the Power Control Unit and the UBox (which manages all interrupts for the chip). (Reference: https://www.centos.org/docs/5/html/5.4/Release_Notes/ar01s07.html)

When a processor executes the RDTSC instruction, it appears to perform an interpolation: it takes the most recent global reference clock count and adds the number of local core clocks elapsed since that count, scaled by the ratio of the (invariant) TSC frequency to the current core frequency. The processor then adds the local TSC adjustment (IA32_TSC_ADJUST, MSR 0x3B). (I have never seen this used, but it may be important in VM situations.)
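To make the interpolation described above concrete, here is a minimal Python model of it. This is only a sketch of the inferred ("appears to") behavior, not Intel's documented implementation; the function name `interpolated_tsc` and all of its parameters are hypothetical, and the final term stands in for the per-core TSC offset/adjustment.

```python
def interpolated_tsc(last_ref_tsc: int,
                     core_cycles_since_ref: int,
                     tsc_hz: int,
                     core_hz: int,
                     tsc_adjust: int = 0) -> int:
    """Hypothetical model of the value RDTSC might return.

    last_ref_tsc          -- global TSC count at the most recent reference update
    core_cycles_since_ref -- local core clocks elapsed since that update
    tsc_hz                -- invariant TSC frequency in Hz
    core_hz               -- current core frequency in Hz
    tsc_adjust            -- per-core offset added at the end (modeling assumption)
    """
    # Elapsed time is core_cycles / core_hz; converting it to TSC ticks
    # multiplies by tsc_hz. Integer math avoids floating-point rounding.
    scaled_ticks = core_cycles_since_ref * tsc_hz // core_hz
    return last_ref_tsc + scaled_ticks + tsc_adjust


# 1 microsecond at 3.0 GHz and 1 microsecond at 1.2 GHz, TSC at 2.1 GHz
fast = interpolated_tsc(1_000_000, 3000, 2_100_000_000, 3_000_000_000)
slow = interpolated_tsc(1_000_000, 1200, 2_100_000_000, 1_200_000_000)
print(fast, slow)  # prints "1002100 1002100"
```

Both calls represent the same 1 µs of elapsed time at different core frequencies, so both return the same count. That is the "constant_tsc" behavior: the scaling by `tsc_hz / core_hz` cancels out the core frequency.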
I don't see any references to the quality or drift allowed in the external clock in the documents I am looking at now. On the Xeon Platinum 8160 processor that I am logged into, the reported TSC frequency is 2.095 GHz (as determined by the Linux kernel), rather than the expected 2.100 GHz. This is a fairly large difference (0.2%) in some respects, but it is clearly irrelevant for performance. Since almost all modern systems use an externally synchronized wall-clock infrastructure (e.g., NTP or its successors), the specific value of the clock frequency is not particularly important; what matters most is that any variation in the clock frequency occurs on relatively long time scales.
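To put the 0.2% figure in perspective, the short sketch below computes the offset in parts per million and the time error that would accumulate per day if TSC ticks were (hypothetically) converted to seconds using the nominal 2.100 GHz while the counter actually advanced at 2.095 GHz. This is an illustrative assumption: the kernel uses its own calibrated value rather than the nominal one, and NTP-style synchronization corrects a constant rate offset like this one.

```python
SECONDS_PER_DAY = 86_400


def tsc_rate_error(nominal_hz: int, measured_hz: int) -> tuple[float, float]:
    """Return (error in ppm, seconds of error per day) for a clock that
    divides TSC ticks by nominal_hz while the TSC really runs at measured_hz.
    A positive result means that clock would under-report elapsed time."""
    rel_error = (nominal_hz - measured_hz) / nominal_hz
    return rel_error * 1e6, rel_error * SECONDS_PER_DAY


ppm, drift = tsc_rate_error(2_100_000_000, 2_095_000_000)
print(f"{ppm:.0f} ppm, {drift:.1f} s/day")  # prints "2381 ppm, 205.7 s/day"
```

A steady error of about 206 s/day is large for an uncorrected clock, but because it is constant it is exactly the kind of slow, predictable deviation that an external synchronization protocol removes easily.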