I have a simple program running on a Linux server that measures the delta between two consecutive rdtscl calls. On most of my Intel servers, it returns somewhere between 66 and 104 depending on the clock speed: about 66 clock cycles on a 2 GHz server and about 104 on a 3 GHz one. This makes sense to me. But when I run the same program on an AMD server with comparable hardware and clock speed, I get 6 clock cycles, which doesn't make sense to me. Is this just due to architectural differences between Intel and AMD CPUs?
Certain AMD CPUs have reduced the overhead of rdtsc to values as low as the one you mention. The first Intel CPUs with comparable overhead (in terms of FSB clock ticks) were in the Penryn family, so they are relatively recent.
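If you want to compare machines yourself, here is a minimal sketch of the kind of measurement described in the question. It is not the original program: it assumes GCC or Clang on x86 and uses the `__rdtsc()` intrinsic from `<x86intrin.h>` in place of the kernel's rdtscl macro. The function name `min_tsc_delta` is made up for illustration.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc(); assumes GCC or Clang targeting x86 */

/*
 * Hypothetical helper: returns the smallest delta seen between two
 * back-to-back TSC reads over `iterations` attempts. Taking the minimum
 * filters out runs disturbed by interrupts or context switches.
 * Returns UINT64_MAX if iterations <= 0.
 */
uint64_t min_tsc_delta(int iterations)
{
    uint64_t best = UINT64_MAX;

    for (int i = 0; i < iterations; i++) {
        uint64_t start = __rdtsc();
        uint64_t end = __rdtsc();
        uint64_t delta = end - start;

        if (delta < best)
            best = delta;
    }
    return best;
}
```

Calling `min_tsc_delta(1000)` from `main` and printing the result gives the rdtsc overhead in TSC ticks on that machine. Keep in mind that on CPUs whose TSC runs at a constant rate, a TSC tick is not necessarily one core clock cycle.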