The speed of the TSC is dependent on CPU workload: I changed the Sleep(1000) in the loop to keep CPU usage at 100% (I used an AES256 encryption function, not that this should matter). The left column is the rdtsc value (hex). The right column is the difference from the previous reading (decimal). The long-term rate of increase is constant and seems valid. An error of ~0x1B300000 seems to appear and disappear over time.
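For anyone who wants to try this, a loop of the kind described above might look roughly like the sketch below. This is not the original test program: `read_tsc`, `tsc_delta`, `busy_work` and `log_tsc` are hypothetical names, `busy_work` is only a stand-in for the AES256 workload, and `<x86intrin.h>` assumes GCC or Clang (MSVC provides `__rdtsc()` through `<intrin.h>` instead).

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>   /* __rdtsc() on GCC and Clang */
uint64_t read_tsc(void) { return __rdtsc(); }
#else
/* Placeholder so the sketch still compiles on non-x86 targets. */
uint64_t read_tsc(void) { return 0; }
#endif

/* Signed difference between two readings; negative if the counter
 * appears to have gone backwards. */
int64_t tsc_delta(uint64_t prev, uint64_t cur) {
    return (int64_t)(cur - prev);
}

/* Hypothetical stand-in for the AES256 workload: keeps the CPU busy. */
void busy_work(volatile uint32_t *sink, uint32_t rounds) {
    for (uint32_t i = 0; i < rounds; i++)
        *sink = *sink * 1664525u + 1013904223u;
}

/* Print `samples` lines: TSC in hex, then the difference from the
 * previous reading in decimal. */
void log_tsc(int samples, uint32_t rounds) {
    volatile uint32_t sink = 1;
    uint64_t prev = read_tsc();
    for (int i = 0; i < samples; i++) {
        busy_work(&sink, rounds);
        uint64_t cur = read_tsc();
        printf("%016" PRIX64 "  %" PRId64 "\n", cur, tsc_delta(prev, cur));
        prev = cur;
    }
}
```

Calling `log_tsc(20, 100000000u)` from `main` prints twenty lines in the two-column layout described above; the round count is arbitrary and would need tuning to get roughly one line per second.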
I had a few more exchanges with our Intel Atom Processor core performance team. They are treating this as a possible hardware sighting and are tracking it and trying to reproduce it.
Could you try to provide a bit more input as to which exact CPU power state the sleep setting on your system relates to? Would this be C4? Is it possible for you to provide us with the exact CPU chip ID? Which microcode version or updates and BIOS version are running on the system?
We were not able to reproduce your sighting on a standard Z510 with a gcc-compiled binary running for several hours (with and without the 1 second delay).
Knowing the exact nature of the power mode switch and the underlying hardware details may be key.
I know you provided us with the basic code snippet to reproduce it, but since it doesn't show up on our verification systems, do you think you could provide us with the binary you use for testing?
Hi Robert, I use the BIOS provided by Dell. They provided this file to install it: TigerA03.exe. This is the first page of the BIOS utility (F2 when booting):
Dell Inc. Phoenix SecureCore Setup utility
BIOS Version: A03
CPU Type: Intel Atom CPU Z520
CPU Speed: 1330 MHz
CPU Cache Size: 512 KB
CPU ID: 106C2
Product Name: Inspiron 1010
I'm not sure how to change the power mode. In the BIOS, there is a setting called "IntelSpeedStep Technology". I tried both Enabled and Disabled modes and I got the same behavior. If this is not what you meant, please direct me to the appropriate place.
When I run the cpuid instruction with parameter 1, I get: eax = 0x00106C2 ebx = 0x0020800
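In case it helps to read those two registers, here is a small sketch that decodes them using the standard CPUID leaf 1 layout. The struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Display family/model/stepping with the extended fields folded in. */
typedef struct {
    unsigned stepping;
    unsigned model;
    unsigned family;
} cpu_signature;

/* Decode EAX of cpuid(1). */
cpu_signature decode_signature(uint32_t eax) {
    cpu_signature s;
    unsigned base_model  = (eax >> 4)  & 0xF;
    unsigned base_family = (eax >> 8)  & 0xF;
    unsigned ext_model   = (eax >> 16) & 0xF;
    unsigned ext_family  = (eax >> 20) & 0xFF;

    s.stepping = eax & 0xF;
    /* The extended family is only added when the base family is 0xF. */
    s.family = (base_family == 0xF) ? base_family + ext_family : base_family;
    /* The extended model is only used for base family 6 or 0xF. */
    s.model = (base_family == 0x6 || base_family == 0xF)
                  ? ((ext_model << 4) | base_model)
                  : base_model;
    return s;
}

/* EBX bits 31..24 of cpuid(1): initial APIC ID of the logical CPU
 * that executed the instruction. */
unsigned initial_apic_id(uint32_t ebx) { return (ebx >> 24) & 0xFF; }

/* EBX bits 23..16 of cpuid(1): maximum number of addressable logical
 * processor IDs in the package (meaningful when the HTT flag is set). */
unsigned logical_id_count(uint32_t ebx) { return (ebx >> 16) & 0xFF; }
```

With the values above, `decode_signature(0x106C2)` gives family 6, model 0x1C, stepping 2, which matches the "CPU ID: 106C2" line in the BIOS. For ebx = 0x20800, the initial APIC ID is 0 and the logical ID count is 2.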
Unfortunately, I only have one Inspiron Mini 1010 here, so I can't check if this is a manufacturing defect or a design flaw. I've ordered two more, but the delivery is scheduled for the second week of June. These newer Inspiron 1011 come with a different processor: the Atom N270 1.6 GHz.
Hi Robert, I contacted Dell, and they're going to "escalate" this forum thread to their BIOS group. They can't give me a timeline for the update, so I'll tag this thread as resolved, with many thanks to you. When (if?) the BIOS group delivers a new version, I'll tell you if this fixed my problem.
My Dell BIOS does not support disabling Hyper-Threading for some reason, so I could not try your suggestion directly. However, using the APIC ID field from the cpuid instruction, I was able to see what is going on. The third value listed is the ebx of the cpuid(1) instruction (which is run right before rdtsc). It's now obvious that Windows is scheduling the thread on two logical CPUs (for some reason, it does this after some boots and not after others).
The TSC counters of both hyper-threaded CPUs are increasing in lock step. However, the TSC of CPU 1 is starting off with a delay of ~461552170 clocks. This behavior does not occur on Pentium 4 processors with Hyper-Threading, so profiling works there even with Hyper-Threading enabled and Windows scheduling the threads on different processors over time. I looked at the cycle difference between the processors on the Pentium 4 and it is very, very small. Perhaps there is only one counter(?).
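To put the offset into perspective, here is a quick conversion to wall-clock time. It assumes the TSC ticks at the 1330 MHz the BIOS reports, which has not been verified in this thread; `ticks_to_ms` is just an illustrative helper.

```c
#include <stdint.h>
#include <stdio.h>

/* Convert a tick count to milliseconds at a given tick rate in MHz.
 * Assumption: the TSC runs at a fixed rate equal to `mhz`. */
double ticks_to_ms(uint64_t ticks, double mhz) {
    return (double)ticks / (mhz * 1000.0);
}

/* Print the two offsets mentioned in this thread side by side. */
void print_offset_summary(void) {
    const uint64_t cpu1_lag  = 461552170ULL;   /* measured between the two logical CPUs */
    const uint64_t early_err = 0x1B300000ULL;  /* approximate error seen in the first test */
    printf("CPU 1 lag:     %llu ticks, %.1f ms\n",
           (unsigned long long)cpu1_lag, ticks_to_ms(cpu1_lag, 1330.0));
    printf("earlier error: %llu ticks, %.1f ms\n",
           (unsigned long long)early_err, ticks_to_ms(early_err, 1330.0));
}
```

Under that assumption the lag is about 347 ms. The ~0x1B300000 error from the first test is 456130560 in decimal, about 343 ms, so the two figures agree to within roughly 1%, which is consistent with them being the same offset.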
There are several workarounds that I can use to get around this problem with the Atom.
The easiest is to restrict Windows to a single processor. This can be done from the msconfig application, or manually in the boot.ini. This will reduce performance somewhat. I tried it, and this fixes the problem, as one would expect.
I could also measure the difference between the two processors' timestamps (it seems to be constant after boot), and apply the constant correction of ~461552170 clocks to CPU 1 (using cpuid). There are a few serialization issues here which are not fun, but they can be addressed either in a statistical fashion (i.e., checking cpuid before and after rdtsc) or with proper serialization in a driver.
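A rough sketch of the statistical variant of this second workaround follows. It assumes the two logical CPUs report initial APIC IDs 0 and 1, and that the lag really is constant after boot; both would need checking on the actual machine. The names `normalize_tsc`, `read_normalized_tsc` and `CPU1_TSC_LAG` are hypothetical, and `<cpuid.h>` / `<x86intrin.h>` assume GCC or Clang.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>       /* __get_cpuid() on GCC and Clang */
#include <x86intrin.h>   /* __rdtsc() */
#define HAVE_X86_TSC 1
#endif

/* Lag of CPU 1 behind CPU 0 as measured in this thread.
 * Assumption: constant after boot, so it must be re-measured per boot. */
#define CPU1_TSC_LAG 461552170ULL

/* Map a raw reading onto CPU 0's timeline.
 * Assumption: the lagging logical CPU has APIC ID 1. */
uint64_t normalize_tsc(uint64_t raw, unsigned apic_id, uint64_t cpu1_lag) {
    return (apic_id == 1) ? raw + cpu1_lag : raw;
}

#ifdef HAVE_X86_TSC
/* Initial APIC ID of the logical CPU executing this code (EBX bits 31..24). */
static unsigned current_apic_id(void) {
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ebx >> 24) & 0xFF;
}

/* Statistical guard: retry until cpuid reports the same logical CPU
 * both before and after rdtsc. */
uint64_t read_normalized_tsc(void) {
    for (;;) {
        unsigned before = current_apic_id();
        uint64_t raw = __rdtsc();
        unsigned after = current_apic_id();
        if (before == after)
            return normalize_tsc(raw, before, CPU1_TSC_LAG);
    }
}
#endif
```

The before/after check only makes a wrong attribution unlikely, not impossible: the thread could in principle move to the other logical CPU and back between the two cpuid calls. That is the gap a driver with proper serialization would close.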
If solution 2 could be implemented in the rdtsc microcode, this would be the simplest solution for customers, as all serialization issues would be solved, and programs that work on Pentium 4 HT would work on Atom HT without modification.
This really is great news. Is it possible for you to use the C function clock instead of rdtsc? I think that should do almost exactly what you want, assuming it's not too slow.
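For reference, timing with the standard `clock` function could look something like the sketch below. `time_with_clock` and `spin` are illustrative names, and `spin` is only a dummy workload.

```c
#include <time.h>

/* Run fn(arg) once and return the elapsed clock() time in seconds. */
double time_with_clock(void (*fn)(void *), void *arg) {
    clock_t start = clock();
    fn(arg);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Dummy workload: count up to the value pointed to by arg. */
void spin(void *arg) {
    volatile unsigned long sink = 0;
    unsigned long n = *(unsigned long *)arg;
    for (unsigned long i = 0; i < n; i++)
        sink += i;
}
```

Usage would be along the lines of `unsigned long n = 100000000UL; double s = time_with_clock(spin, &n);`. Keep in mind that the resolution of `clock` is bounded by `CLOCKS_PER_SEC`, which is far coarser than a cycle counter, so it suits timing whole runs rather than short code sequences.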
We'll do some more poking around here to see if this is a hardware issue or not. However, please note that we believe right now that it is allowed for one thread to write the TSC (by using wrmsr), and hence it is allowed for the two TSCs to be offset from each other. The CPU may not be able to prevent that. We'll check some more.