Intel® ISA Extensions
Use hardware-based isolation and memory encryption to provide more code protection in your solutions.

Synchronizing Time Stamp Counter

Roman_Oderov
Beginner
4,767 Views
Hello everyone! I have to synchronize time between processors in a multicore system, i.e. I have to calculate the TSC differences of all processors relative to one of them. I tried rdtsc(), but it returns the TSC of the current processor. Is there any way to get the TSC of a specific processor? Or maybe I can specify a processor ID somewhere and read the corresponding time stamp counter value. Thanks in advance, Roman
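One common way to approach this question is to pin the calling thread to the processor of interest and execute RDTSC there. The following is only a minimal sketch under stated assumptions: Linux, an x86 CPU, GCC or Clang, and the glibc `sched_setaffinity` call. The function names `first_allowed_cpu`, `read_tsc_on_cpu` and `tsc_delta` are hypothetical and do not come from this thread. On Windows, `SetThreadAffinityMask` would play the role of `sched_setaffinity`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdint.h>
#include <x86intrin.h>

/* Hypothetical helper: lowest-numbered CPU this thread may run on, or -1. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int c = 0; c < CPU_SETSIZE; c++)
        if (CPU_ISSET(c, &set))
            return c;
    return -1;
}

/* Pin the calling thread to `cpu`, then read the TSC there.
   Returns 0 on success, -1 if the affinity change fails.
   The thread stays pinned to `cpu` afterwards. */
int read_tsc_on_cpu(int cpu, uint64_t *tsc_out)
{
    cpu_set_t set;
    assert(tsc_out != NULL);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -1;
    *tsc_out = __rdtsc();
    return 0;
}

/* Crude estimate of TSC(cpu) - TSC(ref_cpu), read one after the other.
   The result also contains the time spent migrating between the two
   CPUs, so it is biased upward and is not the true skew. */
int tsc_delta(int ref_cpu, int cpu, int64_t *delta_out)
{
    cpu_set_t saved;
    uint64_t t_ref, t_cpu;
    int rc = -1;
    assert(delta_out != NULL);
    if (sched_getaffinity(0, sizeof(saved), &saved) != 0)
        return -1;
    if (read_tsc_on_cpu(ref_cpu, &t_ref) == 0 &&
        read_tsc_on_cpu(cpu, &t_cpu) == 0) {
        *delta_out = (int64_t)(t_cpu - t_ref);
        rc = 0;
    }
    /* Restore the affinity mask the thread had on entry. */
    sched_setaffinity(0, sizeof(saved), &saved);
    return rc;
}
```

Calling `tsc_delta(0, n, &d)` for each processor `n` gives a rough per-processor difference relative to processor 0. Because the migration time is included, repeating the measurement and keeping the smallest value is a reasonable refinement.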
0 Kudos
76 Replies
Roman_D_Intel
Employee
844 Views
>>>It's not at all clear how QueryPerformanceCounter is implemented,>>> QueryPerformanceCounter could be disassembled and statically or dynamically analyzed in order to understand its implementation. I suppose that this function could use the HPET timer.
You can call QueryPerformanceFrequency (which returns counts per second) to get an idea of whether it is implemented with the TSC or the HPET. Thanks, Roman
0 Kudos
Roman_D_Intel
Employee
844 Views
Hi,
But, if I got it right, the invariant TSC in newer processors (17.13.1 in Vol. 3) guarantees that TSC values are synchronized. Well, in older processors I still can't rely on the TSCs of different cores without manual synchronization. Am I right?
Not quite... The time-stamp counter on recent Intel processors is reset to zero each time the processor package has RESET asserted. From that point onwards the invariant TSC continues to tick at a constant rate across frequency changes, turbo mode and ACPI C-states. All parts that see RESET synchronously will have their TSCs completely synchronized. This synchronous distribution of RESET is required for all sockets connected to a single PCH. For multi-node systems RESET might not be synchronous.
The biggest issue with TSC synchronization across multiple threads/cores/packages is the ability of software to write the TSC. The TSC is exposed as MSR 0x10, and software can use WRMSR 0x10 to set it. However, because the TSC is a moving target, writing it is not guaranteed to be precise. For example, an SMI (System Management Interrupt) could interrupt the software flow that is attempting to write the time-stamp counter immediately prior to the WRMSR. This means the value written to the TSC could be off by thousands to millions of clocks.
Hope this helps, Roman
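Two points from this reply can be probed from user space. The sketch below assumes an x86 CPU and GCC or Clang (`<cpuid.h>` and `<x86intrin.h>`); the function names are hypothetical. `has_invariant_tsc` tests the invariant TSC flag, CPUID.80000007H:EDX[8]. `min_back_to_back_gap` reads the TSC twice in a row and reports the smallest gap seen, which gives a feel for how much larger a gap becomes when an interrupt lands between two adjacent instructions.

```c
#include <assert.h>
#include <cpuid.h>
#include <stdint.h>
#include <x86intrin.h>

/* Returns 1 if the CPU reports an invariant TSC (CPUID.80000007H:EDX[8]),
   0 if the bit is clear or the leaf is not supported. */
int has_invariant_tsc(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 8) & 1u);
}

/* Smallest difference, in TSC ticks, between two back-to-back RDTSC
   reads over `tries` attempts. Individual attempts can be much larger
   when an interrupt (for example an SMI) hits between the two reads;
   taking the minimum filters those out. */
uint64_t min_back_to_back_gap(int tries)
{
    uint64_t best = UINT64_MAX;
    assert(tries > 0);
    for (int i = 0; i < tries; i++) {
        uint64_t t0 = __rdtsc();
        uint64_t t1 = __rdtsc();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Note that the invariant TSC flag only says the counter ticks at a constant rate; as the reply explains, it does not by itself prove that the counters of different packages started from the same value.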
0 Kudos
Bernard
Valued Contributor I
844 Views
>>>You can call QueryPerformanceFrequency (returns counts per second) to have an idea if it is implemented with TSC or HPET.>>> Thank you Roman.
0 Kudos
SergeyKostrov
Valued Contributor II
844 Views
>>...If you are interested you can test HT scaling... This is what I'm going to do some time later.
0 Kudos
SergeyKostrov
Valued Contributor II
844 Views
>>...For multi-node systems RESET might not be synchronous... I wonder how VTune gets times on a multi-node system? Does VTune use 'QueryPerformanceCounter' Win32 API function or 'RDTSC' instruction?
0 Kudos
Roman_D_Intel
Employee
844 Views
Hi Sergey,
>>...For multi-node systems RESET might not be synchronous... I wonder how VTune gets times on a multi-node system? Does VTune use 'QueryPerformanceCounter' Win32 API function or 'RDTSC' instruction?
I am not a VTune developer, but could you please elaborate on why you are concerned? Which type of VTune analysis would not work if the TSC has a small delta between the sockets? We are probably talking about deltas that are comparable with the delay of just a few remote memory accesses to another socket. Roman
0 Kudos
SergeyKostrov
Valued Contributor II
844 Views
>>...I am not VTune developer, but could you please elaborate why are you concerned? I don't have any concerns and I simply would like to know how VTune gets times. Does VTune use 'QueryPerformanceCounter' Win32 API function or 'RDTSC' instruction?
0 Kudos
Roman_D_Intel
Employee
844 Views
Sergey, I recommend that you repost your question, with a reference to this thread, on the Intel VTune forum, which is tracked by VTune developers. Thanks, Roman
0 Kudos
Bernard
Valued Contributor I
844 Views
>>>This is what I'm going to do some time later.>>> It would be great to see the results. I bet that for a heavy floating-point load, scaling won't give any advantage. Some speedup will probably be due to the lack of interdependencies between the various instructions being dispatched to various ports.
0 Kudos
Bernard
Valued Contributor I
844 Views
>>>recommend you to repost your question with the reference to this thread to the>>> Do you think that Intel developers will reveal the exact implementation of the VTune timers?
0 Kudos
SergeyKostrov
Valued Contributor II
844 Views
>>>>recommend you to repost your question with the reference to this thread to the... >> >>Do you think that Intel developers will reveal exact implementation of the VTune timers. Actually, I don't need details and I simply need a Yes or No answer, like 'Yes, RDTSC used' or 'No, RDTSC Not used'... Here is a link to my question on the VTune forum: Forum topic: Does VTune use 'QueryPerformanceCounter' Win32 API function or 'RDTSC' instruction? Web-link: http://software.intel.com/en-us/forums/topic/335541
0 Kudos
Bernard
Valued Contributor I
844 Views
>>>Actually, I don't need details and I simply need a Yes or No answer, like 'Yes, RDTSC used' or 'No, RDTSC Not used'... Here is a link to my question on the VTune forum:>>> I asked this because a few weeks ago I posted a question on the MKL forum about the exact algorithm used to approximate Gamma on the problematic range [0.001, 1.0], and one of the Intel employees refused to reveal the algorithmic implementation.
0 Kudos
SergeyKostrov
Valued Contributor II
844 Views
>>...asked about exact algorithm used to approximate Gamma on problematic range [0.001, 1.0] and one of Intel employees refused to >>reveal an algorithmic implementation. I'm not surprised to hear that. In many cases like yours things work in only one direction, that is, for the benefit of a corporation. Iliya, try to ask Microsoft to release some sources and you won't get a response at all.
0 Kudos
Bernard
Valued Contributor I
844 Views
>>>I'm not surprised to hear that. In many cases like yours things are working only in one direction, that is, for the benefit of a corporation. Iliya, try to ask Microsoft to release some sources and you won't get a response at all.>>> Yes, that's true. Sometimes a little bit of reversing is the only solution, albeit not the simplest and fastest one :)
0 Kudos
SergeyKostrov
Valued Contributor II
844 Views
Sorry, it is off the topic... >>...asked about exact algorithm used to approximate Gamma on problematic range [0.001, 1.0] and one of Intel employees refused to >>reveal an algorithmic implementation. Could you try to ask the same question on a GNU Scientific Library ( GSL ) forum?
0 Kudos
Bernard
Valued Contributor I
844 Views
>>>Could you try to ask the same question on a GNU Scientific Library ( GSL ) forum?>>> Good question. I will ask this on their forum. The GSL source code and implementation are open source, so they will probably come up with an exact answer. Btw, I solved this problem with the help of the Mathematica 8 minimax polynomial calculation. @Sergey Can I freely use my own wrappers based on the MKL library?
0 Kudos
SergeyKostrov
Valued Contributor II
844 Views
>>Can I freely use my own wrappers based on MKL library? You need to review MKL's license regarding what you can do and what you can't with the library.
0 Kudos
SergeyKostrov
Valued Contributor II
844 Views
[ To Roman Oderov ] Any updates? Performance results?
0 Kudos
Roman_Oderov
Beginner
876 Views
[ to Sergey ] Hi! I haven't had a lot of time, but here are some results: P.S. I deliberately haven't changed your code (except for some trivial modifications).
0 Kudos
Roman_Oderov
Beginner
876 Views
In addition, I can submit for consideration a .log file in which 50 consecutive program starts are logged.
0 Kudos
SergeyKostrov
Valued Contributor II
876 Views
Hi Roman, Thanks and I'll take a look at your results.
0 Kudos