Intel® ISA Extensions

Synchronizing Time Stamp Counter

Roman_Oderov
Beginner
5,733 Views
Hello everyone! I have to synchronize time between processors in a multicore system, i.e., I have to calculate the TSC differences of all processors relative to one of them. I tried rdtsc(), but it returns the TSC of the current processor. Is there any way to get the TSC from a specific processor? Or maybe I can specify a processor ID somewhere and use the corresponding time stamp counter value. Thanks in advance, Roman
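One common way to get per-core TSC differences is to pin a thread to each core (for example with SetThreadAffinityMask on Windows or sched_setaffinity on Linux), read the TSC on the reference core, obtain a reading from the remote core, and read the reference TSC again. The Python sketch below covers only the arithmetic applied to such readings. The function name `estimate_tsc_offset` and the sample layout are hypothetical, and the sketch assumes the remote read happens roughly midway through the round trip.

```python
from typing import Iterable, Tuple

# (ref_before, remote, ref_after), all in TSC ticks. Hypothetical layout.
Sample = Tuple[int, int, int]


def estimate_tsc_offset(samples: Iterable[Sample]) -> Tuple[float, int]:
    """Return (offset, round_trip) from the sample with the shortest round trip.

    offset > 0 means the remote core's TSC is ahead of the reference core's.
    Assumption: the remote read happened near the midpoint of the round trip,
    so the error is bounded by half the round-trip time.
    """
    best = None
    for ref_before, remote, ref_after in samples:
        round_trip = ref_after - ref_before
        if round_trip < 0:
            continue  # reference thread migrated or the TSC was rewritten
        if best is None or round_trip < best[1]:
            midpoint = (ref_before + ref_after) / 2
            best = (remote - midpoint, round_trip)
    if best is None:
        raise ValueError("no usable samples")
    return best


# Made-up readings for illustration; the second has the tightest round trip.
samples = [(1000, 1650, 1400), (2000, 2550, 2100), (3000, 3900, 3800)]
offset, round_trip = estimate_tsc_offset(samples)
print(offset, round_trip)
```

Keeping only the sample with the shortest round trip discards measurements that were disturbed by interrupts or cache misses, which is why many exchanges per core pair are usually collected.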
76 Replies
Roman_D_Intel
Employee
1,073 Views
>>>It's not at all clear how QueryPerformanceCounter is implemented,>>> QueryPerformanceCounter could be disassembled and statically or dynamically analyzed in order to understand its implementation. I suppose that this function could use the HPET timer.
You can call QueryPerformanceFrequency (it returns counts per second) to get an idea of whether it is implemented with the TSC or the HPET. Thanks, Roman
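A rough sketch of that check in Python follows. The helper names are hypothetical, and the classification is only a heuristic: it assumes an HPET running at the commonly seen 14.31818 MHz and an ACPI PM timer at 3.579545 MHz, and treats any other value as probably derived from the TSC. A TSC-derived frequency can coincidentally land near one of these values, so the result is a hint, not proof.

```python
import sys

HPET_TYPICAL_HZ = 14_318_180   # commonly seen HPET rate (assumption; chipset dependent)
ACPI_PM_TIMER_HZ = 3_579_545   # ACPI power-management timer rate


def guess_qpc_source(freq_hz: int, tolerance: float = 0.01) -> str:
    """Heuristically guess which hardware timer backs a given QPC frequency."""
    def close(value: int, target: int) -> bool:
        return abs(value - target) <= tolerance * target

    if close(freq_hz, ACPI_PM_TIMER_HZ):
        return "ACPI PM timer"
    if close(freq_hz, HPET_TYPICAL_HZ):
        return "HPET"
    return "probably TSC-derived"


def query_performance_frequency():
    """Return QueryPerformanceFrequency on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    import ctypes
    freq = ctypes.c_int64()
    ctypes.windll.kernel32.QueryPerformanceFrequency(ctypes.byref(freq))
    return freq.value


freq = query_performance_frequency()
if freq is not None:
    print(freq, guess_qpc_source(freq))
```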
Roman_D_Intel
Employee
1,073 Views
Hi,
But, if I got it right, the invariant TSC in newer processors (17.13.1 in Vol. 3) guarantees that TSC values are synchronized. In older processors, I still can't rely on the TSCs of different cores without manual synchronization. Am I right?
Not quite... The time-stamp counter on recent Intel processors is reset to zero each time the processor package has RESET asserted. From that point onwards the invariant TSC will continue to tick at a constant rate across frequency changes, turbo mode and ACPI C-states. All parts that see RESET synchronously will have their TSCs completely synchronized. This synchronous distribution of RESET is required for all sockets connected to a single PCH. For multi-node systems RESET might not be synchronous.
The biggest issue with TSC synchronization across multiple threads/cores/packages is the ability of software to write the TSC. The TSC is exposed as MSR 0x10, and software is able to use WRMSR 0x10 to set it. However, because the TSC is a moving target, writing it is not guaranteed to be precise. For example, an SMI (System Management Interrupt) could interrupt the software flow that is attempting to write the time-stamp counter immediately prior to the WRMSR. This means the value written to the TSC could be off by thousands to millions of clocks.
Hope this helps, Roman
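Whether a processor has an invariant TSC is reported in CPUID leaf 80000007H, EDX bit 8. The Python sketch below only decodes values that have already been obtained. The function names are hypothetical, reading CPUID itself needs native code or an OS facility, and the second helper assumes a Linux /proc/cpuinfo dump in which the `constant_tsc` and `nonstop_tsc` flags together indicate the same property.

```python
INVARIANT_TSC_BIT = 8  # CPUID.80000007H:EDX[8]


def has_invariant_tsc(cpuid_80000007_edx: int) -> bool:
    """Decode the invariant-TSC bit from a raw EDX value of leaf 80000007H."""
    return bool((cpuid_80000007_edx >> INVARIANT_TSC_BIT) & 1)


def linux_flags_indicate_invariant_tsc(cpuinfo_text: str) -> bool:
    """Check the first 'flags' line of /proc/cpuinfo text (Linux only)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = set(line.split(":", 1)[1].split())
            return {"constant_tsc", "nonstop_tsc"} <= flags
    return False


print(has_invariant_tsc(0x100))  # bit 8 set
print(has_invariant_tsc(0x0))    # bit 8 clear
```

Note that an invariant TSC only guarantees a constant tick rate. As described above, it says nothing about whether the counters on different packages started from the same RESET or were later rewritten by software.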
Bernard
Valued Contributor I
1,073 Views
>>>You can call QueryPerformanceFrequency (returns counts per second) to have an idea if it is implemented with TSC or HPET.>>> Thank you, Roman.
SergeyKostrov
Valued Contributor II
1,073 Views
>>...If you are interested you can test HT scaling... This is what I'm going to do some time later.
SergeyKostrov
Valued Contributor II
1,073 Views
>>...For multi-node systems RESET might not be synchronous... I wonder how VTune gets times on a multi-node system? Does VTune use 'QueryPerformanceCounter' Win32 API function or 'RDTSC' instruction?
Roman_D_Intel
Employee
1,073 Views
Hi Sergey,
>>...For multi-node systems RESET might not be synchronous... I wonder how VTune gets times on a multi-node system? Does VTune use 'QueryPerformanceCounter' Win32 API function or 'RDTSC' instruction?
I am not a VTune developer, but could you please elaborate on why you are concerned? Which type of VTune analysis would not work if the TSC has a small delta between the sockets? We are probably talking about deltas that are comparable to the delay of just a few remote memory accesses to the other socket. Roman
SergeyKostrov
Valued Contributor II
1,073 Views
>>...I am not VTune developer, but could you please elaborate why are you concerned? I don't have any concerns and I simply would like to know how VTune gets times. Does VTune use 'QueryPerformanceCounter' Win32 API function or 'RDTSC' instruction?
Roman_D_Intel
Employee
1,073 Views
Sergey, I recommend that you repost your question, with a reference to this thread, on the Intel VTune forum, which is tracked by VTune developers. Thanks, Roman
Bernard
Valued Contributor I
1,073 Views
>>>This is what I'm going to do some time later.>>> It would be great to see the results. I bet that for a heavy floating-point load, scaling won't give any advantage. Some speedup will probably be due to the lack of interdependencies between the various instructions being dispatched to different ports.
Bernard
Valued Contributor I
1,073 Views
>>>recommend you to repost your question with the reference to this thread to the>>> Do you think that Intel developers will reveal the exact implementation of the VTune timers?
SergeyKostrov
Valued Contributor II
1,073 Views
>>>>recommend you to repost your question with the reference to this thread to the...
>>Do you think that Intel developers will reveal exact implementation of the VTune timers.
Actually, I don't need details; I simply need a Yes or No answer, like 'Yes, RDTSC is used' or 'No, RDTSC is not used'... Here is a link to my question on the VTune forum:
Forum topic: Does VTune use 'QueryPerformanceCounter' Win32 API function or 'RDTSC' instruction?
Web-link: http://software.intel.com/en-us/forums/topic/335541
Bernard
Valued Contributor I
1,073 Views
>>>Actually, I don't details and I simply need Yes or No answer, like 'Yes, RDTSC used' or 'No, RDTSC Not used'... Here is a link to my question on the VTune forum:>>> I asked this because a few weeks ago I posted a question on the MKL forum about the exact algorithm used to approximate Gamma on the problematic range [0.001, 1.0], and one of the Intel employees refused to reveal the algorithmic implementation.
SergeyKostrov
Valued Contributor II
1,073 Views
>>...asked about exact algorithm used to approximate Gamma on problematic range [0.001, 1.0] and one of Intel employees refused to reveal an algorithmic implementation.
I'm not surprised to hear that. In many cases like yours things work in only one direction, that is, for the benefit of the corporation. Iliya, try asking Microsoft to release some sources and you won't get a response at all.
Bernard
Valued Contributor I
1,073 Views
>>>I'm not surprised to hear that. In many cases like yours things are working only in one direction, that is, for the benefit of a corporation. Iliya, try to ask Microsoft to release some sources and you won't get a response at all.>>> Yes, that's true. Sometimes a little bit of reverse engineering is the only solution, albeit not the simplest or fastest one :)
SergeyKostrov
Valued Contributor II
1,073 Views
Sorry, this is off topic...
>>...asked about exact algorithm used to approximate Gamma on problematic range [0.001, 1.0] and one of Intel employees refused to reveal an algorithmic implementation.
Could you try asking the same question on a GNU Scientific Library (GSL) forum?
Bernard
Valued Contributor I
1,073 Views
>>>Could you try to ask the same question on a GNU Scientific Library ( GSL ) forum?>>> Good question. I will ask this on their forum. The GSL source code is open, so they will probably come back with an exact answer. By the way, I solved this problem with the help of Mathematica 8's minimax polynomial calculation. @Sergey Can I freely use my own wrappers based on the MKL library?
SergeyKostrov
Valued Contributor II
1,073 Views
>>Can I freely use my own wrappers based on MKL library? You need to review MKL's license to see what you can and can't do with the library.
SergeyKostrov
Valued Contributor II
1,073 Views
[ To Roman Oderov ] Any updates? Performance results?
Roman_Oderov
Beginner
1,105 Views
[ To Sergey ] Hi! I haven't had a lot of time, but here are some results: P.S. I deliberately left your code unchanged (except for some trivial modifications).
Roman_Oderov
Beginner
1,105 Views
In addition, I can submit a .log file for consideration, in which 50 consecutive program starts are logged.
SergeyKostrov
Valued Contributor II
1,105 Views
Hi Roman, Thanks and I'll take a look at your results.