Hello everyone!
I have to synchronize time between processors in a multicore system, i.e. I have to calculate the TSC differences of all processors relative to one of them.
I tried rdtsc(), but it returns the TSC of the current processor. Is there any way to get the TSC of a specific processor? Or maybe I can specify a processor ID somewhere and read the corresponding time-stamp counter value?
Thanks in advance,
Roman
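One common way to approach this, offered as a sketch rather than anything confirmed in this thread: RDTSC always reads the counter of the core it executes on, so the reading has to be done by a thread pinned to each core (for example with `sched_setaffinity` on Linux or `SetThreadAffinityMask` on Windows) and published through shared memory. If the reference core reads its own TSC immediately before and after the remote value is taken, the remote reading can be compared with the midpoint of that window. The names `tsc_sample`, `tsc_offset_estimate` and `tsc_offset_uncertainty` are hypothetical, and collecting the samples is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* One measurement: the reference core reads its TSC (ref_before), the remote
 * core reads its own TSC (remote), then the reference core reads again
 * (ref_after). Gathering these values needs pinned threads and is assumed
 * to happen elsewhere. */
typedef struct {
    uint64_t ref_before;
    uint64_t remote;
    uint64_t ref_after;
} tsc_sample;

/* Estimated offset (remote minus reference) in ticks, assuming the remote
 * read happened halfway between the two reference reads. The unsigned
 * difference is reinterpreted as signed, which relies on two's complement. */
static int64_t tsc_offset_estimate(const tsc_sample *s)
{
    assert(s->ref_after >= s->ref_before);
    uint64_t midpoint = s->ref_before + (s->ref_after - s->ref_before) / 2;
    return (int64_t)(s->remote - midpoint);
}

/* Worst-case error of the estimate: half the width of the reference window. */
static uint64_t tsc_offset_uncertainty(const tsc_sample *s)
{
    assert(s->ref_after >= s->ref_before);
    return (s->ref_after - s->ref_before) / 2;
}
```

In practice the measurement would be repeated many times per core, keeping the sample with the narrowest window, because an interrupt between the two reference reads widens the window and inflates the uncertainty.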
76 Replies
>>>It's not at all clear how QueryPerformanceCounter is implemented,>>>
QueryPerformanceCounter could be disassembled and analyzed statically or dynamically to understand its implementation. I suppose this function could use the HPET timer.
You can call QueryPerformanceFrequency (it returns counts per second) to get an idea of whether it is implemented with the TSC or the HPET.
Thanks,
Roman
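To illustrate the QueryPerformanceFrequency hint, here is a rough heuristic sketch. It assumes the two classic fixed rates: 3,579,545 Hz for the ACPI power-management timer and about 14,318,180 Hz for a typical HPET. Neither value is guaranteed (the HPET rate is platform specific, and newer Windows versions may report a fixed, scaled frequency regardless of the underlying source), so the result is only a guess. The function name `guess_qpc_source` and the enum are hypothetical; on Windows the input would come from `QueryPerformanceFrequency`.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    QPC_SOURCE_PM_TIMER,  /* ACPI power-management timer, 3.579545 MHz */
    QPC_SOURCE_HPET,      /* HPET at the common 14.31818 MHz rate */
    QPC_SOURCE_UNKNOWN    /* anything else, possibly derived from the TSC */
} qpc_source_guess;

/* Guess the hardware behind the performance counter from its reported
 * frequency in counts per second. This is a heuristic, not a documented
 * mapping. */
static qpc_source_guess guess_qpc_source(int64_t counts_per_second)
{
    assert(counts_per_second > 0);
    if (counts_per_second == 3579545)
        return QPC_SOURCE_PM_TIMER;
    /* Allow one count of rounding around 14,318,180 Hz. */
    if (counts_per_second >= 14318179 && counts_per_second <= 14318181)
        return QPC_SOURCE_HPET;
    return QPC_SOURCE_UNKNOWN;
}
```

A frequency that matches neither rate does not prove the TSC is used; it only rules out these two well-known timers.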
Hi,
>>But, if I got it right, the invariant TSC in newer processors (17.13.1 in Vol. 3) guarantees that TSC values are synchronized. In older processors I still can't rely on the TSCs of different cores without manual synchronization. Am I right?
Not quite... The time-stamp counter on recent Intel processors is reset to zero each time the processor package has RESET asserted. From that point onwards the invariant TSC continues to tick at a constant rate across frequency changes, turbo mode and ACPI C-states. All parts that see RESET synchronously will have their TSCs completely synchronized. This synchronous distribution of RESET is required for all sockets connected to a single PCH. For multi-node systems RESET might not be synchronous.
The biggest issue with TSC synchronization across multiple threads/cores/packages is the ability of software to write the TSC. The TSC is exposed as MSR 0x10, and software can use WRMSR 0x10 to set it. However, because the TSC is a moving target, writing it is not guaranteed to be precise. For example, an SMI (System Management Interrupt) could interrupt the software flow that is attempting to write the time-stamp counter immediately prior to the WRMSR. This means the value written to the TSC could vary by thousands to millions of clocks.
Hope this helps,
Roman
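Whether a processor advertises the invariant TSC can be checked at run time: the feature is reported in CPUID leaf 0x80000007, EDX bit 8. The sketch below assumes GCC or Clang on x86 and uses the compiler's `<cpuid.h>` helper `__get_cpuid`; the function names `edx_reports_invariant_tsc` and `cpu_has_invariant_tsc` are hypothetical. Note that this bit says the counter ticks at a constant rate; it does not by itself prove that the TSCs of different packages hold the same value.

```c
#include <cpuid.h>  /* GCC/Clang, x86 only */

/* Decode the invariant TSC flag, CPUID.80000007H:EDX[8]. */
static int edx_reports_invariant_tsc(unsigned int edx)
{
    return (int)((edx >> 8) & 1u);
}

/* Returns 1 if the current CPU reports an invariant TSC, 0 otherwise. */
static int cpu_has_invariant_tsc(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return 0;
    return edx_reports_invariant_tsc(edx);
}
```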
>>...If you are interested you can test HT scaling...
This is what I'm going to do some time later.
>>...For multi-node systems RESET might not be synchronous...
I wonder how VTune gets times on a multi-node system?
Does VTune use 'QueryPerformanceCounter' Win32 API function or 'RDTSC' instruction?
Hi Sergey,
>>...For multi-node systems RESET might not be synchronous...
>>I wonder how VTune gets times on a multi-node system? Does VTune use 'QueryPerformanceCounter' Win32 API function or 'RDTSC' instruction?
I am not a VTune developer, but could you please elaborate on why you are concerned? Which type of VTune analysis would not work if the TSC has a small delta between the sockets? We are probably talking about deltas comparable to the delay of just a few remote memory accesses to the other socket.
Roman
>>...I am not VTune developer, but could you please elaborate why are you concerned?
I don't have any concerns and I simply would like to know how VTune gets times. Does VTune use 'QueryPerformanceCounter' Win32 API function or 'RDTSC' instruction?
Sergey,
I recommend reposting your question, with a reference to this thread, on the Intel VTune forum, which is monitored by VTune developers.
Thanks,
Roman
>>>This is what I'm going to do some time later.>>>
It would be great to see the results.
I bet that for a heavy floating-point load, scaling won't give any advantage. Some speedup will probably come from the lack of interdependencies between the various instructions being dispatched to different ports.
>>>>recommend you to repost your question with the reference to this thread to the...
>>
>>Do you think that Intel developers will reveal the exact implementation of the VTune timers?
Actually, I don't need details; I simply need a Yes or No answer, like 'Yes, RDTSC is used' or 'No, RDTSC is not used'... Here is a link to my question on the VTune forum:
Forum topic: Does VTune use 'QueryPerformanceCounter' Win32 API function or 'RDTSC' instruction?
Web-link: http://software.intel.com/en-us/forums/topic/335541
>>Actually, I don't need details; I simply need a Yes or No answer, like 'Yes, RDTSC is used' or 'No, RDTSC is not used'... Here is a link to my question on the VTune forum:
I asked this because a few weeks ago I posted a question on the MKL forum about the exact algorithm used to approximate Gamma on the problematic range [0.001, 1.0], and one of the Intel employees refused to reveal the algorithmic implementation.
>>...asked about the exact algorithm used to approximate Gamma on the problematic range [0.001, 1.0], and one of the Intel employees refused to
>>reveal the algorithmic implementation.
I'm not surprised to hear that. In many cases like yours things are working only in one direction, that is, for the benefit of a corporation. Iliya, try to ask Microsoft to release some sources and you won't get a response at all.
>>>I'm not surprised to hear that. In many cases like yours things are working only in one direction, that is, for the benefit of a corporation. Iliya, try to ask Microsoft to release some sources and you won't get a response at all.>>>
Yes, that's true. Sometimes a little bit of reversing is the only solution, albeit not the simplest or fastest one :)
Sorry, this is off topic...
>>...asked about the exact algorithm used to approximate Gamma on the problematic range [0.001, 1.0], and one of the Intel employees refused to
>>reveal the algorithmic implementation.
Could you try asking the same question on a GNU Scientific Library (GSL) forum?
>>>Could you try to ask the same question on a GNU Scientific Library ( GSL ) forum?>>>
Good question. I will ask this on their forum. GSL is open source, so they will probably come back with an exact answer.
Btw, I solved this problem with the help of Mathematica 8's minimax polynomial calculation.
@Sergey
Can I freely use my own wrappers based on the MKL library?
>>Can I freely use my own wrappers based on MKL library?
You need to review MKL's license regarding what you can do and what you can't with the library.
