Interesting results.
Agner Fog's manuals report an RDTSC throughput figure that is a bit higher than the latency you measured.
Unfortunately, he does not provide any data on RDTSC latency in CPU clock cycles.
Why don't you serialize the execution of RDTSC?
AFAIK, RDTSC is not a serializing instruction, so in theory several of them can be in flight at the same time and their pipelined execution can at least partially overlap.
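A minimal sketch of what a fenced timestamp read could look like in C, assuming GCC or Clang on x86-64 with the `<x86intrin.h>` intrinsics. The helper names `fenced_rdtsc` and `fenced_rdtsc_overhead` are hypothetical and do not come from this thread. LFENCE is not a fully serializing instruction in the way CPUID is, but on Intel processors it does not execute until all prior instructions have completed locally, and no later instruction starts until it completes, which is why it is commonly placed around RDTSC.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc and _mm_lfence (GCC/Clang, x86 only) */

/* Hypothetical helper: read the TSC with a fence on each side so the read
 * cannot overlap with earlier or later instructions. */
static inline uint64_t fenced_rdtsc(void)
{
    _mm_lfence();            /* wait for prior instructions to complete */
    uint64_t t = __rdtsc();  /* read the time stamp counter */
    _mm_lfence();            /* hold back later instructions until the read is done */
    return t;
}

/* Hypothetical helper: TSC ticks between two fenced reads with nothing in
 * between, i.e. the overhead of the measurement itself. */
static inline uint64_t fenced_rdtsc_overhead(void)
{
    uint64_t start = fenced_rdtsc();
    uint64_t stop  = fenced_rdtsc();
    return stop - start;
}
```

Subtracting the result of `fenced_rdtsc_overhead()` from a timed interval is one way to correct for the cost of the fences and the reads themselves. Note that the TSC counts at a fixed reference rate on processors with an invariant TSC, so ticks are not necessarily core clock cycles.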
I am posting the RDTSC reciprocal throughput reported by Agner Fog.
CPU architecture: Ivy Bridge. RDTSC reciprocal throughput: 27 CPU clock cycles.
Reference: p. 175 of
http://www.agner.org/optimize/instruction_tables.pdf
>>>That is why I tried to fill a CPU pipeline with at least 10 RDTSC or RDTSCP instructions.>>>
I am still puzzled about whether the micro-ops of those 10 instructions can be executed in an at least partially pipelined fashion at the hardware level. I will search Google Patents for information that may shed some light on the proposed (patented) implementation of the RDTSC instruction.
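One way to probe this experimentally is to time a chain of unfenced, back-to-back RDTSC reads and divide by the chain length. The sketch below assumes GCC or Clang on x86-64; the name `rdtsc_min_ticks_per_read` and the constant `RDTSC_CHAIN` are hypothetical. If the reads did not overlap at all, the per-read figure should approach the instruction's latency; if they do overlap, it should approach the reciprocal throughput instead (for comparison, 10 reads at 27 cycles each would be about 270 cycles). The result is in TSC ticks, which may differ from core clock cycles when the core does not run at the TSC reference frequency.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc and _mm_lfence (GCC/Clang, x86 only) */

#define RDTSC_CHAIN 10  /* number of timed back-to-back reads (assumption) */

/* Hypothetical helper: smallest observed TSC ticks per RDTSC over several
 * repetitions of an unfenced chain. Returns -1.0 if repeats <= 0. */
static double rdtsc_min_ticks_per_read(int repeats)
{
    double best = -1.0;
    for (int r = 0; r < repeats; r++) {
        uint64_t t[RDTSC_CHAIN + 1];

        _mm_lfence();                      /* start from a drained pipeline */
        for (int i = 0; i <= RDTSC_CHAIN; i++)
            t[i] = __rdtsc();              /* no fences inside the chain */
        _mm_lfence();

        /* RDTSC_CHAIN intervals lie between the first and the last read. */
        double per_read = (double)(t[RDTSC_CHAIN] - t[0]) / RDTSC_CHAIN;
        if (best < 0.0 || per_read < best)
            best = per_read;               /* keep the least disturbed run */
    }
    return best;
}
```

Taking the minimum over many repetitions filters out runs disturbed by interrupts or cache misses. The loop and the stores add a small overhead of their own, so the figure is an upper bound rather than an exact value.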
I found an Intel patent titled "Apparatus for monitoring the performance of a microprocessor", but it contains no clear information about pipelined reads of the TSC.
Link to the aforementioned patent:
https://patents.google.com/patent/US5657253A/en?q=time+stamp+counter&assignee=Intel