On Xeon or Hyper-Threading tech, what is the cycle time for rep; nop or _asm pause? The documentation hints that it can be anywhere from that of a nop to some definite value. So what is it?
Thanks in advance
3 Replies
We are forwarding your question to our engineering contacts and will let you know how they respond.
Regards,
Message Edited by intel.software.network.support on 12-02-2005 01:14 PM
Here is the response we received from our Application Engineers:
A NOP instruction can take 0.4-0.5 clocks, and a PAUSE instruction can consume 38-40 clocks. Please refer to the whitepaper on how to measure the latency and throughput of various instructions. The REPE instruction comes in various flavors, and the latency/throughput of each of them varies. Please also see the sample code below for measuring the average clocks.
#include <stdio.h>

/* Read the time-stamp counter into the 64-bit variable x.
   CPUID is a serializing instruction, so earlier instructions
   complete before RDTSC runs. 32-bit MSVC inline assembly. */
#define ReadTSC( x ) __asm cpuid \
                     __asm rdtsc \
                     __asm mov dword ptr x,eax \
                     __asm mov dword ptr x+4,edx
#define LOOP_COUNT 160000
#define REPEAT_25( x ) x x x x x x x x x x x x x x x x x x x x x x x x x
#define REPEAT_100(x) REPEAT_25(x) REPEAT_25(x) REPEAT_25(x) REPEAT_25(x)
#define REPEAT_1000(x) REPEAT_100(x) REPEAT_100(x) REPEAT_100(x) REPEAT_100(x) REPEAT_100(x) REPEAT_100(x) REPEAT_100(x) REPEAT_100(x) REPEAT_100(x) REPEAT_100(x)
/* Total instructions timed: LOOP_COUNT iterations of 1000 unrolled copies. */
#define FACTOR ((double)LOOP_COUNT*1000.0)
#define CLOCKSPERINSTRUCTION(start,end) (((double)(end)-(double)(start))/(FACTOR))

int main(int argc, char **argv)
{
    __int64 start, end, total;

    /* Time LOOP_COUNT * 1000 NOP instructions. */
    total = 0;
    ReadTSC(start);
    for (int i = 0; i < LOOP_COUNT; i++)
    {
        REPEAT_1000(__asm { nop};)
    }
    ReadTSC(end);
    total = end - start;
    printf("nop: clocks per instruction %4.2f\n", (double)total/(double)FACTOR);

    /* Time LOOP_COUNT * 1000 PAUSE instructions. */
    total = 0;
    ReadTSC(start);
    for (int i = 0; i < LOOP_COUNT; i++)
    {
        REPEAT_1000(__asm { pause};)
    }
    ReadTSC(end);
    total = end - start;
    printf("pause: clocks per instruction %4.2f\n", (double)total/(double)FACTOR);
    return 0;
}
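The arithmetic behind the printed figure can be sketched separately from the inline assembly. This is only an illustration: the helper names `clocks_per_instruction` and `clocks_to_ns` are hypothetical and not part of the sample above, and the nanosecond conversion assumes the time-stamp counter ticks at the core clock rate, which may not hold on every processor.

```c
#include <stdint.h>

/* Average clocks per instruction for a timed, unrolled loop.
   loop_count and unroll play the roles of LOOP_COUNT and the
   REPEAT_1000 unroll factor in the sample above. */
double clocks_per_instruction(uint64_t start, uint64_t end,
                              uint64_t loop_count, uint64_t unroll)
{
    double instructions = (double)loop_count * (double)unroll;
    return (double)(end - start) / instructions;
}

/* Convert a clock count to nanoseconds at an assumed frequency in GHz
   (1 GHz means one clock per nanosecond). */
double clocks_to_ns(double clocks, double ghz)
{
    return clocks / ghz;
}
```

For instance, a counter difference of 6,400,000,000 over 160000 iterations of 1000 instructions works out to 40 clocks per instruction, which would be 20 ns at an assumed 2.0 GHz.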
==
Regards,
Message Edited by intel.software.network.support on 12-02-2005 01:14 PM
Thank you. The documentation states that the results vary, as does the answer. I'll try this on our various 8x, 16x and 32x machines and see what the variance is.
Thanks again.
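One way to compare runs across machines is to collect the clocks-per-instruction figure from several runs and summarize it. The following is a minimal sketch, assuming the per-run results have already been gathered into an array; `sample_mean` and `sample_variance` are hypothetical helper names, not part of the posted sample.

```c
#include <stddef.h>

/* Arithmetic mean of n samples; returns 0.0 when n == 0. */
double sample_mean(const double *x, size_t n)
{
    if (n == 0) return 0.0;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += x[i];
    return sum / (double)n;
}

/* Unbiased sample variance (divides by n - 1); returns 0.0 when n < 2. */
double sample_variance(const double *x, size_t n)
{
    if (n < 2) return 0.0;
    double mean = sample_mean(x, n);
    double ss = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - mean;
        ss += d * d;
    }
    return ss / (double)(n - 1);
}
```

Passing one array per machine keeps the comparison simple: a larger variance on one system suggests that run-to-run conditions there affect the PAUSE timing more.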