I'm currently profiling the behavior of a Core i7 CPU with some benchmarks taken from the technology journal article 'Hyper-Threading Technology: Impact on Compute-Intensive Workloads'. One of the benchmarks produces a result which I find hard to explain.
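[cpp]
#include <cmath>    // sqrt
#include <cstdlib>  // rand, srand, RAND_MAX
#include <ctime>    // time

// Counts random points in [-1, 1] x [-1, 1] that fall inside the unit circle.
// int32 and npoints are defined elsewhere in the benchmark class.
virtual void process ()
{
    int32 inner = 0;   // number of points inside the unit circle
    int32 result = 0;
    srand ( time ( NULL ) );
    for ( int32 i = 0; i < npoints; i++ )
    {
        double x = (((float) rand()) / RAND_MAX * 2) - 1;
        double y = (((float) rand()) / RAND_MAX * 2) - 1;
        if ( sqrt (x*x + y*y) <= 1 )
        {
            inner++;
        }
    }
    (result) += inner;
}
[/cpp]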
This snippet runs simultaneously on 4 cores with Hyper-Threading disabled. When profiling with VTune I measure a CPI of 0.76 and a CPuops value (cycles per micro-op) of 0.80. The CPI comes from the built-in ratio; the CPuops value comes from a self-defined ratio ([pmn:CPU_CLK_UNHALTED.THREAD]/[pmn:UOPS_RETIRED.ANY]). The ratios for the other benchmarks come out as expected, so I don't think there is a misconfiguration. But since the CPU decodes each instruction into at least one micro-operation, cycles per micro-op should never be higher than cycles per instruction, so this result should be impossible. Any ideas?
Thanks in advance,
Lars
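To make the apparent contradiction concrete, here is a minimal sketch of the arithmetic behind the two ratios. Both share the same numerator (unhalted cycles), so dividing CPI by cycles per micro-op gives micro-ops retired per instruction. The function name `uopsPerInstruction` is made up for illustration and is not part of VTune.

```cpp
#include <cassert>

// Hypothetical helper (not a VTune API).
// CPI            = cycles / instructions retired
// cycles per uop = cycles / uops retired
// => uops per instruction = CPI / (cycles per uop)
double uopsPerInstruction(double cpi, double cyclesPerUop)
{
    assert(cpi > 0.0 && cyclesPerUop > 0.0);
    return cpi / cyclesPerUop;
}
```

With the values from the post, `uopsPerInstruction(0.76, 0.80)` comes out at roughly 0.95, i.e. slightly fewer than one retired micro-op per retired instruction, which is the result that looks impossible.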
1 Reply
Quoting - lkleen
since the cpu decodes an instruction to at least 1 micro-operation this result should be impossible. Any ideas?
Lars,
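There are some cases where two instructions are translated into a single micro-op, for example several combinations of a test or compare instruction followed by a conditional jump. This feature is called "macro-fusion" and could explain your observation: each fused pair retires as two instructions but only one micro-op, so the micro-op count can drop below the instruction count.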
Kind regards
Thomas
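As a rough plausibility check of the macro-fusion explanation, the sketch below estimates how many fused pairs would be needed to reach a given micro-ops-per-instruction ratio. It assumes, purely for illustration, that every non-fused instruction decodes to exactly one micro-op; instructions that decode to more would require even more fused pairs, so the result is a lower bound. The function name `minFusedPairsPerInstruction` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper. Under the stated assumption:
//   uops = instructions - fusedPairs
//   => fusedPairs / instructions = 1 - uops / instructions
// The result is clamped at 0 for ratios of 1 or more.
double minFusedPairsPerInstruction(double uopsPerInstr)
{
    assert(uopsPerInstr > 0.0);
    return std::max(0.0, 1.0 - uopsPerInstr);
}
```

For the measured ratio of about 0.95 micro-ops per instruction (0.76 / 0.80), this gives roughly 0.05 fused pairs per instruction, so about one instruction in ten would have to be part of a fused compare-and-branch pair. That does not seem implausible for a tight loop like the one in the benchmark, which executes a loop branch and an `if` on every iteration.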