
more cycles per micro-instruction than cycles per instruction

lkleen
I'm currently profiling the behavior of a Core i7 CPU with some benchmarks taken from the technology journal article 'Hyper-Threading Technology: Impact on Compute-Intensive Workloads'. One of the benchmarks produces a result which is hard for me to explain.

[cpp]virtual void process ()
{
    int32 inner = 0;
    int32 result = 0;

    srand ( time ( NULL ) );

    for ( int32 i = 0; i < npoints; i++ )
    {
        // Draw a random point in the square [-1, 1] x [-1, 1].
        double x = (((float) rand()) / RAND_MAX * 2) - 1;
        double y = (((float) rand()) / RAND_MAX * 2) - 1;

        // Count the points that fall inside the unit circle.
        if ( sqrt (x*x + y*y) <= 1 )
        {
            inner++;
        }
    }

    (result) += inner;
}[/cpp]

This snippet runs simultaneously on 4 cores with hyper-threading disabled. When profiling with VTune I measure a CPI value of 0.76 and a CPuops value of 0.80. The CPI is measured with the built-in ratio; the CPuops value is measured with a self-defined ratio ([pmn:CPU_CLK_UNHALTED.THREAD]/[pmn:UOPS_RETIRED.ANY]). The ratios for the other benchmarks come out as expected, so I don't think there is a misconfiguration. But since the CPU decodes each instruction into at least one micro-operation, this result should be impossible. Any ideas?

thanks in advance,
Lars
Thomas_W_Intel
Employee
Quoting - lkleen
since the cpu decodes an instruction to at least 1 micro-operation this result should be impossible. Any ideas?

Lars,

There are some cases where two instructions are translated into only one micro-operation, for example several combinations of test or compare followed by a conditional jump. This feature is called "macro-fusion" and could explain your observation.

Kind regards

Thomas
