[cpp]
virtual void process ()
{
    int32 inner = 0;   // number of points that fall inside the unit circle
    int32 result = 0;
    srand ( time ( NULL ) );
    for ( int32 i = 0; i < npoints; i++ )
    {
        // random point in the square [-1, 1] x [-1, 1]
        double x = (((float) rand()) / RAND_MAX * 2) - 1;
        double y = (((float) rand()) / RAND_MAX * 2) - 1;
        if ( sqrt (x*x + y*y) <= 1 )
            inner++;
    }
    (result) += inner;
}
[/cpp]
This snippet runs simultaneously on 4 cores with hyper-threading disabled. When profiling with VTune I measure a CPI of 0.76 and a CPuops (cycles per micro-op) value of 0.80. The CPI comes from the built-in ratio; the CPuops value comes from a self-defined ratio ([pmn:CPU_CLK_UNHALTED.THREAD]/[pmn:UOPS_RETIRED.ANY]). The ratios for my other benchmarks come out as expected, so I don't think this is a misconfiguration. But since the CPU decodes every instruction into at least 1 micro-operation, CPuops should never be larger than CPI, so this result should be impossible. Any ideas?
thanks in advance,
There are some cases where two instructions are translated into only 1 micro-op, for example several combinations of a test or compare instruction followed by a conditional jump. This feature is called "macro-fusion" and could explain your observation.
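
As a rough plausibility check, here is a minimal sketch of what the two measured ratios would imply. The function names are hypothetical. The model is a deliberate simplification, not how the hardware is documented to count: it assumes every instruction retires as exactly one micro-op, except macro-fused pairs, where two instructions retire as one.

```cpp
#include <cassert>

// Micro-ops retired per instruction, derived from the two measured ratios:
// (cycles / instruction) / (cycles / uop) = uops / instruction.
double uopsPerInstruction(double cpi, double cpuops) {
    assert(cpuops > 0.0);
    return cpi / cpuops;
}

// Simplified model (an assumption, see above): with N instructions and
// P fused pairs, uops = N - P, so uops/instruction = 1 - P/N.
// Returns the share of instructions that belong to a fused pair, 2 * P/N.
double fusedInstructionShare(double uopsPerInstr) {
    assert(uopsPerInstr > 0.0 && uopsPerInstr <= 1.0);
    return 2.0 * (1.0 - uopsPerInstr);
}
```

With the values from the question, `uopsPerInstruction(0.76, 0.80)` gives about 0.95, and `fusedInstructionShare` of that gives about 0.10. So under this model roughly one instruction in ten would have to be part of a fused compare-and-branch pair. Real code also contains instructions that decode into more than one micro-op, so the actual share would need to be at least that high.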