MKL Summary Statistics computation of the mean of constant data is off

tennican · ‎11-08-2011

If my data is composed of 500 observations of the value 2, MKL returns a mean of 1.9999999999999958.
I would expect it to compute 2.0000000000000000
as does the by hand implementation below executing on the same data in the same place.
--------
double testMean = 0.0;
for ( UINT idx = 0; idx < m_data.size(); idx++ ) {
testMean += m_data[idx];
}
testMean /= m_data.size();

Andrey_N_Intel · ‎11-09-2011

Hi Scott,
MKL Summary Statistics algorithm for computation of mean is based onone-pass/on-line schemeM(n+1) = 1/(n+1) * (n * M(n) + X(n+1)), where X(n) - n-th observation, M(n) - mean estimate for dataset X(1),..., X(n),... Generally, the parameters in this formula are not exactly represented in FP arithmetic (like coefficient 1/(n+1) = 1/3 for n=2),thus, you see the result which differs from the ideal one by ~10^(-15) (what isaligned with double FP error).On the other hand, the intention behind the algorithm - have more stable resultsfor various types of the datasets, not just the the absolutely exact for thissimple dataset.
We also need to pay attention to the following important fact: in some (not all) applications statistical error dominates numerical error; if we estimate the mean fora realistic dataset (which contains non constant "signal"), stat errorwould have theorder ~1/sqrt(500)=~0.044 whichmakesnumerical error 10^(-15) "absolutely negligible".
Does itanswer your question?
Thanks,
Andrey

SergeyKostrov · ‎11-09-2011

Hi Andrey,

Look, here is a modifiedtest for 'float' ( single-precision )and 'double' ( double-precision )data types:

/////////////////////////////////////////////////////////////////////////////////

float fData[500] = { 0.0f };
double dData[500] = { 0.0L };

float fTestMean = 0.0f;
double dTestMean = 0.0L;

UINT idx;

for( idx = 0; idx < 500; idx++ )
{
fData[idx] = 2.0f;
dData[idx] = 2.0L;
}

for( idx = 0; idx < 500; idx++ )
{
fTestMean += fData[idx];
}
fTestMean /= ( float )500;

for( idx = 0; idx < 500; idx++ )
{
dTestMean += dData[idx];
}
dTestMean /= ( double )500;

printf( "Mean for 'Float test-case' = %.32f\n", fTestMean );
printf( "Mean for 'Double test-case' = %.32f\n", dTestMean );

/////////////////////////////////////////////////////////////////////////////////

Mean for 'Float test-case' = 2.00000000000000000000000000000000
Mean for 'Double test-case' = 2.00000000000000000000000000000000

/////////////////////////////////////////////////////////////////////////////////

As you see resultsfrom single-precision ( float \24-bit ) test-case and double-precision ( double \53-bit ) test-case are identical! I'm absolutely not convinced with your explanation:

>>...~10^(-15) (what isaligned with double FP error)...

There are no miracles in my test-casebecause in IEEE 754 standard number 2.0 hasexact binary representations:

For single-precision:

2(Base10) = 0x40000000(Base16) => 0 10000000 00000000000000000000000(Base2\IEEE754)

and

For double-precision:

2(Base10) = 0x4000000000000000(Base16) =>
0 10000000000 0000000000000000000000000000000000000000000000000000(Base2\IEEE754)

Best regards,
Sergey

Andrey_N_Intel · ‎11-09-2011

Hi Sergey,
Thanks for your inputs.
My comments above are related not to the straightforward summation followed by division but to the one-pass scheme for computation of the mean M(n+1) = 1/(n+1) * (n * M(n) + X(n+1)), n=1,...,N(N=500) which is a different algorithm and whichgenerally can return the result slightly different from 2.0 in that particular case; this is what Scott sees with MKL Summary Stats.
In your test case you just accumulate sum of '2' (which has exactrepresentation)and then normalize it by N=500. The resulthas the exact representation until the number of summands is less or equal to the specifc threshold defined by data type - for single precision it is equal to 2^24. Once you exceed the threshold the estimate for the mean should degrade: if N=2^25 and the computations are done in single precision, the output of summation followed by the division is 1.0 and outputof the one pass algorithm is ~2.0.
Does it explain the difference?
Thanks,
Andrey

mecej4 · ‎11-09-2011

Sergey, you seem to have overlooked the main point of Andrey's explanation.

The MKL algorithm, as he explained it, is a one-pass algorithm for computing not only the mean but other statistics such as the standard deviation.

Another way of looking at the MKL calculation of the mean is as an update algorithm for use with streaming data. As new data become available, the statistics are updated. As a side effect of how this calculation is done, as early as after taking in the third data point, the result is no longer exact, since 1/3 is not exact in floating point.

For real data, again as Andrey explained, the errors arising from the finite precision of floating point are negligible compared to the noise in the data. Therefore, the failure of MKL to deliver an exact result for a case where a different algorithm can yield an exact result is not a significant deficiency.

tennican · ‎11-09-2011

Hi Andrey,

Thanks for your explanation. I understand that the discrepancy is unavoidable for the single pass method.
Of course, when you display results for special cases like this to unsophisticated users you need to give them what they expect. In this case, formatting the result to 14 decimal places does the job.

The reason why I noticed is that the slight discrepancy in the mean messed up the workaround code described in my previous thread for the special case of constant data.
This code below results in skewness_est: -1.#IND and kurtosis: kurtosis_est = 21530895567645716.
However, for this case both skewness_est and kurtosis_est should be: -1.#IND
And, this is in fact the result when I replace the mean_est with the exact mean which gives data of all zeros
None of this is a real problem because I can just interpret huge values of kurtosis as an error return.

thanks for your help, Scott
----------------------------
if ( cen2mom == cen2mom && (cen2mom < 1.0e-14 || abs(trueMean) > 1.0e8 * cen2mom) ) {
vector data;
for ( int idx = 0; idx < m_count; idx++ ) {
data.push_back( m_data[idx] - trueMean );
}
status = vsldSSNewTask( &task, &numCols, &numRows, &xstorage, &(data[0]), 0, indices );
// Setup moment computation. Computing kurtosis requires all the other moments.
status = vsldSSEditTask(task, VSL_SS_ED_SKEWNESS, &skewness_est);
status = vsldSSEditTask(task, VSL_SS_ED_KURTOSIS, &kurtosis_est);
status = vsldSSEditMoments( task, &mean_est, &raw2mom, &raw3mom, &raw4mom, &cen2mom, &cen3mom, &cen4mom );

// Request moment computation
mask = VSL_SS_MEAN | VSL_SS_KURTOSIS | VSL_SS_SKEWNESS |
VSL_SS_2R_MOM | VSL_SS_3R_MOM | VSL_SS_4R_MOM |
VSL_SS_2C_MOM | VSL_SS_3C_MOM | VSL_SS_4C_MOM;
status = vsldSSCompute(task, mask, VSL_SS_METHOD_FAST);

// Remove the task
status = vslSSDeleteTask( &task );
}

SergeyKostrov · ‎11-09-2011

Thank you, guys!You shortly discussed about theStandard Deviation. So, my questionsare:

Do you calculate it as Biased or Unbiased? Could I specify it?

Best regards,
Sergey

Andrey_N_Intel · ‎11-09-2011

Hi Sergey,
As specified in Intel MKL Manual (Chapter Statistical Functions/Summary Statistics, section Math Notations and Definitions) and in Summary Statistics Application Notes (Chapter Algorithms and Interfaces in Summary Statistics, Section Calculating Multiple Estimates) the library provides unbiased estimates for expectation, variance, variance-covariance matrix, and biased estimates for central moments of 3d and 4th order.
Thanks,
Andrey

tennican · ‎11-10-2011

Hi Sergey,

I believe the vsldSS task allows us to request the second central moment
which is the population variance or the biased sample variance.
But, you can easily convert to the unbiased sample variance by multiplying by N/(N-1)
where N is sample size.

Scott

Andrey_N_Intel · ‎11-10-2011

Hi Scott,
Just to make sure that we are on the same page: Intel MKL Summary Stats returns unbiased variance.
If for some reasonswe need the biasedestimate, we can get itby multiplying by (N-1)/N.
Thanks,
Andrey

tennican · ‎11-11-2011

Hi Andrey,

I just tested and you are absolutely right.
Requesting VSL_SS_2C_MOM, I get the unbiased sample variance.
The 2C_MOM in the name led me to think second central moment which usually refers to the biased sample variance.

thanks for straightening me out, Scott

SergeyKostrov · ‎11-14-2011

Hi Scott,

I've spentextra time on the issue and I wanted to reproducea rounding problem ( or precision loss ) with the Meanin a simple C test-case.

On http://en.wikipedia.org/wiki/Algorithms_for_calculating_variancethere are 5algorithmswith pseudo-codesto calculate the Variance.I've implementedfour of them in C to calculate theStandard Deviation.

The most interesting thing isthat in case of One-Pass\Online algorithm I could not reproduce the problem!

The Mean for a data set with 500values of 2.0 was 2.0.

Take a look at my test-case for One-Pass algorithm and let me know if you're interested to get a complete test-case with 4 algorithms ( Naive, Two-Pass, Compensated and One-Pass )

...
/* Sources forNaive, Two-Pass, Compensated algorithmsare not included*/
...

//#define _RTTYPEfloat
#define _RTTYPEdouble
//#define _RTTYPElong double

RTint iN = 0;
RTint i = 0;

_RTTYPE dX[500] = { 0.0 };// Data Set 8
iN = 500;
for( i = 0; i < iN; i++ )
dX = 2.0;

RTint iB = 0; // 0 - UnBiased\1 - Biased
...
// Sub-Test 5 - One-Pass Variance & StdDev
{
/*
Def One_Pass_Variance( Data ):
N = 0
Mean = 0
M2 = 0
Delta = 0
for x in data:
N = N + 1
Delta = x - Mean
Mean = Mean + ( Delta / N )
M2 = M2 + Delta * ( x - Mean )# This expression uses the new value of Mean
Variance = M2 / ( N - B )
return Variance
*/
///*
_RTTYPE dVariance = 0.0;
_RTTYPE dStdDev = 0.0;
_RTTYPE dMean = 0.0;
_RTTYPE dM2 = 0.0;
_RTTYPE dDelta = 0.0;

for( i = 0; i < iN; i++ )
{
dDelta = dX - dMean;
dMean = dMean + ( dDelta / ( i+1 ) );
dM2 = dM2 + dDelta * ( dX - dMean );
}
dVariance = dM2 / ( iN - iB );
dStdDev = CrtSqrt( dVariance );

CrtPrintf( RTU("One-Pass StdDev: %.24f\n"), ( _RTTYPE )dStdDev );
CrtPrintf( RTU(" Mean: %.24f\n"), ( _RTTYPE )dMean );
CrtPrintf( RTU("\n") );
//*/
}
...

Results:

Naive StdDev: 0.000000000000000000000000
Mean: 2.000000000000000000000000

Two-Pass StdDev: 0.000000000000000000000000
Mean: 2.000000000000000000000000

Compensated StdDev: 0.000000000000000000000000
Mean: 2.000000000000000000000000

One-Pass StdDev: 0.000000000000000000000000
Mean: 2.000000000000000000000000

Fabio_G_ · ‎03-23-2014

@Sergey

Sorry but what would you expect from your code? You are summing TWO! Try with normally distributed random number and as Andrey said your error in computing mean and variance will scale as sqrt(nsamples).

I'm running Montecarlo simulations and the issue about normal RNG (mean and variance accuracy) can be destructive on fine structure of the physics. The solution of increasing sample space can be viable but sometime (depending on the physics of the system) can be even worst. A better choice would be to reject a random realization based on a threshold on its mean and/or variance, although VERY computationally expensive.