Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.
Announcements
This community is designed for sharing of public information. Please do not share Intel or third-party confidential information here.
6561 Discussions

## VSL Summary Statistics skewness and kurtosis are incorrect Beginner
232 Views
I computed mean, variance, skewness and kurtosis for a test vector.
The mean was correct and the variance was just a little off.
But, the MKL estimates for skewness and kurtosis totally disagree with the R moments package.

R says:
> skewness(X)
 2.400606
> kurtosis(X)
 14.15447

MKL says:
skewness_est = -3.9303471475184684
kurtosis_est = 5.7271436907737581

Can anyone see what I did wrong in the code below:

int status;
MKL_INT xstorage = VSL_SS_MATRIX_STORAGE_ROWS;
MKL_INT numCols = 1;
MKL_INT numRows = 1419;
int indices = {1};
double mean_est = X;
double raw2mom = X;
double raw3mom = X;
double raw4mom = X;
double variance_est = X;
double cen3mom = X;
double cen4mom = X;
double skewness_est = X;
double kurtosis_est = X;
status = vsldSSEditMoments( task, &mean_est, &raw2mom, &raw3mom, &raw4mom,
&variance_est, &cen3mom, &cen4mom );
VSL_SS_METHOD_FAST);

X is composed of the attached data

1 Solution Black Belt
232 Views
You mixed up the sample variance and the second order central moment. The two, while being closely related, are not identical

I believe that there is a typographical error in the value reported to be given by R for the kurtosis.

Here is a complete program to read the data file and print the statictics (suggestion: put the data into a file and attach the file to your post rather than putting the voluminous data inline).

```[cpp]#include
#include

int main(int argc,char *argv[]){
int status; double *X; int i;
MKL_INT xstorage = VSL_SS_MATRIX_STORAGE_ROWS;
MKL_INT numCols = 1, numRows = 1419;
int indices = {1};

X=(double *)malloc(numRows*sizeof(double));
for(i=0; i With this program, I get[bash]s:lang>icl /Qmkl /Ot stats.cpp

s:lang>stats < dat
mean =  1.0505e+004,  var =  1.0115e+000
skew =  2.3981e+000, kurt =  1.1135e+001
[/bash]```
10 Replies Black Belt
233 Views
You mixed up the sample variance and the second order central moment. The two, while being closely related, are not identical

I believe that there is a typographical error in the value reported to be given by R for the kurtosis.

Here is a complete program to read the data file and print the statictics (suggestion: put the data into a file and attach the file to your post rather than putting the voluminous data inline).

```[cpp]#include
#include

int main(int argc,char *argv[]){
int status; double *X; int i;
MKL_INT xstorage = VSL_SS_MATRIX_STORAGE_ROWS;
MKL_INT numCols = 1, numRows = 1419;
int indices = {1};

X=(double *)malloc(numRows*sizeof(double));
for(i=0; i With this program, I get[bash]s:lang>icl /Qmkl /Ot stats.cpp

s:lang>stats < dat
mean =  1.0505e+004,  var =  1.0115e+000
skew =  2.3981e+000, kurt =  1.1135e+001
[/bash]``` Employee
232 Views
Hello,
Thanks for the communication.
Please, give us some time to reproduce your results;we will then return back to you.
Best,
Andrey Employee
232 Views
Hello,

We reproduced your results and compared MKLversion of simple statistics against R package "Moments". Below is the summary of the observations and comments:

1.In computation of variance R package uses normalization n, where n is size of dataset x = (x(i)), i=1,..,n. MKL uses normalization n-1 which provides unbiasedness of the estimate. To be more more precise, MKL uses B=W-sum (w(i)^2) / W, W= sum (w(i)) where w(i) is weight assigned to observation x(i). With all weights w(i) set to 1 B is equal to n-1.

2.MKLversion of kurtotis is based on the algorithm K(x) = C4(x) / V(x) ^ 2 - 3where C4 is the central moment of 4th order, V(x) - variance. R package basically uses the same formula but without term "-3". Term "-3" is set to have thisestimate equal to zero for Gaussian distribution.
MKL versions ofcentral moment of 4th order and variance use normalizations n and n-1, correspondingly, while R uses normalization n in both cases.This can contribute insome numerical differences between R and MKLestimates of kurtosis.

Please, see Section "Mathematical Notation and Definitions" of Chapter Statistical Functions/ Summary Statistics in Intel MKL Manual for the additional detailsbehind the algorithms of Summary Stats in MKL.

3.Theimplementation ofkurtosis in MKL contains the bug which appears in the settings of the test case in the first post. Welocalized and fixed it. The fix will be available inone of future releases of the library. Currently, the work-around for thebug is to request "simultaneous" computation of kurtosis and central moments of 2nd and 4th order.

Please, feel free to submit morequestions/feedbacks on Statistical Component of MKL,and we will help.

Best,
Andrey Beginner
232 Views
Thanks very much. You solved my problem.

I think the main problem was that I hadn't requested the computation of all the precursors to skewness and kurtosis which you do correctly in your program and Audrey mentions is necessary below.
And R reports kurtosis rather than excess kurtosis as Audrey says.

Also, when I export the correct data to validate in R, sd(sales) in R equals sqrt(cen2mom) from MKL.

The text below is crossed out to avoid confusion:
I'm not so sure about the variance being different from the 2nd central moment.
I think you thought I wanted to compute the coefficient of variation which is quite different from variance.

There are small differences in the estimates of R var and MLK cen2mom.
However, R does report the unbiased not the biased estimate of variance.
So, the difference is not due to that.
I computed the biased and unbiased variance by hand below and the unbiased one is exactly var(X).
Also, I computed the variance from the coefficient of variation given by MKL and got a number
quite close to the cen2mom computed by MKL as you would expect.

Anyway, the estimates are close enough for my current purpose. So, I'm happy.

> var(X)
 112904318
> N = length(X)
> myvar = sum((X-mean(X))^2)/N
> myvar
 112824752
> myvar*N/(N-1)
 112904318
>
> cen2mom = 112904462.89915065
> variation_est = 1.0114540252081452
> (variation_est * abs(mean(X)))^2
 112904489 Employee
232 Views
Hi Scott,

We ranthe test case with your data using double version of Summary Statistics functions on Nehalem CPU, Linux, in both IA-32 and Intel 64modesand obtained the followingoutput for mean and variance with MKL 10.3 (I do not provide the numbers forother estimates):

Mean = 10505.324172 Variance = 112904318.274247

That is, the value of variance is aligned with R output.
On the other hand, you communicate a different value: cen2mom = 112904462.89915065.
It might make sense to sync-up the results obtained on your and our sides.

1. MKL version
2. OS
3. CPU
4. Mode, IA-32 or Intel 64
6. Precision, double of float

Thanks,
Andrey Beginner
232 Views
Hi Andrey,

I'm currently building within a larger project.
I did briefly try to build this example standalone but didn't want to waste time figuring out how to get around the fact that I don't want to install libiomp5md.dll on my computer.

### Environment
Using MKL 10.3 - ia32 and double precision 'd' functions
In C++ in MSVS 2010
Windows 7 Professional SP1 - 64-bit
Intel Xeon CPU dual-proc E5520 @ 2.27GHz

### To build
### I copied libiomp5md.dll and libiomp5md.lib into the mkl/lib/ia32 directory and used:

mkl_rt.lib
\$(mkl-root)\include
\$(mkl-root)\lib\ia32 Beginner
232 Views
Hi Andrey,

The discrepancy is due to an data export error.
I just realized that the data I was exporting to validate in R was rounded to dollars.
Now that I have fixed that, I get exactly the same variance in R as with MKL.

sorry for the confusion, Scott Employee
232 Views
Hi Scott,
It's good to know that we resolved the issue.
Best,
Andrey Black Belt
232 Views
To avoid confusing readers of this thread, and if you do not mind doing so, please edit your earlier post and make necessary corrections. The overstrike button may be used to cross-out incorrect parts.

Thank you. Moderator
232 Views
quote "3.Theimplementation ofkurtosis in MKL contains the bug which appears in the settings of the test case in the first post. Welocalized and fixed it. The fix will be available inone of future releases of the library. ....." 