Community
cancel
Showing results for 
Search instead for 
Did you mean: 
tennican
Beginner
105 Views

VSL Summary Statistics skewness and kurtosis are incorrect

Jump to solution
I computed mean, variance, skewness and kurtosis for a test vector.
The mean was correct and the variance was just a little off.
But, the MKL estimates for skewness and kurtosis totally disagree with the R moments package.

R says:
> skewness(X)
[1] 2.400606
> kurtosis(X)
[1] 14.15447

MKL says:
skewness_est = -3.9303471475184684
kurtosis_est = 5.7271436907737581

Can anyone see what I did wrong in the code below:

int status;
VSLSSTaskPtr task;
MKL_INT xstorage = VSL_SS_MATRIX_STORAGE_ROWS;
MKL_INT numCols = 1;
MKL_INT numRows = 1419;
int indices[1] = {1};
status = vsldSSNewTask( &task, &numCols, &numRows, &xstorage, &(X[0]), 0, indices );
double mean_est = X[0];
double raw2mom = X[0];
double raw3mom = X[0];
double raw4mom = X[0];
double variance_est = X[0];
double cen3mom = X[0];
double cen4mom = X[0];
double skewness_est = X[0];
double kurtosis_est = X[0];
status = vsldSSEditTask(task, VSL_SS_ED_SKEWNESS, &skewness_est);
status = vsldSSEditTask(task, VSL_SS_ED_KURTOSIS, &kurtosis_est);
status = vsldSSEditMoments( task, &mean_est, &raw2mom, &raw3mom, &raw4mom,
&variance_est, &cen3mom, &cen4mom );
status = vsldSSCompute(task, VSL_SS_MEAN|VSL_SS_2C_MOM|VSL_SS_SKEWNESS|VSL_SS_KURTOSIS,
VSL_SS_METHOD_FAST);


X is composed of the attached data


0 Kudos
1 Solution
mecej4
Black Belt
105 Views
You mixed up the sample variance and the second order central moment. The two, while being closely related, are not identical

I believe that there is a typographical error in the value reported to be given by R for the kurtosis.

Here is a complete program to read the data file and print the statictics (suggestion: put the data into a file and attach the file to your post rather than putting the voluminous data inline).

[cpp]#include 
#include 

int main(int argc,char *argv[]){        
  int status; double *X; int i;
  VSLSSTaskPtr task;
  MKL_INT xstorage = VSL_SS_MATRIX_STORAGE_ROWS;
  MKL_INT numCols = 1, numRows = 1419;
  int indices[1] = {1};
  unsigned MKL_INT64 mask = 0;
  
  X=(double *)malloc(numRows*sizeof(double));
  for(i=0; i With this program, I get

[bash]s:lang>icl /Qmkl /Ot stats.cpp

s:lang>stats < dat
mean =  1.0505e+004,  var =  1.0115e+000
skew =  2.3981e+000, kurt =  1.1135e+001

[/bash]

View solution in original post

10 Replies
mecej4
Black Belt
106 Views
You mixed up the sample variance and the second order central moment. The two, while being closely related, are not identical

I believe that there is a typographical error in the value reported to be given by R for the kurtosis.

Here is a complete program to read the data file and print the statictics (suggestion: put the data into a file and attach the file to your post rather than putting the voluminous data inline).

[cpp]#include 
#include 

int main(int argc,char *argv[]){        
  int status; double *X; int i;
  VSLSSTaskPtr task;
  MKL_INT xstorage = VSL_SS_MATRIX_STORAGE_ROWS;
  MKL_INT numCols = 1, numRows = 1419;
  int indices[1] = {1};
  unsigned MKL_INT64 mask = 0;
  
  X=(double *)malloc(numRows*sizeof(double));
  for(i=0; i With this program, I get

[bash]s:lang>icl /Qmkl /Ot stats.cpp

s:lang>stats < dat
mean =  1.0505e+004,  var =  1.0115e+000
skew =  2.3981e+000, kurt =  1.1135e+001

[/bash]

View solution in original post

Andrey_N_Intel
Employee
105 Views
Hello,
Thanks for the communication.
Please, give us some time to reproduce your results;we will then return back to you.
Best,
Andrey
Andrey_N_Intel
Employee
105 Views
Hello,

We reproduced your results and compared MKLversion of simple statistics against R package "Moments". Below is the summary of the observations and comments:

1.In computation of variance R package uses normalization n, where n is size of dataset x = (x(i)), i=1,..,n. MKL uses normalization n-1 which provides unbiasedness of the estimate. To be more more precise, MKL uses B=W-sum (w(i)^2) / W, W= sum (w(i)) where w(i) is weight assigned to observation x(i). With all weights w(i) set to 1 B is equal to n-1.

2.MKLversion of kurtotis is based on the algorithm K(x) = C4(x) / V(x) ^ 2 - 3where C4 is the central moment of 4th order, V(x) - variance. R package basically uses the same formula but without term "-3". Term "-3" is set to have thisestimate equal to zero for Gaussian distribution.
MKL versions ofcentral moment of 4th order and variance use normalizations n and n-1, correspondingly, while R uses normalization n in both cases.This can contribute insome numerical differences between R and MKLestimates of kurtosis.

Please, see Section "Mathematical Notation and Definitions" of Chapter Statistical Functions/ Summary Statistics in Intel MKL Manual for the additional detailsbehind the algorithms of Summary Stats in MKL.

3.Theimplementation ofkurtosis in MKL contains the bug which appears in the settings of the test case in the first post. Welocalized and fixed it. The fix will be available inone of future releases of the library. Currently, the work-around for thebug is to request "simultaneous" computation of kurtosis and central moments of 2nd and 4th order.

Please, let me know if this answers yourquestions.
Please, feel free to submit morequestions/feedbacks on Statistical Component of MKL,and we will help.

Best,
Andrey


tennican
Beginner
105 Views
Thanks very much. You solved my problem.

I think the main problem was that I hadn't requested the computation of all the precursors to skewness and kurtosis which you do correctly in your program and Audrey mentions is necessary below.
And R reports kurtosis rather than excess kurtosis as Audrey says.

Also, when I export the correct data to validate in R, sd(sales) in R equals sqrt(cen2mom) from MKL.

The text below is crossed out to avoid confusion:
I'm not so sure about the variance being different from the 2nd central moment.
I think you thought I wanted to compute the coefficient of variation which is quite different from variance.

There are small differences in the estimates of R var and MLK cen2mom.
However, R does report the unbiased not the biased estimate of variance.
So, the difference is not due to that.
I computed the biased and unbiased variance by hand below and the unbiased one is exactly var(X).
Also, I computed the variance from the coefficient of variation given by MKL and got a number
quite close to the cen2mom computed by MKL as you would expect.

Anyway, the estimates are close enough for my current purpose. So, I'm happy.

> var(X)
[1] 112904318
> N = length(X)
> myvar = sum((X-mean(X))^2)/N
> myvar
[1] 112824752
> myvar*N/(N-1)
[1] 112904318
>
> cen2mom = 112904462.89915065
> variation_est = 1.0114540252081452
> (variation_est * abs(mean(X)))^2
[1] 112904489


Andrey_N_Intel
Employee
105 Views
Hi Scott,

We ranthe test case with your data using double version of Summary Statistics functions on Nehalem CPU, Linux, in both IA-32 and Intel 64modesand obtained the followingoutput for mean and variance with MKL 10.3 (I do not provide the numbers forother estimates):

Mean = 10505.324172 Variance = 112904318.274247

That is, the value of variance is aligned with R output.
On the other hand, you communicate a different value: cen2mom = 112904462.89915065.
It might make sense to sync-up the results obtained on your and our sides.

Can you please clarify your environment:
1. MKL version
2. OS
3. CPU
4. Mode, IA-32 or Intel 64
5. Build/link line
6. Precision, double of float

Thanks,
Andrey




tennican
Beginner
105 Views
Hi Andrey,

I'm currently building within a larger project.
I did briefly try to build this example standalone but didn't want to waste time figuring out how to get around the fact that I don't want to install libiomp5md.dll on my computer.

### Environment
Using MKL 10.3 - ia32 and double precision 'd' functions
In C++ in MSVS 2010
Windows 7 Professional SP1 - 64-bit
Intel Xeon CPU dual-proc E5520 @ 2.27GHz

### To build
### I copied libiomp5md.dll and libiomp5md.lib into the mkl/lib/ia32 directory and used:

mkl_rt.lib
$(mkl-root)\include
$(mkl-root)\lib\ia32




$(mkl-include)%(AdditionalIncludeDirectories)




$(IntelMklLib);%(AdditionalDependencies)


$(mkl-lib)%(AdditionalLibraryDirectories)




thanks for your help, Scott
tennican
Beginner
105 Views
Hi Andrey,

The discrepancy is due to an data export error.
I just realized that the data I was exporting to validate in R was rounded to dollars.
Now that I have fixed that, I get exactly the same variance in R as with MKL.

sorry for the confusion, Scott
Andrey_N_Intel
Employee
105 Views
Hi Scott,
It's good to know that we resolved the issue.
Please, feel free to askquestions on Statistical Component of Intel MKL, weare ready to help.
Best,
Andrey
mecej4
Black Belt
105 Views
To avoid confusing readers of this thread, and if you do not mind doing so, please edit your earlier post and make necessary corrections. The overstrike button may be used to cross-out incorrect parts.

Thank you.
Gennady_F_Intel
Moderator
105 Views
quote "3.Theimplementation ofkurtosis in MKL contains the bug which appears in the settings of the test case in the first post. Welocalized and fixed it. The fix will be available inone of future releases of the library. ....."
The problem has been already fixed in the latest version MKL 10.3.Update 6 available now for download.
The official anounce of this release you can find here.
--Gennady
Reply