Intel® oneAPI Math Kernel Library

BACON outlier detection

Yaniv_H_1
Beginner

I'm trying to run the following code:

#include <iostream>

#include "mkl.h"

int main () {

    /* Define the observations: 5 two-dimensional points stored by rows
       (row 1 = first coordinate of each point, row 2 = second coordinate) */

    float pObservations [] = {1., 2., 3., 4.2, 5., 9., 10., 8., 7., 6.};

    /* Creates and initializes a new summary statistics task descriptor */

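    VSLSSTaskPtr task;
    const MKL_INT p = 2;                                  /* dimension of each observation */
    const MKL_INT n = 5;                                  /* number of observations */
    const MKL_INT xstorage = VSL_SS_MATRIX_STORAGE_ROWS;  /* each row holds one dimension */
    int status = 0;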
    status = vslsSSNewTask (&task, &p, &n, &xstorage, pObservations, NULL, NULL);
    if (status != VSL_STATUS_OK) {
        std::cout << "Failed to create a new summary statistics task descriptor" << std::endl;
        throw false;
    }

    /* Modifies array pointers related to multivariate mean calculation */

    float* pMean = new float[p];   /* one mean per dimension */
    status = vslsSSEditTask (task, VSL_SS_ED_MEAN, pMean);
    if (status != VSL_STATUS_OK) {
        std::cout << "Failed to modify array pointers related to multivariate mean calculation" << std::endl;
        throw false;
    }

    /* Computes Summary Statistics estimates - mean calculation */

    status = vslsSSCompute(task, VSL_SS_MEAN, VSL_SS_METHOD_FAST);
    if (status != VSL_STATUS_OK) {
        std::cout << "Failed to compute summary statistics estimates with error code " << status << std::endl;
        throw false;
    }

    // Print mean values
    for (int ip = 0; ip < p; ip++)
        std::cout << pMean [ip] << std::endl;

    /* Modifies array pointers related to multivariate outlier detection */

    const MKL_INT nParams = 0;
    float* pWeights = new float[n];   /* one weight per observation */
    status = vslsSSEditOutliersDetection (task, &nParams, NULL, pWeights);
    if (status != VSL_STATUS_OK) {
        std::cout << "Failed to modify array pointers related to multivariate outlier detection" << std::endl;
        throw false;
    }

    /* Computes Summary Statistics estimates - outlier detection */

    status = vslsSSCompute(task, VSL_SS_OUTLIERS, VSL_SS_METHOD_BACON);
    if (status != VSL_STATUS_OK) {
        std::cout << "Failed to compute summary statistics estimates with error code " << status << std::endl;
        throw false;
    }

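    /* Release the task descriptor and the result buffers */
    vslSSDeleteTask(&task);
    delete[] pMean;
    delete[] pWeights;

    return 0;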
}

for a 2D data set consisting of 5 observations (pairs). The output of the program is:

3.04
8
Failed to compute summary statistics estimates with error code -4002
terminate called after throwing an instance of 'bool'
Abort (core dumped)

The first two numbers are the means of the observations in each of the two dimensions, and they are correct.

I'm using the same dataset for the BACON outlier detection algorithm, but I get error -4002, which means that the number of input observations (5 in my case) is either zero or negative.

Is this a bug in MKL, or is something wrong on my side?

Thanks,

Yaniv
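The two means in the output can be checked by hand. With `VSL_SS_MATRIX_STORAGE_ROWS`, each of the p rows of the array holds one dimension and each of the n columns holds one observation, so element `x[i * n + j]` is coordinate i of observation j. The sketch below is plain C++ with no MKL calls; `rowMeans` is a hypothetical helper name, not part of MKL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: per-dimension means of data laid out as p rows x n columns
// (the VSL_SS_MATRIX_STORAGE_ROWS layout), where x[i * n + j] is coordinate i
// of observation j.
std::vector<double> rowMeans(const float* x, std::size_t p, std::size_t n) {
    assert(n > 0);  // the mean of zero observations is undefined
    std::vector<double> means(p, 0.0);
    for (std::size_t i = 0; i < p; ++i) {
        for (std::size_t j = 0; j < n; ++j)
            means[i] += x[i * n + j];          // accumulate row i
        means[i] /= static_cast<double>(n);    // divide by the number of observations
    }
    return means;
}
```

Called on the 10 values the program actually reads, `{1, 2, 3, 4.2, 5, 9, 10, 8, 7, 6}` with p = 2 and n = 5, it returns approximately 3.04 and exactly 8, matching the first two printed lines.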

Andrey_N_Intel
Employee

Hi Yaniv,

The original paper suggests a size of m = cp for the basic subset used in the BACON flow, where c is 4 or 5. The Intel MKL version of BACON uses c = 5. So, if the number of observations (feature vectors) passed into the library is smaller than 5p, the library returns the respective error indicating a bad number of observations. The library reports that all observations are outliers if the size of the basic subset becomes smaller than 5p during the computation. We should extend the documentation of the algorithm to describe those cases. Please let me know if this answers your question.

Thanks,

Andrey
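Worked through for the first dataset: p = 2 gives 5p = 10, but only n = 5 observations were supplied, so the call is rejected with -4002. The sketch below expresses that precondition with the factor c = 5 stated in the reply above. `baconHasEnoughObservations` and `kBaconSubsetFactor` are hypothetical names, and the function mirrors the rule as described in the reply, not MKL's actual source.

```cpp
#include <cassert>

// c in m = c * p, as stated in the reply (assumed to be fixed at 5 in MKL's BACON).
constexpr long kBaconSubsetFactor = 5;

// Hypothetical helper: true if n observations of dimension p satisfy the
// minimum basic-subset size described in the reply (n >= 5 * p).
bool baconHasEnoughObservations(long n, long p) {
    assert(p > 0);  // dimension must be positive
    return n >= kBaconSubsetFactor * p;
}
```

With p = 2, the original n = 5 fails this check while n = 11 (used in the next post) passes it, which is consistent with the second run finishing without the -4002 error.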

Yaniv_H_1
Beginner

Hi, I changed the code I originally posted:

...

    float pObservations [] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0,  7.0,  8.0, 9.0, 10.0, 11.0,
                              1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,  1.0};

    /* Creates and initializes a new summary statistics task descriptor */

    VSLSSTaskPtr task;
    const int p = 2;
    const int n = 11;

...

and it finished without errors. The algorithm, naturally, didn't find any outliers. However, when I changed the observation array to:

    float pObservations [] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0,  7.0,  8.0, 9.0, 10.0, 11.0,
                              1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, -10.0, 1.0,  1.0};

with observation 9 clearly an outlier, the algorithm hung in the BACON computation. I had to abort the program manually.
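In the row-storage layout, observation 9 (index 8, counting from zero) is column 8 of each row, that is, the point (9, -10). The sketch below extracts one observation from a 2 x n row-storage array and counts flagged outliers in a weights array such as `pWeights` after `vslsSSCompute` returns. Both helper names are hypothetical, and the convention that BACON writes 0 for an outlier and 1 otherwise is an assumption based on the MKL documentation; check it against your MKL version.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper: coordinates of observation j (0-based) in a 2 x n array
// stored by rows (VSL_SS_MATRIX_STORAGE_ROWS): row 0 first, then row 1.
std::pair<float, float> observation2D(const float* x, std::size_t n, std::size_t j) {
    assert(j < n);                // j must index an existing observation
    return {x[j], x[n + j]};      // same column in row 0 and row 1
}

// Hypothetical helper: number of observations whose weight is 0.
// Assumes the convention weight 0 = outlier, 1 = regular observation.
std::size_t countOutliers(const float* w, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t j = 0; j < n; ++j)
        if (w[j] == 0.0f)
            ++count;
    return count;
}
```

For the second array in the post, `observation2D(pObservations, 11, 8)` returns (9, -10).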

Andrey_N_Intel
Employee

Hi Yaniv, I can reproduce this behavior with the dataset above; we will investigate it. Andrey

Gennady_F_Intel
Moderator

Dear Yaniv,

The problem has been fixed in MKL 2017 Update 1 (released on November 1). Could you please check this update and let us know how it works on your side?

Regards, MKL team
