Reinhard_M_

Beginner

09-23-2017
08:16 AM

Linear regression module, issues

I have examined the linear regression module of DAAL 2018 (Windows version).

1) In the "single_beta" submodule, the RMS error and the variance are inconsistent, i.e. var(j) != rms^{2}(j)*n/(m-p-1)

IMO, in linear_regression_single_beta_dense_default_batch_impl.i, line #316 should be

pRms

instead of

pRms

2) The "group_of_betas" submodule contains an unusual definition of the goodness parameter R^{2}

To the best of my knowledge, R^{2} runs from 0 ("no fit") to 1 ("perfect fit"). In the DAAL implementation (and documentation),

R^{2} runs from 0 ("no fit") to 1/n ("perfect fit"). I don't know whether this is a bug or a feature.
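
For reference, this is the usual textbook definition for a least-squares fit with an intercept. The notation is assumed here and is not taken from the DAAL documentation: y_i are the observed responses, \hat{y}_i the fitted values, \bar{y} the mean of the observations, and n the number of observations.

```latex
R^{2} = 1 - \frac{\mathrm{ResSS}}{\mathrm{TSS}}, \qquad
\mathrm{ResSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}, \qquad
\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^{2}
```

A perfect fit gives ResSS = 0 and therefore R^{2} = 1. A model that only predicts the mean gives ResSS = TSS and therefore R^{2} = 0.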

Shaojuan_Z_Intel

Employee

09-27-2017
03:02 PM

Hi Reinhard, which file are you referring to? thanks

Reinhard_M_

Beginner

09-28-2017
05:35 AM

Topic 1: The file is "linear_regression_single_beta_dense_default_batch_impl.i", line #316

Topic 2: IMO, the regression sum of squares RegSS is just the sum of squares and not the mean square. Therefore,

RegSS != TSS - ResSS;

In your example, TSS-ResSS ≈ n*RegSS. I think this is the root cause of the R^2 issue.
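
The relation described above can be checked numerically. The following is a minimal sketch in plain Python, using made-up data and hypothetical helper names (`fit_line`, `sums_of_squares`). It does not call or reproduce any DAAL code. It only illustrates that, for ordinary least squares with an intercept, TSS = RegSS + ResSS, and that dividing RegSS by n (a mean square) shrinks the ratio RegSS/TSS by the same factor n.

```python
# Sketch: sums of squares for a one-predictor least-squares fit.
# Data and function names are illustrative assumptions, not DAAL API.

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sums_of_squares(x, y):
    """Return (TSS, ResSS, RegSS) for the fitted line."""
    intercept, slope = fit_line(x, y)
    y_hat = [intercept + slope * xi for xi in x]
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)                  # total
    res_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # residual
    reg_ss = sum((fi - mean_y) ** 2 for fi in y_hat)           # regression
    return tss, res_ss, reg_ss

# Made-up sample data, close to a straight line.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
n = len(y)

tss, res_ss, reg_ss = sums_of_squares(x, y)

# R^2 from the plain sum of squares stays within [0, 1].
r2 = reg_ss / tss

# If a mean square (RegSS / n) is used in place of RegSS,
# the same ratio can never exceed 1/n.
r2_mean_square = (reg_ss / n) / tss

print("TSS - ResSS =", tss - res_ss)
print("RegSS       =", reg_ss)
print("R^2 (sum of squares) =", r2)
print("R^2 (mean square)    =", r2_mean_square, " upper bound 1/n =", 1.0 / n)
```

If a library reported the mean-square variant, a perfect fit would produce 1/n instead of 1, which would match the behavior described in this thread. Whether that is what DAAL 2018 actually computes internally is not shown here.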

Shaojuan_Z_Intel

Employee

09-28-2017
10:16 AM

Shaojuan_Z_Intel

Employee

09-29-2017
11:57 AM

Hi Reinhard, we are analyzing the question and will get back to you with details. Thanks!

VictoriyaS_F_Intel

Employee

10-02-2017
06:50 AM

Hello Reinhard,

Both of your observations are correct: there are bugs in the quality metrics for linear regression in Intel DAAL 2018 Gold. We will fix them in one of the future releases of the library.

