Intel® oneAPI Data Analytics Library
Learn from community members on how to build compute-intensive applications that run efficiently on Intel® architecture.

Linear regression module, issues

Reinhard_M_
Beginner
1,390 Views

 I have examined the linear regression module of DAAL 2018 (Windows version) and found two issues.

1) In the "single_beta" submodule, the RMS error and the variance are inconsistent, i.e. var(j) != rms(j)^2*n/(m-p-1).

IMO, in linear_regression_single_beta_dense_default_batch_impl.i, line #316 should be

 pRms = daal::internal::Math<algorithmFPType,cpu>::sSqrt(div1*pRms);

instead of

 pRms = div1*daal::internal::Math<algorithmFPType,cpu>::sSqrt(pRms);
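A minimal C++ sketch of the two orderings. It assumes, as the proposed fix implies, that div1 holds 1/n, and that the variance is the usual unbiased estimate RSS/(n-p-1) with n observations and p features. The helper names are hypothetical and are not part of the DAAL API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Residual sum of squares for one response column (hypothetical helper).
double residualSumOfSquares(const std::vector<double>& y,
                            const std::vector<double>& yHat) {
    assert(y.size() == yHat.size());
    double rss = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i) {
        const double r = y[i] - yHat[i];
        rss += r * r;
    }
    return rss;
}

// Proposed fix: scale by div1 = 1/n first, then take the square root.
double rmsFixed(double rss, std::size_t n) {
    const double div1 = 1.0 / static_cast<double>(n);
    return std::sqrt(div1 * rss);  // sqrt(RSS / n)
}

// Ordering reported in the thread: square root first, then scale by div1.
double rmsReported(double rss, std::size_t n) {
    const double div1 = 1.0 / static_cast<double>(n);
    return div1 * std::sqrt(rss);  // sqrt(RSS) / n
}

// Unbiased residual variance, assuming p features plus an intercept.
double residualVariance(double rss, std::size_t n, std::size_t p) {
    assert(n > p + 1);
    return rss / static_cast<double>(n - p - 1);
}
```

For residuals of ±0.5 on n = 4 observations (RSS = 1) and p = 1, rmsFixed gives 0.5, so rms^2 * n/(n-p-1) = 0.5 matches the variance; rmsReported gives 0.25, which yields 0.125 and breaks the identity.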

2) The "group_of_betas" submodule uses an unusual definition of the goodness-of-fit parameter R2.

To the best of my knowledge, R2 runs from 0 ("no fit") to 1 ("perfect fit"). In the DAAL implementation (and documentation), R2 runs from 0 ("no fit") to 1/n ("perfect fit"). I don't know whether this is a bug or a feature.
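To illustrate the 1/n ceiling, here is a small C++ sketch comparing the textbook R2 = 1 - ResSS/TSS with a hypothetical variant that divides RegSS by n before forming RegSS/TSS. The second function is only a guess at how the reported behaviour could arise; it is not taken from the DAAL sources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Arithmetic mean of a non-empty vector.
double mean(const std::vector<double>& v) {
    assert(!v.empty());
    double s = 0.0;
    for (double x : v) s += x;
    return s / static_cast<double>(v.size());
}

// Textbook coefficient of determination: 1 - ResSS / TSS.
double rSquaredStandard(const std::vector<double>& y,
                        const std::vector<double>& yHat) {
    assert(y.size() == yHat.size());
    const double yBar = mean(y);
    double tss = 0.0, resSS = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i) {
        tss += (y[i] - yBar) * (y[i] - yBar);
        resSS += (y[i] - yHat[i]) * (y[i] - yHat[i]);
    }
    assert(tss > 0.0);
    return 1.0 - resSS / tss;
}

// Hypothetical variant: RegSS taken as a mean square (divided by n), then RegSS / TSS.
double rSquaredFromMeanSquare(const std::vector<double>& y,
                              const std::vector<double>& yHat) {
    assert(y.size() == yHat.size());
    const double n = static_cast<double>(y.size());
    const double yBar = mean(y);
    double tss = 0.0, regSS = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i) {
        tss += (y[i] - yBar) * (y[i] - yBar);
        regSS += (yHat[i] - yBar) * (yHat[i] - yBar);
    }
    assert(tss > 0.0);
    return (regSS / n) / tss;
}
```

With a perfect fit (yHat equal to y), RegSS equals TSS, so the first function returns 1 while the second returns exactly 1/n.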

5 Replies
Shaojuan_Z_Intel
Employee

Hi Reinhard, which file are you referring to? thanks

Reinhard_M_
Beginner

Topic 1: The file is "linear_regression_single_beta_dense_default_batch_impl.i", line #316.

Topic 2: IMO, the regression sum of squares RegSS should be just the sum of squares, not the mean square. As implemented, therefore,

    RegSS != TSS - ResSS;

In your example, TSS - ResSS ≈ n*RegSS. I think this is the root cause of the R^2 issue.
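The decomposition behind this argument can be checked with a short C++ sketch. For an ordinary least-squares fit with an intercept (here a hypothetical one-feature helper, not DAAL code), TSS = RegSS + ResSS holds when RegSS is the plain sum of squares, so TSS - ResSS equals n times the mean-square version.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Total, regression and residual sums of squares of a fitted model.
struct SumsOfSquares {
    double total;
    double regression;
    double residual;
};

// Fit y = b0 + b1 * x by ordinary least squares and return the three sums of squares.
SumsOfSquares simpleOlsSumsOfSquares(const std::vector<double>& x,
                                     const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double xBar = 0.0, yBar = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        xBar += x[i];
        yBar += y[i];
    }
    xBar /= n;
    yBar /= n;

    // Closed-form slope and intercept from centred cross-products.
    double sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - xBar;
        sxx += dx * dx;
        sxy += dx * (y[i] - yBar);
    }
    assert(sxx > 0.0);
    const double b1 = sxy / sxx;
    const double b0 = yBar - b1 * xBar;

    // Plain (not mean) sums of squares around the sample mean and the fit.
    SumsOfSquares s{0.0, 0.0, 0.0};
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double yHat = b0 + b1 * x[i];
        s.total += (y[i] - yBar) * (y[i] - yBar);
        s.regression += (yHat - yBar) * (yHat - yBar);
        s.residual += (y[i] - yHat) * (y[i] - yHat);
    }
    return s;
}
```

For x = {1, 2, 3, 4} and y = {1, 3, 2, 5} the fit is yHat = 1.1 x, giving TSS = 8.75, RegSS = 6.05 and ResSS = 2.7. Dividing RegSS by n = 4 would give 1.5125, a quarter of TSS - ResSS.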

Shaojuan_Z_Intel
Employee

Thanks for pointing this out. The files are from our GitHub. We will investigate this with our engineering team. Thank you!

Shaojuan_Z_Intel
Employee

Hi Reinhard, we are analyzing the question and will get back to you with details. Thanks!

VictoriyaS_F_Intel

Hello Reinhard,

Both of your observations are correct: there are bugs in the quality metrics for linear regression in Intel DAAL 2018 Gold. We will fix them in a future release of the library.
