I'm working with linear regression based on the QR decomposition algorithm, and for some datasets the Mean Squared Error (MSE) obtained is very high. I realized that, in these situations, the matrix does not have full rank because some rows/columns are linearly dependent.
When I run the same datasets through the algorithm based on the normal equations regularized by the ridge method, the resulting MSE is as expected.
How should I proceed in these situations? Is it possible to find a linear dependency relation among attributes using DAAL?
You can analyze the dependence between attributes of the dataset using the Intel DAAL correlation algorithm. A value close to +1 or -1 in position (i, j) of the correlation matrix indicates a linear dependence between attributes i and j. Let me know if you need any help using the algorithm.
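To illustrate the idea, here is a minimal sketch that uses NumPy rather than the DAAL API (NumPy's `corrcoef` stands in for the DAAL correlation algorithm). The helper name `correlated_pairs`, the `threshold` value, and the synthetic data are assumptions made for the example only. It flags attribute pairs whose correlation is close to +/-1 and also computes the rank of the centered data matrix, since a pairwise check will not necessarily reveal a dependence that involves three or more attributes.

```python
import numpy as np

def correlated_pairs(X, threshold=0.999):
    """Return (i, j, r) for attribute pairs with |correlation| >= threshold.

    X holds observations in rows and attributes in columns.
    The threshold is an illustrative choice, not a recommended value.
    """
    corr = np.corrcoef(X, rowvar=False)  # attributes are columns
    n = corr.shape[0]
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i, j]) >= threshold:
                pairs.append((i, j, float(corr[i, j])))
    return pairs

# Synthetic data (assumption): attribute 2 is a linear function of attribute 0.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, -2.0 * a + 3.0])

pairs = correlated_pairs(X)

# Rank of the centered matrix; a value below the number of attributes
# signals a linear dependence even when no single pair is flagged.
rank = np.linalg.matrix_rank(X - X.mean(axis=0))

print(pairs)
print(rank, "of", X.shape[1])
```

Note that a constant attribute has zero variance, so its correlations come out as NaN and it will not be flagged by the threshold test; such columns need a separate check.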
To better understand your usage scenario for linear regression, could you please provide some additional details:
- what is the typical size of the input dataset used to train the linear regression model?
- are you interested in the sparse or dense version of linear regression (with or without regularization)?
- do you train the model for one or several dependent variables/responses?
- do you use publicly available datasets for testing the linear regression? If so, can you share the links with us?