Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

## Scaling with least squares

Beginner

Hello,

I am trying to fit some data using higher-order polynomials. The data has 15,000 points with the following ranges:

X (independent): Min Value = 100,000, Max Value = 6,000,000

Y (dependent): Min Value = 150,000, Max Value = 560,000

I am using the GELS least squares driver (SVD method). For the coefficient matrix, I scale each value by its column average. That is, I still calculate x^20 for all x observations, then calculate the column average, and then divide the column values by it.

For a polynomial of order 20, the values I get from my code differ, starting at the 2nd or 3rd decimal place, from those obtained with a commercially available statistical analysis package, which gives more accurate predictions.

How can I improve the accuracy of the least squares fit? I see the following issues, but haven't found a solution yet:

1. When I calculate the powers (x^16, x^17, etc.) for the coefficient matrix, there may be some precision issues.

2. Is my scaling correct? Or should I use something like (x - mean_x) / stddev_x? [I just found this via Google.] In that case, how do I get the correct coefficients back?

-V

3 Replies
Black Belt

The problem is inherently ill-conditioned. Unless you have a good justification, you should not even attempt to fit high-order polynomials (such as 20th degree) to data. The Census example that comes with MATLAB illustrates these problems quite well. With degree 3 or 4, the US census data, projected a couple of years beyond the end of the data, produce reasonable extrapolations. Then one gets greedy and raises the degree, hoping to "improve" the "prediction". Degree 6 may give a prediction that is double the current population figure, whereas degree 8 gives a negative value. The example MATLAB code lets you display confidence bands on the prediction, so that you can appreciate why raising the degree can make the results worse.

In fact, you should look for a model expression that is better suited to describing the data than a high-order polynomial. MKL provides effective routines (?TRNLSP) for fitting model expressions that are nonlinear in the regression parameters.
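To make the ill-conditioning remark and the question about (x - mean_x) / stddev_x a little more concrete, here is a minimal Python sketch (standard library only, not MKL code). It assumes evenly spaced x values over the range quoted in the question, which is a made-up stand-in for the real data. The helper names `standardize`, `coeffs_in_x` and `horner` are hypothetical. The sketch compares the size of the entries in the degree-20 column before and after standardizing, and shows one way to map coefficients fitted in t = (x - mean) / std back to powers of x by binomial expansion.

```python
from math import comb

def standardize(xs):
    """Return (ts, mean, std) with t = (x - mean) / std (population std)."""
    n = len(xs)
    mean = sum(xs) / n
    std = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return [(x - mean) / std for x in xs], mean, std

def coeffs_in_x(c, mean, std):
    """Expand sum_k c[k] * ((x - mean) / std)**k into sum_j a[j] * x**j."""
    a = [0.0] * len(c)
    for k, ck in enumerate(c):
        for j in range(k + 1):
            # Binomial theorem: (x - mean)**k = sum_j C(k, j) * x**j * (-mean)**(k - j)
            a[j] += ck * comb(k, j) * (-mean) ** (k - j) / std ** k
    return a

def horner(coeffs, x):
    """Evaluate sum_j coeffs[j] * x**j with Horner's rule."""
    result = 0.0
    for cj in reversed(coeffs):
        result = result * x + cj
    return result

# Assumption: 15,000 evenly spaced points over the range given in the question.
xs = [100000.0 + i * (6000000.0 - 100000.0) / 14999 for i in range(15000)]
ts, mean, std = standardize(xs)

# Ratio of the largest to the smallest entry in the x**20 column.
# Dividing the column by its average does not change this ratio.
raw_spread = (max(xs) / min(xs)) ** 20

# Largest entry in the t**20 column after standardizing.
scaled_max = max(abs(t) for t in ts) ** 20

print(f"x**20 column spread: {raw_spread:.3e}")
print(f"largest t**20 entry: {scaled_max:.3e}")
```

Standardizing only tames the magnitudes; the columns of a degree-20 design matrix can still be nearly collinear, so the advice above about lowering the degree or choosing a better model still applies. Also note that expanding back to powers of x brings back very large and very small coefficients, so at high degree it is probably safer to keep the fitted coefficients in t and evaluate predictions as `horner(c, (x - mean) / std)` using the stored mean and standard deviation.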

Beginner

Thank you for the response.