I am trying to fit some data using higher order polynomials. The data has 15000 points with ranges as below:
X (independent): Min Value = 100,000, Max Value = 6,000,000
Y (dependent): Min Value = 150,000, Max Value = 560,000
I am using the GELS least squares driver (SVD method). For the coefficient matrix, I scale each value by its column average. That is, I still compute x^20 for every observation, then compute the average of that column, and then divide the column values by it.
For a polynomial of order 20, the values I get from my code differ, starting at the 2nd or 3rd decimal place, from the values obtained with a commercially available statistical analysis package, which gives more accurate predictions.
How can I improve the accuracy of the least squares fit? I see the following issues, but haven't found a solution yet:
1. When I calculate the powers (x^16, x^17, etc.) for the coefficient matrix, there may be some precision issues.
2. Is my scaling correct? Or should I use something like (x - mean_x)/(stddev_x)? [I just found this via Google.] In that case, how do I get the correct coefficients back?
Thank you for your advice.
The problem is inherently ill-conditioned. Unless you have a good justification, you should not even attempt to fit high-order polynomials (such as 20th degree) to data. The Census example that comes with Matlab illustrates these problems quite well. With degree 3 or 4, the US census data, projected to a couple of years beyond the end of the data, produce reasonable extrapolations. Then one gets greedy and raises the degree, hoping to "improve" the "prediction". Degree 6 may give a prediction that is double the current population figure, whereas degree 8 gives a negative value. The example Matlab code allows you to display confidence bands on the prediction so that you can appreciate why raising the degree can make the results worse.
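One way to see the ill-conditioning with the X range quoted in the question is to measure how close to parallel two columns of the coefficient matrix are. The sketch below is only an illustration on an evenly spaced grid (an assumption, not the poster's data), using plain Python. It compares the x^18 and x^20 columns with the corresponding columns of a Chebyshev basis in z, where z maps X onto [-1, 1]. The Chebyshev basis is a common remedy that is not proposed anywhere in this thread; the names grid, cosine and chebyshev_t are made up for this example.

```python
import math

X_MIN, X_MAX = 1.0e5, 6.0e6   # range of X quoted in the question
N_POINTS = 2001               # illustrative grid size (assumption, not the real data)

def grid(lo, hi, n):
    """Return n evenly spaced points from lo to hi inclusive."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def cosine(u, v):
    """Cosine of the angle between two columns; near +/-1 means nearly collinear."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

def chebyshev_t(k, z):
    """Chebyshev polynomial T_k(z) via the three-term recurrence."""
    t_prev, t_curr = 1.0, z
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_prev, t_curr = t_curr, 2.0 * z * t_curr - t_prev
    return t_curr

xs = grid(X_MIN, X_MAX, N_POINTS)

# Raw power columns. Dividing by X_MAX only avoids huge numbers: the cosine does not
# change when a column is multiplied by a constant, so dividing each column by its
# average (as described in the question) gives exactly the same angle.
u = [x / X_MAX for x in xs]
cos_monomial = cosine([t ** 18 for t in u], [t ** 20 for t in u])

# Same points mapped to z in [-1, 1], with Chebyshev columns of the same degrees.
zs = [(2.0 * x - (X_MAX + X_MIN)) / (X_MAX - X_MIN) for x in xs]
cos_chebyshev = cosine([chebyshev_t(18, z) for z in zs],
                       [chebyshev_t(20, z) for z in zs])

print("cosine between x^18 and x^20 columns:", cos_monomial)
print("cosine between T18(z) and T20(z) columns:", cos_chebyshev)
```

On this grid the two power columns come out almost exactly parallel, which is why column-wise scaling alone cannot rescue the fit, while the two Chebyshev columns are far from parallel. This does not make a degree-20 fit a good idea; it only shows where the loss of digits comes from.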
In fact, you should look for a model expression that is better suited to describing the data than a high order polynomial. MKL provides effective routines (?TRNLSP) for fitting model expressions that are nonlinear in the regression parameters.
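On the scaling question in the original post: if the fit is done in z = (x - mean_x)/stddev_x, the simplest way to use the result is to keep the coefficients in z and apply the same transformation to every new x before evaluating. Converting back to coefficients of raw powers of x is possible by expanding the substitution, as the sketch below does for a small case, but at degree 20 with a standard deviation of order 10^6 the raw coefficients would span well over 100 orders of magnitude, so the conversion tends to reintroduce the precision problem. The values of mu, sigma and c are hypothetical, chosen only so that the example runs; horner and to_raw_coefficients are names made up here, not MKL routines.

```python
import math

def horner(coeffs, t):
    """Evaluate sum(coeffs[k] * t**k) by Horner's rule; coeffs[0] is the constant term."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result

def to_raw_coefficients(c, mu, sigma):
    """Expand p(z) = sum(c[k] * z**k), with z = (x - mu) / sigma, into powers of x."""
    a = []  # coefficients in x of the partial result, lowest power first
    for ck in reversed(c):
        # Polynomial form of Horner's step: a <- a * z(x) + ck,
        # where z(x) = (-mu / sigma) + (1 / sigma) * x.
        times_x = [0.0] + [coef / sigma for coef in a]
        times_const = [coef * (-mu / sigma) for coef in a] + [0.0]
        a = [p + q for p, q in zip(times_x, times_const)]
        a[0] += ck
    return a

# Hypothetical numbers for illustration only (not taken from the poster's data).
mu, sigma = 3.0e6, 1.7e6
c = [350000.0, 120000.0, -15000.0]   # coefficients of a degree-2 fit in z

x_new = 2.5e6
y_scaled = horner(c, (x_new - mu) / sigma)   # recommended: evaluate in z
a = to_raw_coefficients(c, mu, sigma)
y_raw = horner(a, x_new)                     # same polynomial, raw powers of x

print("evaluated in z:", y_scaled)
print("evaluated in x:", y_raw)
```

For this low degree the two evaluations agree to rounding error. When comparing against another package, it is usually safer to compare predicted Y values than coefficients, because coefficients obtained under different scalings are not expected to match.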
Thank you for the response.
Please allow me to add some more information.
I am familiar with the behavior of the underlying function and am using the polynomial fit strictly for interpolation. The Y values are essentially the solution of a set of differential equations involving X. I am trying to speed up my calculations by using these polynomial fits so that I don't have to solve the equations each time.
So far, I was using this commercial tool, and it works very well. Now I would like to automate the process using MKL so that I can generate these fits on the fly and use them. My challenge is that I cannot get the coefficients to match.
I am not acquainted with "this commercial tool".
Please post details of what you did to obtain the fit, using a small example; showing code, expected results and actual results will help one to find the source of the problem.