Intel® oneAPI Math Kernel Library

How to do estimation of nonlinear functions with the nonlinear least-squares solver dtrnlsp

jlperla
Beginner
Hi there,
Sorry if this is a basic question, but I was confused reading the documentation and mapping it to nonlinear curve fitting.

  • I have a model: y_i = f(z_i; beta) + noise, where beta is the vector of parameters I want to estimate. Say that beta \in R^n and z_i \in R^p.
  • I have data for y_i and z_i. Say there are m observations.
  • I want the beta that solves the least-squares problem.
Now, from what I can gather in the docs, my beta is the "x" in the docs.
So if I implement the RCI_Request == 1 case in the solve loop, I believe I just fill in fvec (of size m) with:
y_i - f(z_i; beta_current) for whatever "x" (i.e. beta_current) I am passed.

Is that correct?

Where I am getting confused is the calculation of the Jacobian for my problem. I have an analytical derivative of my f with respect to both the z_i's (there are p of them) and the beta (there are n of them). So my instinct is that the Jacobian is p by n instead of m by n? So I am not sure what the Jacobian really is for my example.

To give a concrete example, let's say that I was fitting y = beta1 * z1 + beta2 * z2 to a bunch of y_i, z1_i, z2_i data. Say there are 5 data points, i = 1, ..., 5.
I assume fvec(beta1, beta2) is the following vector:
y_1 - beta1 * z1_1 - beta2 * z2_1
y_2 - beta1 * z1_2 - beta2 * z2_2
...
y_5 - beta1 * z1_5 - beta2 * z2_5

And what would I fill in for the jacobian? Would it be:
[-z1_1 -z2_1]
[-z1_2 -z2_2]
....
[-z1_5 -z2_5]

i.e. the derivative with respect to each beta, evaluated at the appropriate point for the i-th row?
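
To make sure I have the loop structure right, here is a minimal sketch of what I think the solve loop looks like for this example. The data values are made up, and I am assuming the dtrnlsp_* RCI interface declared in mkl_rci.h, with fjac stored column-major (as djacobi produces it) -- please correct me if the storage is wrong:

#include <stdio.h>
#include "mkl_rci.h"

#define M 5  /* number of observations */
#define N 2  /* number of parameters: beta1, beta2 */

/* made-up data for y = beta1*z1 + beta2*z2 */
static const double y[M]  = {1.1, 1.9, 3.2, 3.9, 5.1};
static const double z1[M] = {1.0, 2.0, 3.0, 4.0, 5.0};
static const double z2[M] = {1.0, 1.0, 1.0, 1.0, 1.0};

int main(void)
{
    _TRNSP_HANDLE_t handle;
    MKL_INT n = N, m = M, RCI_Request = 0, iter1 = 1000, iter2 = 100;
    MKL_INT i, done = 0;
    double x[N] = {0.0, 0.0};            /* beta_current: the solver's "x" */
    double fvec[M] = {0.0}, fjac[M * N] = {0.0};
    double eps[6] = {1e-10, 1e-10, 1e-10, 1e-10, 1e-10, 1e-10};
    double rs = 0.0;                      /* 0 -> default initial step bound */

    if (dtrnlsp_init(&handle, &n, &m, x, eps, &iter1, &iter2, &rs) != TR_SUCCESS)
        return 1;

    while (!done) {
        if (dtrnlsp_solve(&handle, fvec, fjac, &RCI_Request) != TR_SUCCESS)
            break;
        if (RCI_Request == 1) {
            /* residuals: fvec[i] = y_i - f(z_i; beta) */
            for (i = 0; i < m; i++)
                fvec[i] = y[i] - (x[0] * z1[i] + x[1] * z2[i]);
        } else if (RCI_Request == 2) {
            /* Jacobian of fvec w.r.t. beta, m x n, column-major:
               entry (i, j) = d fvec_i / d beta_j */
            for (i = 0; i < m; i++) {
                fjac[0 * m + i] = -z1[i];
                fjac[1 * m + i] = -z2[i];
            }
        } else if (RCI_Request < 0) {
            done = 1;  /* one of the stopping criteria was met */
        }
    }
    dtrnlsp_delete(&handle);
    printf("beta = (%g, %g)\n", x[0], x[1]);
    return 0;
}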


Thanks for your help, and sorry that this is a basic question.
jlperla
Beginner

Sorry, one more addition: I am also interested in the best way to do weighted nonlinear least squares. Suppose I am minimizing the sum of squares weighted by a positive definite matrix W, i.e.
min R(x)^T W R(x)

Is the best bet to do a Cholesky factorization of W and then multiply the calculated fvec and Jacobian by it in the solve loop?

Thanks,
Jesse
ArturGuzik
Valued Contributor I
Quoting - jlperla

(1) fvec -- correct. It's the residual vector.

(2) Jacobian -- the dimensions are (always) the number of equations (usually denoted by m) by the number of unknowns (the estimated parameters, usually denoted by n). Usually m >> n. So if in your case both beta and the z_i were to be estimated, the dimensions would be m x (n + p). From your description it looks as if beta is unknown while the z_i are not, so it would be (number of data points i) x n, and thus your example looks OK.

A.
ArturGuzik
Valued Contributor I
Quoting - jlperla

Jesse,

the weight matrix is (most frequently) diagonal (unless you have correlated data). The inverse of a diagonal matrix is just the diagonal matrix of the reciprocals of its elements, so you don't need anything special to compute it. The weight matrix simply scales your data.
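
For instance, with a diagonal W the weighting amounts to scaling each residual (and the corresponding row of the Jacobian) by sqrt(w_i). A small sketch of that scaling, in my own notation (fjac stored column-major, m x n, as the solver expects):

#include <math.h>

/* Diagonal weighting: scale residual i and row i of the Jacobian by
   sqrt(w[i]); equivalent to minimizing sum_i w_i * fvec_i^2. */
void apply_diag_weight(int m, int n, const double *w,
                       double *fvec, double *fjac)
{
    int i, j;
    for (i = 0; i < m; i++) {
        double s = sqrt(w[i]);
        fvec[i] *= s;
        for (j = 0; j < n; j++)
            fjac[j * m + i] *= s;  /* column-major element (i, j) */
    }
}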
A.
jlperla
Beginner
Quoting - ArturGuzik

So it sounds like I have the signs/values right in the example Jacobian here? (That was one of the main concerns I had.)

As for the weighting, I want to use this for GMM (http://en.wikipedia.org/wiki/Generalized_method_of_moments). If I need it, is my idea of multiplying fvec by the (upper?) Cholesky factor the right approach? And for the Jacobian, do I multiply it by the full weighting matrix instead of just its Cholesky factor?

Thanks for all of your help,
Jesse
ArturGuzik
Valued Contributor I
Quoting - jlperla

(1) The signs/values look OK.

(2) I must admit I don't know GMM, and I don't understand why you want the Cholesky factor of a matrix (which one? J^T J?) multiplied by fvec (the residual vector) -- do you mean the solution of the WLS? Please elaborate.

Usually weighted LS is used just to eliminate or adjust for different uncertainty in the observation values. Most commonly that matrix is diagonal; otherwise the observations are correlated.

A.
jlperla
Beginner
Quoting - ArturGuzik

GMM is used in economics for estimation, especially when there is correlation in the data. It is a neat technique if you are interested in reading about it. It is a generalization of weighted least squares, feasible generalized least squares, etc. It gives some asymptotic properties of the weighting matrix and its relation to the covariance of the residuals.

I think that if I have the signs/values right then I understand which function the derivative is of. Typically you need a positive definite weighting matrix in the middle of the optimizer objective, and since this routine works with the split residual fvec, I figured you could use the Cholesky factor of the weighting matrix to fit the weighted problem into this routine.
i.e. if the normal version minimizes:
objective = fvec^T * fvec

Then if I want to weight by W positive definite:
objective = fvec^T * W * fvec

couldn't I get the lower-triangular Cholesky factor C, with C * C^T = W, and then transform:

new_fvec = C^T * fvec

(so that new_fvec^T * new_fvec = fvec^T * C * C^T * fvec = fvec^T * W * fvec) and pass this into the routine?

Then I figured that, to keep the Jacobian consistent with this transformation, I would need to multiply it by the same factor (since the Jacobian of C^T * fvec is C^T times the Jacobian of fvec):
new_jacobian = C^T * old_jacobian
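
In code, the transformation I have in mind would be something like this sketch (using MKL's LAPACKE/CBLAS routines; apply_weight is my own hypothetical helper, W is overwritten by its Cholesky factor, and everything is row-major here, so it would need transposing for dtrnlsp's column-major fjac):

#include "mkl.h"

/* Weighted least-squares transform for min fvec^T W fvec, with
   W = C C^T and C lower triangular: set fvec := C^T fvec and
   jac := C^T jac, so the plain sum of squares of the transformed
   residuals equals the weighted objective. W is m x m, jac is
   m x n, both row-major. */
int apply_weight(int m, int n, double *W, double *fvec, double *jac)
{
    /* lower-triangular Cholesky factor C of W, computed in place */
    if (LAPACKE_dpotrf(LAPACK_ROW_MAJOR, 'L', m, W, m) != 0)
        return -1;  /* W is not positive definite */

    /* fvec := C^T * fvec */
    cblas_dtrmv(CblasRowMajor, CblasLower, CblasTrans, CblasNonUnit,
                m, W, m, fvec, 1);

    /* jac := C^T * jac */
    cblas_dtrmm(CblasRowMajor, CblasLeft, CblasLower, CblasTrans,
                CblasNonUnit, m, n, 1.0, W, m, jac, n);
    return 0;
}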

I can check the algebra on my own now that I have the signs/values right in my example of how to use the routine. Thanks for your help.

Jesse