Hi there,
Sorry if this is a basic question, but I was quite confused reading the documentation and mapping it to nonlinear curve fitting.
- I have a model: y_i = f(z_i; beta) + noise, where beta is the set of parameters I want to estimate. Say that beta \in R^n and z_i \in R^p
- I have data for y_i and z_i. Say that y_i has m observations
- I want the beta that solves the least squares problem.
So if I implement the RCI_Request == 1 case in the solve loop, I believe I just fill in fvec of size m with:
y_i - f(z_i, beta_current) for whatever "x" (i.e. beta_current) I am passed.
Is that correct?
Where I am getting confused is the calculation of the Jacobian for my problem. I will have an analytical derivative of my f with respect to both the z_i's (there are p of them) and the betas (there are n of them). So my instinct is that the Jacobian is p x n instead of m x n? So I am not sure what the Jacobian really is for my example.
To give a concrete example, let's say I was fitting y = beta1 * z1 + beta2 * z2 to a bunch of y_i, z1_i, z2_i data. Say there are 5 data points, i = 1, ..., 5.
I assume fvec(beta1, beta2) is the following vector:
y_1 - beta1 * z1_1 - beta2 * z2_1
y_2 - beta1 * z1_2 - beta2 * z2_2
...
y_5 - beta1 * z1_5 - beta2 * z2_5
And what would I fill in for the Jacobian? Would it be:
[-z1_1 -z2_1]
[-z1_2 -z2_2]
...
[-z1_5 -z2_5]
i.e. the derivative with respect to each beta, evaluated at the appropriate point for the i-th row?
Thanks for your help, and sorry that this is a basic question.
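As a sanity check on the example above, here is a small NumPy sketch of the residual vector and Jacobian (made-up data and names; this is just the math, not the MKL RCI call sequence):

```python
import numpy as np

# Illustrative data: m = 5 observations for the model y = beta1*z1 + beta2*z2.
Z = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])            # column j holds z_j for each data point
beta_true = np.array([2.0, -1.0])
y = Z @ beta_true                     # noiseless, so residuals vanish at beta_true

def fvec(beta):
    # Residual vector of length m: y_i - beta1*z1_i - beta2*z2_i
    return y - Z @ beta

def jac(beta):
    # m x n matrix with entry (i, j) = d(residual_i)/d(beta_j).
    # For this linear model it is simply -Z, independent of beta.
    return -Z

assert np.allclose(fvec(beta_true), 0.0)
assert jac(beta_true).shape == (5, 2)   # m x n, not p x n
```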
Sorry, one more addition: I am also interested in the best way to do weighted nonlinear least squares, i.e. minimizing the sum of squares weighted by a positive definite matrix W:
min R(x)^T W R(x)
Is the best bet to take a Cholesky factor of W and then multiply the calculated fvec and Jacobian by it in the solve loop?
Thanks,
Jesse
Quoting - jlperla
Now, from what I can gather, my beta is the x_i in the docs.
So if I implement the RCI_Request == 1 case in the solve loop, I believe I just fill in fvec of size m with:
y_i - f(z_i, beta_current) for whatever "x" (i.e. beta_current) I am passed.
Is that correct?
Where I am getting confused is the calculation of the Jacobian for my problem. I will have an analytical derivative of my f with respect to both the z_i's (there are p of them) and the betas (there are n of them). So my instinct is that the Jacobian is p x n instead of m x n? So I am not sure what the Jacobian really is for my example.
To give a concrete example, let's say I was fitting y = beta1 * z1 + beta2 * z2 to a bunch of y_i, z1_i, z2_i data. Say there are 5 data points, i = 1, ..., 5.
I assume fvec(beta1, beta2) is the following vector:
y_1 - beta1 * z1_1 - beta2 * z2_1
y_2 - beta1 * z1_2 - beta2 * z2_2
...
y_5 - beta1 * z1_5 - beta2 * z2_5
And what would I fill in for the Jacobian? Would it be:
[-z1_1 -z2_1]
[-z1_2 -z2_2]
...
[-z1_5 -z2_5]
i.e. the derivative with respect to each beta, evaluated at the appropriate point for the i-th row?
(1) fvec -- correct. It's the residual vector.
(2) Jacobian -- the dimensions are (always) the number of equations (usually denoted by m) by the number of unknowns (the estimated parameters, usually denoted by n). Usually m >> n. So if in your case both beta and the z_i were to be estimated, the dimensions would be m x (n+p). From your description it looks as if beta is unknown while the z_i are not, so it would be (number of data points i) x n, and thus your example looks OK.
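A quick way to confirm both the m x n shape and the entries is a finite-difference check against the analytical Jacobian (a generic sketch with illustrative names, not part of the MKL interface):

```python
import numpy as np

def fd_jacobian(f, beta, eps=1e-7):
    # Numerical m x n Jacobian of f at beta: column j approximates df/dbeta_j.
    f0 = f(beta)
    J = np.empty((f0.size, beta.size))
    for j in range(beta.size):
        step = np.zeros(beta.size)
        step[j] = eps
        J[:, j] = (f(beta + step) - f0) / eps
    return J

# Linear example: residuals y - Z @ beta with m = 5 equations, n = 2 unknowns.
Z = np.arange(1.0, 11.0).reshape(5, 2)
y = np.ones(5)
resid = lambda beta: y - Z @ beta

J = fd_jacobian(resid, np.array([0.5, -0.5]))
assert J.shape == (5, 2)                # equations x unknowns, i.e. m x n
assert np.allclose(J, -Z, atol=1e-4)    # matches the analytical Jacobian -Z
```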
A.
Quoting - jlperla
Sorry, one more addition: I am also interested in the best way to do weighted nonlinear least squares, i.e. minimizing the sum of squares weighted by a positive definite matrix W:
min R(x)^T W R(x)
Is the best bet to take a Cholesky factor of W and then multiply the calculated fvec and Jacobian by it in the solve loop?
Thanks,
Jesse
Jesse,
the weight matrix is (most frequently) diagonal (unless you have correlated data). The inverse of a diagonal matrix is just the diagonal matrix of the reciprocals of its elements, so you don't need anything special to compute it. The weight matrix simply scales your data.
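For a diagonal W = diag(w_1, ..., w_m), the weighting reduces to scaling residual i (and row i of the Jacobian) by sqrt(w_i); a minimal sketch with illustrative numbers:

```python
import numpy as np

w = np.array([1.0, 4.0, 0.25])        # diagonal weights, one per observation
resid = np.array([0.5, -1.0, 2.0])    # fvec at the current iterate
J = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])            # m x n Jacobian at the current iterate

s = np.sqrt(w)
fvec_w = s * resid                    # scaled residuals
J_w = s[:, None] * J                  # scaled Jacobian rows

# The scaled problem reproduces the weighted objective resid^T W resid:
assert np.isclose(fvec_w @ fvec_w, resid @ (w * resid))
```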
A.
Quoting - ArturGuzik
(1) fvec -- correct. It's residual vector
(2) Jacobian - the dimensions are (always) number of equations (usually denoted by m) by number of uknowns (estimated parameters, usually denoted by n). Usually m >> n. So if in your case both beta and z_i are to be estimated, the dimensions would be m x (n+p). From your description looks as if the beta was unknown while z_i not, so it would be: (number of data points i) x n, and thus, your example looks OK.
A.
So it sounds like I have the signs/values right on the example Jacobian here? (that is one of the main concerns I had).
As for the weighting, I want to use this for GMM (http://en.wikipedia.org/wiki/Generalized_method_of_moments). If I need it, is my idea of multiplying the fvec by the (upper?) Cholesky factor the right approach? And for the Jacobian, do I multiply it by the full weighting matrix instead of just its Cholesky factor?
Thanks for all of your help,
Jesse
Quoting - jlperla
So it sounds like I have the signs/values right on the example Jacobian here? (that is one of the main concerns I had).
As for the weighting, I want to use this for GMM (http://en.wikipedia.org/wiki/Generalized_method_of_moments). If I need it, is my idea of multiplying the fvec by the (upper?) Cholesky factor the right approach? And for the Jacobian, do I multiply it by the full weighting matrix instead of just its Cholesky factor?
Thanks for all of your help,
Jesse
(2) I must admit I don't know GMM, and I don't understand why you would want a Cholesky-decomposed matrix (which one? J^T J?) multiplied by fvec (the residual vector); do you mean the solution of the WLS? Please elaborate.
Usually weighted LS is used just to eliminate or adjust for different uncertainty in the (observation) values. And most commonly that matrix is diagonal; otherwise the data are correlated.
A.
Quoting - ArturGuzik
(1) The signs/values look OK.
(2) I must admit I don't know GMM, and I don't understand why you would want a Cholesky-decomposed matrix (which one? J^T J?) multiplied by fvec (the residual vector); do you mean the solution of the WLS? Please elaborate.
Usually weighted LS is used just to eliminate or adjust for different uncertainty in the (observation) values. And most commonly that matrix is diagonal; otherwise the data are correlated.
A.
I think that if I have the signs/values OK, then I understand which function the derivative is of. Typically you need a positive definite weighting matrix in the middle of the objective, and since the fvec residual has been split out, I figured you could take the Cholesky factor of the weighting matrix to use this routine.
i.e. if the normal version minimizes:
objective = fvec^T * fvec
Then if I want to weight by W positive definite:
objective = fvec^T * W * fvec
couldn't I take the lower-triangular Cholesky factor C, with C * C^T = W, and then transform:
new_fvec = C^T * fvec
And pass this into this routine?
Then I figured that to calculate the Hessian, I would probably need to multiply it by the full weighting matrix for the math to be consistent with this transformation:
new_hessian = W * old_hessian.
I can check the algebra on my own now that I know the signs/values are OK in my example of how to use the routine. Thanks for your help.
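For what it's worth, the algebra above can be checked numerically: with C the lower-triangular Cholesky factor (C * C^T = W), new_fvec = C^T * fvec reproduces the weighted objective, and the transformation consistent with it for the Jacobian would be C^T * J (a sketch with made-up data, not the MKL call sequence):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 2
A = rng.normal(size=(m, m))
W = A @ A.T + m * np.eye(m)           # a positive definite weight matrix
resid = rng.normal(size=m)            # fvec at the current iterate
J = rng.normal(size=(m, n))           # Jacobian at the current iterate

C = np.linalg.cholesky(W)             # lower triangular, C @ C.T == W
new_fvec = C.T @ resid
new_J = C.T @ J

# Transformed objective equals the weighted one: new_fvec^T new_fvec == fvec^T W fvec
assert np.isclose(new_fvec @ new_fvec, resid @ W @ resid)
# The Gauss-Newton gradient also matches: new_J^T new_fvec == J^T W fvec
assert np.allclose(new_J.T @ new_fvec, J.T @ W @ resid)
```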
Jesse