Problem: DGELSS does not seem to work with large datasets.
I am using the following: MKL 10.2 with VS 2008, Managed C++.
The call from C++ to MKL is fine: I tested it with small datasets (up to 100 rows and 5 columns) and verified the results (LLS coefficients) against Matlab and Excel Regression.
Attached is the dataset that I am having trouble with. It has 16383 rows and I get very strange coefficients from MKL.
The first 5 columns are the coefficient matrix and the last column is the y value. I am basically trying to fit the following equation: y = a+b*x + c*x^2 + d*x^3 + e*x^4.
I get drastically different coefficients when I use Excel Data Analysis Toolpak Regression feature.
Any guidance is greatly appreciated. Thanks in advance.
7 Replies
Vikrant,
Could you check how this code works with C++?
--Gennady
Below is my code:
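// The parameter list was cut off after "array" in the post; the types of A and B below are an assumption.
Int32 SolveGELSS(Int32 inM, Int32 inN, array<double>^ A, array<double>^ B)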
{
    int m = inM;
    int n = inN;
    int nrhs = 1;
    int lda = m;
    int ldb = Math::Max(m, n);
    double rcond = -1;   // negative value: machine precision is used as the rank threshold
    int lwork = 2;
    int rank = 0;
    int i = 0;
    double *a = new double[m*n];
    this->ConvertToColMajor(A, m, n, a);
    int info = 0;
    double *work = new double[lwork];
    // Use ldb here, since it is max(m,n);
    double *s = new double[ldb];
    // Copy b to local array (DGELSS expects ldb*nrhs elements)
    double *b = new double[ldb];
    for(i = 0; i < m; i++)
        b[i] = B[i];
    // Workspace query
    lwork = -1;
    DGELSS(&m, &n, &nrhs, a, &lda, b, &ldb, s, &rcond, &rank, work, &lwork, &info);
    lwork = (int)work[0];
    delete[] work;
    work = new double[lwork];
    DGELSS(&m, &n, &nrhs, a, &lda, b, &ldb, s, &rcond, &rank, work, &lwork, &info);
    if(info == 0)
    {
        for(i = 0; i < m; i++)
            B[i] = b[i];
    }
    // Memory clean up
    delete[] a;
    delete[] b;
    delete[] s;
    delete[] work;
    return info;
}
Many thanks.
You're right, the coefficients look bogus. I get something like
-2.492143e-17 6.400757e-18 -8.539396e-19 -6.115223e-18 2.546998e-17
This can be explained by the matrix being too badly balanced: the magnitudes of its columns differ by more orders than double precision can resolve (machine precision is ~1.0e-16), so the algorithm becomes highly unstable. It is easy to work around, though. I would recommend balancing the matrix before calling DGELSS, for instance, in this case:
for( i = 0; i < M; i++ ) A[i][1] *= 1.0e-6;
for( i = 0; i < M; i++ ) A[i][2] *= 1.0e-11;
for( i = 0; i < M; i++ ) A[i][3] *= 1.0e-16;
for( i = 0; i < M; i++ ) A[i][4] *= 1.0e-21;
for( i = 0; i < M; i++ ) B[i] *= 1.0e-6;
Then you get results that seem to be correct (after scaling back):
-4.103898e+04 4.109750e-02 -4.553319e-07 4.735998e-12 -1.904054e-18
Michael.
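For anyone who would rather not hard-code the factors, here is a minimal sketch of the same balancing idea. It is not Michael's code: the helper names balanceColumns and unscaleSolution are made up for illustration, the matrix is assumed to be stored column-major in a std::vector, and each column is simply divided by its largest absolute entry. The least-squares call itself (DGELSS) would go between the two steps.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: scales every column of a column-major m-by-n matrix
// so that its largest absolute entry becomes 1. Returns the factor applied
// to each column. An all-zero column is left unchanged (factor 1).
std::vector<double> balanceColumns(std::vector<double>& a, std::size_t m, std::size_t n)
{
    std::vector<double> scale(n, 1.0);
    for (std::size_t j = 0; j < n; ++j) {
        double maxAbs = 0.0;
        for (std::size_t i = 0; i < m; ++i)
            maxAbs = std::max(maxAbs, std::fabs(a[i + j * m]));
        if (maxAbs > 0.0) {
            scale[j] = 1.0 / maxAbs;
            for (std::size_t i = 0; i < m; ++i)
                a[i + j * m] *= scale[j];
        }
    }
    return scale;
}

// Hypothetical helper: converts the solution of the scaled system back to
// the original one, x[j] = scale[j] * xScaled[j] / bScale, where bScale is
// the factor the right-hand side was multiplied by.
void unscaleSolution(std::vector<double>& x, const std::vector<double>& scale, double bScale)
{
    for (std::size_t j = 0; j < scale.size(); ++j)
        x[j] *= scale[j] / bScale;
}
```

The idea would be to call balanceColumns on the matrix, multiply the right-hand side by a factor of your choice, solve, and then pass the first n entries of the returned right-hand side to unscaleSolution.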
Works nicely. Many thanks Michael.
Quoting - vikrantca
One naive question - On output from GELSS, the rows of matrix B (from n+1 to m) will also need to be inverse-scaled to get the correct residuals, right?
Many thanks.
Yes, it should be scaled back. Here's the theory:
- The initial system of equations is A*x = b; you need the solution x that minimizes r = || A*x - b ||, and the residual r itself.
- You solve the system A*cA*x' = b*cb instead, where cA is an n-by-n diagonal matrix, cA = diag( cA1, ..., cAn ), and cb is a positive 1-by-1 scalar.
- The solution of the modified system is x' = inv( cA )*x*cb, that is, x = cA*x'*inv( cb ).
- Substituting the solution x' into the modified system's residual gives r' = || A*cA*x' - b*cb || = || A*x*cb - b*cb || = || A*x - b || * cb = r*cb, that is, r = r'*inv( cb ).
So you collect the sum of squares from DGELSS (that is, of the modified system, r'), then scale it back with the coefficient you scaled the b vector with.
To be more precise, what you need to scale back is not r' but r'^2 (the square of r'), since that is the sum of squares you collect from DGELSS. So it should be scaled by inv( cb )^2, that is, r^2 = r'^2 * inv( cb )^2.
