Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Polynomial line fitting

JohnNichols
Valued Contributor III
2,219 Views

Has anyone tried FITPACK on Windows?

11 Replies
Arjen_Markus
Honored Contributor II
2,133 Views

What is the problem exactly? I did find the source code via Netlib (Google was no help here, as it has a very different idea about what "fit" means).

mecej4
Honored Contributor III
2,098 Views

There are two packages called "Fitpack" at Netlib: one by Alan Cline and the other by Paul Dierckx. The word "fitting" may imply that the resulting curve/expression should interpolate the provided points exactly, or that the curve should "fit" the data in a least-squares or some similar sense, i.e., that the data points should be close to the curve. An interpolating curve may have unwelcome wiggles, and a smooth fit tries to avoid these wiggles. Sometimes the curve is subjected to "tension", as in a taut string.
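To make the distinction concrete, here is a minimal sketch (plain Python, not either Fitpack) that puts an interpolating polynomial and a least-squares straight line through the same points. The data values and the function names `lagrange_eval` and `least_squares_line` are invented for illustration.

```python
# Sketch: interpolating polynomial vs. least-squares line on the same noisy data.
# The data values below are made up for illustration only.

def lagrange_eval(xs, ys, x):
    """Evaluate the unique polynomial through all (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 4.9]   # roughly y = x plus a little noise

intercept, slope = least_squares_line(xs, ys)

# The interpolant reproduces every data point; the line only comes close.
interp_err = max(abs(lagrange_eval(xs, ys, x) - y) for x, y in zip(xs, ys))
line_err = max(abs(intercept + slope * x - y) for x, y in zip(xs, ys))
print("max error at data points: interpolant", interp_err, "line", line_err)

# Between data points the interpolant is free to wander away from the trend.
print("at x = 4.5: interpolant", lagrange_eval(xs, ys, 4.5),
      "line", intercept + slope * 4.5)
```

The interpolant is forced through the noise, which is where the wiggles come from; the line ignores the noise at the cost of missing every point slightly.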

andrew_4619
Honored Contributor III
2,062 Views
I have used some of the Dierckx routines, but I did some mild refactoring to make them compliant with the 2018 standard. If I recall, I was fitting spline curves.
Arjen_Markus
Honored Contributor II
2,033 Views

I found only one FITPACK on Netlib, thanks for the second link. The first few experiments with Cline's package look fine, though some results were puzzling. Anyway, I will continue experimenting, as I am interested in the general problem these packages solve. (My experiments involve creating an object-oriented interface for them and then seeing whether the splines thus produced give understandable answers.)

JohnNichols
Valued Contributor III
1,950 Views

The accelerometer, tilt meter and GPS data that one collects, whether at 8-second intervals or 2000 times per second, quickly build into huge data series.

The first problem with these series is the thermal impact, which one cannot ignore.

So a sample from a tilt meter:

 

JohnNichols_0-1716413558368.png

Clearly there are two influences: a loss line and a thermal effect. The loss line is measured over decades and the temperature over a day, so temperature dominates for a little while, but then the loss line kicks in.

 

We can do a linear regression in many ways; Fortran, EXCEL and C# are three examples. EXCEL gives the best graphs but takes a lot of time, which makes it useful while you work out techniques. C# gives better graphs than Fortran for the same work, but Fortran is faster. C# does not handle non-linear problems easily. If you are doing a lot of this work, use Fortran or C#; it is a personal choice.

If we do the linear regression in EXCEL we get 

JohnNichols_1-1716413777769.png

 

               Coefficients    Standard Error   t Stat          P-value
Intercept      4.579077353     0.000400117      11444.35258     0
X Variable 1   -1.73446E-07    8.35188E-10      -207.6728992    0

 

The P-value is not really zero, but t-stats this high test the ability of real numbers to complete the analysis. Any t-stat over 2 in magnitude means the regression is solid, and at 200 there is no argument.
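For anyone who wants to check the t-stat by hand, it is just the coefficient divided by its standard error. The following is a minimal sketch in plain Python, not what EXCEL does internally; the function name `ols_with_t` and the small data set are invented for illustration, and only the last two lines use numbers from the table in this post.

```python
import math

def ols_with_t(xs, ys):
    """Fit y = a + b*x; return (a, b, standard error of b, t-stat of b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    # Residual sum of squares, with n - 2 degrees of freedom for a line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    return a, b, se_b, b / se_b

# Made-up data: a straight line with a small alternating disturbance.
xs = list(range(10))
ys = [2.0 + 0.5 * x + (0.1 if x % 2 == 0 else -0.1) for x in xs]
a, b, se_b, t_b = ols_with_t(xs, ys)
print(f"slope {b:.4f}  std error {se_b:.4f}  t-stat {t_b:.1f}")

# The same ratio applied to the slope row of the regression table in the post.
t_table = -1.73446e-07 / 8.35188e-10
print(f"t-stat from table values: {t_table:.2f}")
```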

The -1.7e-7 does not look important until you realize there are 10000 steps per day, 3.65 million in a year and 365 million in a century. A bridge should last a century, and we do not want the piers to tilt that far.
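The arithmetic behind that remark can be sketched as follows. It assumes the X variable is the sample count and that the tilt is in degrees, and it is a naive straight-line extrapolation rather than a prediction; the constant and function names are invented for illustration.

```python
SLOPE_DEG_PER_STEP = -1.73446e-07  # "X Variable 1" from the regression table
STEPS_PER_DAY = 10_000             # round figure used in the post
DAYS_PER_YEAR = 365

def drift_after_years(years):
    """Linear extrapolation of the fitted trend; assumes x counts samples."""
    steps = STEPS_PER_DAY * DAYS_PER_YEAR * years
    return SLOPE_DEG_PER_STEP * steps

for label, years in (("one day", 1 / DAYS_PER_YEAR),
                     ("one year", 1),
                     ("one century", 100)):
    print(f"{label:12s} {drift_after_years(years):10.4f} degrees")
```

A slope that looks like rounding error per step becomes a large number once it is multiplied by 365 million steps.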

 

A residual plot shows the trend is mostly gone.

The secondary analysis is to look at the residuals. Here one can use an FFT, but the answer is trivial, a one-day cycle, and the better method is polynomial or similar regression. Here, though, I use a linear regression to take out the temperature dependence, and we get

JohnNichols_2-1716414292819.png

 

 

               Coefficients    Standard Error   t Stat       P-value
Intercept      4.528786381     0.000115415      39239.23     0
X Variable 1   -0.001623634    5.61804E-06      -289.004     0

 

and the residual plot is 

 

JohnNichols_3-1716414523795.png

Now I am stuck. The only real solution is Fortran (maybe C++ if you are a masochist), and we are now looking at lots of records, where the difference in the measured acceleration is thermal, traffic or construction. If we tease apart the data we will find the polynomial elements that relate to traffic and construction, and the base one left over is minor thermal. In this case, because we recorded a few months without traffic, we know the lowest curve has no loads other than natural ones: a bit of wind, river force, etc.

So we use Fortran to subset the data into nice groups and then apply polynomial-type regression; that is how I found FITPACK, and now I have to see if it will work.
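As a rough illustration of the "subset, then fit" idea, the sketch below groups records by a label and fits an ordinary least-squares polynomial to each group through the normal equations. This is plain polynomial regression, not a FITPACK spline; the group labels, the data and the function name `polyfit` are all hypothetical.

```python
def polyfit(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... via the normal equations."""
    m = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    # Gaussian elimination with partial pivoting (no singularity check).
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for k in range(col, m + 1):
                a[r][k] -= f * a[col][k]
    # Back substitution.
    coeffs = [0.0] * m
    for i in range(m - 1, -1, -1):
        s = a[i][m] - sum(a[i][k] * coeffs[k] for k in range(i + 1, m))
        coeffs[i] = s / a[i][i]
    return coeffs

# Hypothetical records: (group label, x, y), generated from known quadratics.
xs = [0.5 * i for i in range(9)]
records = ([("no_traffic", x, 1.0 + 0.5 * x - 0.1 * x * x) for x in xs]
           + [("traffic", x, 2.0 + 0.3 * x + 0.05 * x * x) for x in xs])

# Subset the records by label, then fit each group separately.
groups = {}
for label, x, y in records:
    gx, gy = groups.setdefault(label, ([], []))
    gx.append(x)
    gy.append(y)

fits = {label: polyfit(gx, gy, 2) for label, (gx, gy) in groups.items()}
for label, coeffs in fits.items():
    print(label, [round(c, 6) for c in coeffs])
```

The normal equations become ill-conditioned for high degrees or large x ranges, which is one reason to prefer a library routine for real data.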

One tests out the methods with EXCEL and then codes them.

The interesting issue is the use of the Central Limit Theorem on very large data sets.
jimdempseyatthecove
Honored Contributor III
1,884 Views

Can you make two new charts, TiltX and X Variable 1 Residual, but this time plot only the first 20000 points, and replace the "large" dot and diamond markers with a 1-pixel-wide line?

This might help to visualize the activity.

 

The 3rd and 4th charts are somewhat interesting. Can you explain them a little bit?

 

The data presented is similar except the slopes are switched (3rd chart has a somewhat negative slope and the 4th chart has a somewhat positive slope).

 

Both charts appear to have four major harmonics causing four major aggregations (point densities about lines).

Charts 1 and 2 could each have a single curve fitted (somewhat like abs(sin(X))), while charts 3 and 4 might be best served by fitting multiple lines. How to do this may require some creativity.

A guess at what to do would be, iteratively:

a) filter out what appear to be stragglers from potential lines (density clusters)

b) check for the potential to curve fit multiple lines

c) if that fails, tighten the filter and go to a)

 

First pass might filter out these:

jimdempseyatthecove_0-1716484137872.png

 

Somewhat like a "Game of Life" where a point can die, but points cannot propagate.
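Step a) and the "points can die but not propagate" idea can be sketched as a neighbour-count filter. This is only one possible reading of the suggestion: the function name `prune_stragglers`, the radius and the neighbour threshold are invented, the data are synthetic, and steps b) and c) are not implemented (tightening would mean a smaller radius or a larger neighbour count).

```python
def prune_stragglers(points, radius, min_neighbours, max_passes=10):
    """Repeatedly drop points with fewer than min_neighbours within radius."""
    alive = list(points)
    r2 = radius * radius
    for _ in range(max_passes):
        survivors = []
        for i, (xi, yi) in enumerate(alive):
            # Count the other live points inside the circle around point i.
            count = sum(1 for j, (xj, yj) in enumerate(alive)
                        if j != i and (xi - xj) ** 2 + (yi - yj) ** 2 <= r2)
            if count >= min_neighbours:
                survivors.append((xi, yi))
        if len(survivors) == len(alive):
            break          # nothing died in this pass, so stop
        alive = survivors  # points only ever die; none are created
    return alive

# Synthetic example: two dense horizontal lines plus three stragglers.
line_a = [(0.1 * i, 0.0) for i in range(50)]
line_b = [(0.1 * i, 1.0) for i in range(50)]
stragglers = [(1.0, 0.5), (3.3, 0.45), (4.0, -0.6)]

kept = prune_stragglers(line_a + line_b + stragglers,
                        radius=0.25, min_neighbours=2)
print(len(kept), "points kept out of", len(line_a + line_b + stragglers))
```

The all-pairs search is quadratic in the number of points, so a data set of the size discussed here would need a grid or tree lookup instead.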

 

Jim

Arjen_Markus
Honored Contributor II
1,794 Views

Such datasets are always intriguing :), but I cannot give any advice on this particular one. What I did want to mention is that I continued working on a more modern interface to Cline's FITPACK and did some more experimenting, in particular with the smoothing options. The package allows you to calculate a smoothing spline "object" where one of the parameters is the weight to be assigned to the data points. In this particular package that weight is roughly the standard deviation (according to the comments). Using a "large" value makes the curve much smoother and less inclined to follow the data points.

Just some observations.

JohnNichols
Valued Contributor III
1,730 Views

JohnNichols_0-1716563961687.png

Jim: These are the first 20000 points, as you asked. I have used the thinnest of lines. Each point represents "average" data for an 8.192-second interval, which is based on a 16384-point FFT at 2000 Hz.
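The 8.192 seconds follows directly from the block length and the sampling rate, and it gives a point count per day close to the round 10000 used earlier in the thread. The sketch below shows the arithmetic; the `block_means` helper is a plain mean used as a stand-in, since the post does not say exactly how the "average" is formed.

```python
FFT_LENGTH = 16384        # samples per block, from the post
SAMPLE_RATE_HZ = 2000.0   # sampling rate, from the post

block_seconds = FFT_LENGTH / SAMPLE_RATE_HZ
points_per_day = 86400.0 / block_seconds
print(block_seconds, "seconds per point,", points_per_day, "points per day")

def block_means(samples, block_len):
    """Plain mean of each complete block (assumed stand-in for the "average")."""
    return [sum(samples[i:i + block_len]) / block_len
            for i in range(0, len(samples) - block_len + 1, block_len)]

# Toy input: the trailing incomplete block is dropped.
print(block_means([1, 2, 3, 4, 5, 6, 7], 2))
```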

The vertical axis is tilt in degrees.  

The object is an old structure made of steel with a concrete substructure. The tilt meter sits on top of the substructure 80 feet in the air. 

The tilt meter reads at 200 Hz.  

There are three things that can disturb the structure: thermal effects, traffic and adjacent construction work. If you record continuously, you have the advantage that some periods are thermal only, some are traffic plus thermal, and so on.

Whenever I do this, I am asked what the impact of either traffic or construction is. The simple answer is to code it properly and then just get all the answers and put them in a MySQL database in the cloud.

With this structure at the first meeting, a comment was made that thermal did not impact the response.  

 

JohnNichols_1-1716564459674.png

The correlation is obvious to even an old human brain.  

 

If I plot the temperature against the tilt I get the following, which shows several lines through the points:

JohnNichols_2-1716564746877.png

The question is whether they really exist or are random, so turn on lines:

JohnNichols_3-1716564834560.png

So the result shows the tilt is related to temp + some other factors.  Now we look for the other factors. 

 

 

 

JohnNichols_4-1716565021930.png

 

This is traffic - thermal and construction, now it is just set theory. 

Arjen:  I live inside these data sets with clients asking - what does it mean.  

jimdempseyatthecove
Honored Contributor III
1,585 Views

The temperature versus tilt plot looks like a hysteresis chart. I do not think the lines are random.

Rather, one is when the temperature is ascending, and the other is when the temperature is descending.
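One way to test the hysteresis idea would be to split the samples by whether the temperature is rising or falling and fit each branch separately. The sketch below does that on synthetic data with a built-in offset between the branches; the function names, the temperatures and the tilt formula are all invented for illustration.

```python
def split_by_temperature_trend(temps, tilts):
    """Split consecutive samples into heating and cooling branches."""
    rising, falling = [], []
    for k in range(1, len(temps)):
        dt = temps[k] - temps[k - 1]
        if dt > 0:
            rising.append((temps[k], tilts[k]))
        elif dt < 0:
            falling.append((temps[k], tilts[k]))
    return rising, falling

def fit_line(points):
    """Ordinary least-squares line through (x, y) pairs: (intercept, slope)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return my - slope * mx, slope

# Synthetic day: temperature climbs from 15 to 25 and falls back again.
temps_up = [15.0 + i for i in range(11)]
temps_down = [24.0 - i for i in range(10)]
temps = temps_up + temps_down
# Made-up tilt response with a small offset between heating and cooling.
tilts = ([4.5 - 0.002 * t + 0.001 for t in temps_up]
         + [4.5 - 0.002 * t - 0.001 for t in temps_down])

rising, falling = split_by_temperature_trend(temps, tilts)
rise_intercept, rise_slope = fit_line(rising)
fall_intercept, fall_slope = fit_line(falling)
print("heating branch:", rise_intercept, rise_slope)
print("cooling branch:", fall_intercept, fall_slope)
```

With measured data the temperature would probably need smoothing first, since sample-to-sample noise would otherwise flip the sign of the difference constantly.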

I am not sure about the third line in between the top and bottom. That line seems to be somewhat stable (i.e. not noisy).

The vertical blip at ~22.5 may be a wind gust or something heavy moving into/out of the structure (or something rotating on the structure).

Also, the "gap" between 20 and 21.5 in the top line is interesting; something happened at that time.

Does that correlate with some event related to the structure?

 

Jim

JohnNichols
Valued Contributor III
1,573 Views

Jim:

Thanks for the message.   I want to send you a private message, but I am snookered as to how to do it, the new dashboard does not appear to have a way to create messages. 

In essence the trucks driving across the bridge change the mass of the steel bridge and impact on the tilt.  

Any idea how to create a private message would be appreciated.

John

jimdempseyatthecove
Honored Contributor III
1,563 Views

We'd connected by email before.

I emailed you a message a moment ago.

Jim

 
