I have a complicated Fortran program to simulate EPR spectra. It is built into a DLL and used inside a LabVIEW program for nonlinear fitting.
I noticed some serious instabilities in the numerical partial-derivative calculations that seem to depend on the actual fitting parameter and the parameter increment in a chaotic manner. The tiniest changes in the inputs cause weird fluctuations in the result. Playing with all the compile options, I noticed that the default was to generate SSE2 code. The same problem showed up when selecting SSE3.
Disabling all SIMD (option IA32) miraculously stabilized the computations and made things much more predictable.
The attached image shows my testbench LabVIEW program, which I use to analyse the issues in detail and compare DLLs side by side. There are no changes in the Fortran code; the only difference is /arch:SSE3 vs /arch:IA32 in the code-generation options. Compiled with SSE3 (red) or IA32 (white).
To show the problems with the partial-derivative computations, I calculate
Sqrt(sum(((f(x+dx)-f(x))/dx)^2))
and graph it vs. dx from 1e0 to 1e-12.
For very small dx (left side), we get numerical errors because we divide by a very small number, and for very large dx (right side), we get deviations because the linear approximation is no longer valid.
IA32: looks very nice. The partial derivatives are flat for good increments over eight orders of magnitude(!). In the flat region, the partial derivatives are independent of dx, as they should be.
SSE3: shows oscillations for dx < 1e-1, and there is a weird jump between 1e-3 and 1e-4. Overall, things go bad many orders of magnitude earlier than with the same code compiled for IA32. Basically, there is no dx that gives a stable partial derivative, and Levenberg-Marquardt often gets stuck well before the minimum. The tiniest changes in x or dx cause dramatic changes in the details of the shown curves. It's like trying to play golf in a field full of rabbit holes.
When compiled under IA32, Levenberg-Marquardt behaves perfectly!
So, is there an explanation? Is it due to reordering? Do some operations get implicitly padded arrays to fit the SIMD registers? What else could it be? Not only are the SSE3 results slightly different from the IA32 results, the SSE3 results also show serious artifacts over nearly the entire range, and the artifacts vary strongly with even the smallest change in either x or dx.
(All tests were done on a Core 2 Duo T7600.)
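For anyone who wants to reproduce the dx sweep outside LabVIEW, here is a minimal sketch in Python. The original code is Fortran inside a DLL; `math.tan` and the single-parameter setup are purely illustrative stand-ins for the EPR model, not the actual fitting function.

```python
import math

def partial_norm(f, x, dx):
    # Forward-difference estimate |(f(x+dx) - f(x)) / dx|: a one-parameter
    # version of the norm graphed in the testbench.
    return abs((f(x + dx) - f(x)) / dx)

# Sweep dx over thirteen decades, from 1e0 down to 1e-12, as in the plot.
f = math.tan   # hypothetical stand-in for the model function
x = 1.0
for k in range(13):
    dx = 10.0 ** (-k)
    print(f"dx = {dx:8.1e}   estimate = {partial_norm(f, x, dx):.9f}")
```

In the flat region the printed estimate should be nearly independent of dx; instability shows up as the estimate drifting or oscillating as dx shrinks.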
Consider the evaluation of the derivative of tan(x) at x = 1, using forward differences. Here is the program:
[fortran]program tstder
implicit none
double precision x,dx,der,exder
integer i
x=1d0
exder=1d0/(cos(x))**2
dx=1d0/65536
do i=1,20
der=(tan(x+dx)-tan(x))/dx
write(*,10)i,dx,der,der/exder-1d0
dx=dx*0.5d0
end do
10 format(i3,3ES14.6)
end program tstder
[/fortran]
With x87 code, the results are (columns: i, dx, approximate f', relative error in f'):
[bash]  1 9.765625E-04 3.430738E+00 1.523542E-03
2 4.882812E-04 3.428126E+00 7.611113E-04
3 2.441406E-04 3.426822E+00 3.803910E-04
4 1.220703E-04 3.426170E+00 1.901544E-04
5 6.103516E-05 3.425844E+00 9.506690E-05
6 3.051758E-05 3.425682E+00 4.753088E-05
7 1.525879E-05 3.425600E+00 2.376480E-05
8 7.629395E-06 3.425560E+00 1.188223E-05
9 3.814697E-06 3.425539E+00 5.941075E-06
10 1.907349E-06 3.425529E+00 2.970521E-06
11 9.536743E-07 3.425524E+00 1.485253E-06
12 4.768372E-07 3.425521E+00 7.425505E-07
13 2.384186E-07 3.425520E+00 3.711654E-07
14 1.192093E-07 3.425519E+00 1.857448E-07
15 5.960464E-08 3.425519E+00 9.276254E-08
16 2.980232E-08 3.425519E+00 4.491205E-08
17 1.490116E-08 3.425519E+00 2.316182E-08
18 7.450581E-09 3.425519E+00 5.761643E-09
19 3.725290E-09 3.425519E+00 -2.938447E-09
20 1.862645E-09 3.425519E+00 -2.033863E-08
[/bash]
With SSE3, the results change to:
[bash]  1 1.525879E-05 3.425600E+00 2.376480E-05
  2 7.629395E-06 3.425560E+00 1.188223E-05
  3 3.814697E-06 3.425539E+00 5.941075E-06
  4 1.907349E-06 3.425529E+00 2.970521E-06
  5 9.536743E-07 3.425524E+00 1.485253E-06
  6 4.768372E-07 3.425521E+00 7.425505E-07
  7 2.384186E-07 3.425520E+00 3.711654E-07
  8 1.192093E-07 3.425519E+00 1.857448E-07
  9 5.960464E-08 3.425519E+00 9.276254E-08
 10 2.980232E-08 3.425519E+00 4.491205E-08
 11 1.490116E-08 3.425519E+00 2.316182E-08
 12 7.450581E-09 3.425519E+00 5.761643E-09
 13 3.725290E-09 3.425519E+00 -2.938447E-09
 14 1.862645E-09 3.425519E+00 -2.033863E-08
 15 9.313226E-10 3.425519E+00 -2.033863E-08
 16 4.656613E-10 3.425519E+00 -8.993935E-08
 17 2.328306E-10 3.425519E+00 4.926209E-08
 18 1.164153E-10 3.425518E+00 -2.291408E-07
 19 5.820766E-11 3.425518E+00 -2.291408E-07
 20 2.910383E-11 3.425514E+00 -1.342752E-06
[/bash]
From numerical analysis, we know that the optimum dx for evaluating one-sided differences is ~ \sqrt(\delta / f''(x)), where \delta is the error in the function evaluation. In this example f, f', and f'' are all O(1), so the optimum dx is O(\sqrt \epsilon_{machine}).
Looking at the printouts above, and locating the line with the smallest relative error in f', we see that the results agree with the theory quite well, since \epsilon_{machine} ~ 10^{-16} for 64-bit IEEE (SSEx) and ~ 10^{-20} for 80-bit floating point (x87).
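This square-root scaling is easy to check numerically. Below is a minimal sketch in Python, which computes in 64-bit IEEE throughout, so the best dx should land somewhere near \sqrt \epsilon_{machine} ~ 1e-8 (the function and starting increment mirror the Fortran example above):

```python
import math

def rel_error(x, dx):
    # Relative error of the forward-difference derivative of tan at x,
    # compared with the exact derivative 1/cos(x)**2.
    exact = 1.0 / math.cos(x) ** 2
    approx = (math.tan(x + dx) - math.tan(x)) / dx
    return abs(approx / exact - 1.0)

# Start at dx = 1/65536 and halve it twenty times, as in the program above,
# then pick the increment with the smallest relative error.
x = 1.0
candidates = [1.0 / 65536 * 0.5 ** k for k in range(20)]
best_dx = min(candidates, key=lambda dx: rel_error(x, dx))
print(f"best dx = {best_dx:.3e}, rel. error = {rel_error(x, best_dx):.3e}")
```

The exact position of the minimum wobbles with rounding details, but it should sit many decades below the coarse end of the sweep and well above the finest dx where rounding noise dominates.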
Perhaps you have expressions whose dominant type is real*4. If so, assigning the value of the expression to a double-precision variable still counts as a single-precision calculation: the right-hand side is evaluated in single precision before the assignment, so the lost digits cannot be recovered. An example:
[fortran]double precision :: pi
..
pi = 4*atan(1.0)   ! RHS is a single-precision expression
..
[/fortran]
Single-precision floating-point evaluation causes numerically evaluated derivatives to have relative errors in the range of 1E-4 to 1E-3.
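To see how much is lost, here is a minimal sketch in Python that uses the struct module to round intermediates to IEEE single precision, playing the role of Fortran real*4 (the pi example is the one from above; the helper name f32 is ours):

```python
import math
import struct

def f32(x):
    # Round a Python double to IEEE single precision, mimicking real*4.
    return struct.unpack('f', struct.pack('f', x))[0]

# "double precision :: pi" followed by "pi = 4*atan(1.0)": the right-hand
# side is a single-precision expression, so storing the result in a
# double-precision variable cannot restore the digits already gone.
pi_double = f32(4 * f32(math.atan(f32(1.0))))

print(f"single-precision pi = {pi_double!r}")
print(f"absolute error      = {abs(pi_double - math.pi):.2e}")
```

The error comes out on the order of 1e-7 in the value itself; fed into a forward difference, that function-evaluation error inflates the derivative error to roughly its square root, i.e. the 1E-4 to 1E-3 range mentioned above.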
"It should be noted that the program overruns array boundaries in the parsing
routines. When stack and array bound checking is used in the compilation,
errors occur (?)"
And, indeed, it does. Not only are arrays overrun, but at least one variable was used without initialization. Such code might have worked correctly on some machine some day. Today, I would consider it buggy.
(1) "rmi=0.25*(dx-dy)/ct" as the third executable line in file /NLSL.MOMD/matrll.f
(2) No implicit none in caldlm.f
While this does not necessarily mean it is the cause of the issue you see, as mecej4 and others have commented, one needs to be careful about numerical precision. There are some very good articles by Steve Lionel on this topic.
It is good that using /real:64 is working for you now. In case you have to write new Fortran programs, you may want to read the precision-related articles.
Abhi
