Could this code snippet cause an error?

intel1 · ‎04-08-2009

I've got some bug in my code (an N-body simulation) that suddenly makes numbers turn to NaN values for (currently) no observable reason. Since it doesn't make the code crash, I'm having to find a safe save point from which I know it will fail in a few iterations and then use things like print statements to tell me exactly what's going on with the data.

I've just got to the part where I'm putting print statements in but in my process the bug might look like its gone away by doing a slight modification. I don't see why this would be so and chances are that it hasn't at all got rid of the bug but just made it appear elsewhere.

If one of you could say whether the following code could possibly cause such a bug and if the modification I've done would get rid of the bug then it could save me many hours in computational time trying to find another 'safe start point'.

Here's what I had in the code before:

v(1,j) = v(1,j) + (vchange(1) * h0)

v(2,j) = v(2,j) + (vchange(2) * h0)

v(3,j) = v(3,j) + (vchange(3) * h0)

Which I've since changed to:

v1_new = v(1,j) + (vchange(1) * h0)

v2_new = v(2,j) + (vchange(2) * h0)

v3_new = v(3,j) + (vchange(3) * h0)

v(1,j) = v(1,j) + v1_new

v(2,j) = v(2,j) + v2_new

v(3,j) = v(3,j) + v3_new

So that I can write out a before and after value for v. I'm sure my original snippet is perfectly fine but then I have no idea about the specifics of memory access in Fortran or by the iFort compiler.

Thanks,

Phil

EDIT: Nevermind, it still turns into NaN values, it just takes a few time steps longer to do so. I don't feel like I'm ever going to find this damned bug. :(

Lorri_M_Intel · ‎04-08-2009

My experience is that any piece of code can be modified to cause problems :-)

What tricks have you tried so far?
Have you tried the -check bounds compiler switch?
Does the code fail both with and without optimizations enabled? If so, does the optimization level make a different? (O1 vs O2 vs O3)

This is a start anyway -

- Lorri

jimdempseyatthecove · ‎04-08-2009

Typically NaN's are caused by GIGO (garbage in, garbage out)

Rework the following code to perform sanity checks

[cpp]    JFP_CLASS = FP_CLASS(BUDT(i))
  SELECT CASE (JFP_CLASS)
    CASE (0)
    call DOSTOP('SetBUDT FOR_K_FP_SNAN')
    CASE(1)
    call DOSTOP('SetBUDT FOR_K_FP_QNAN')
    CASE(2)
    call DOSTOP('SetBUDT FOR_K_FP_POS_INF')
    CASE(3)
    call DOSTOP('SetBUDT FOR_K_FP_NEG_INF')
    CASE(4)
	! OK - FOR_K_FP_POS_NORM
    CASE(5)
	! OK - FOR_K_FP_NEG_NORM
	CASE(6)
    call DOSTOP('SetBUDT FOR_K_FP_POS_DENORM')
    CASE(7)
    call DOSTOP('SetBUDT FOR_K_FP_NEG_DENORM')
    CASE(8)
	! OK - FOR_K_FP_POS_ZERO
    CASE(9)
	! OK - FOR_K_FP_NEG_ZERO
    CASE DEFAULT
    call DOSTOP('SetBUDT FOR_K_FP_unknown')
  END SELECT    
[/cpp]

Jim Dempsey

Izaak_Beekman · ‎04-09-2009

Is the N-body integrator a new one which you wrote or one which was previously working and broke? Have you tested a simple two body problem and confirmed that your integrator conserves angular momentum and energy? Obviously you need to ensure your choice of numerical method is stable, consistent, convergent, etc. Also, I beleive you need to be carefull about choosing appropriate and physical, initial conditions for the nbody system. From what I remember from my scientific computing in astro-physics course it seems that if binary systems develope within the n-body simulation this can also cause big problems since the time scale of the binary system will be considerably less than the rest of your system, forcing down your step size. Therefore it is entirely possible that there is some pathology in your initial conditions. Also, if there is a 'bug' in your code i would venture to say it is in the computation of derivatives, and could either be some sort of typo/unintentional logic error, or as I said above, perhaps the numerical method you are using is inapropriate. i.e. use a symetric, leapfrog type integrator rather than an euler type method.

I would recomend using a debugger such as DDD (open source and free/easily acquired on linux systems) or totalview which lets you visualize your data in x-y or 2-D surface plots. This lets you skip adding spurious print statements. (idb may have this feature in the new gui version, I'm not sure).

Sorry if if you already know all this stuff, but you didn't give us any backround on numerics etc. and it seems that this is very likely a numerics type of problem which may be better solved by consulting experts in numerics or computatiobnal astrophysics, rather than the ifort message boards.

Also, a quick note: You do realize that your update rule for V has been changed from V(i,j) = V(i,j) + vchange(i)*h0 to V(i,j) = V(i,j) + V(i,j) + vchange(i)*h0, right?

"Here's what I had in the code before:

v(1,j) = v(1,j) + (vchange(1) * h0)
v(2,j) = v(2,j) + (vchange(2) * h0)
v(3,j) = v(3,j) + (vchange(3) * h0)

Which I've since changed to:

v1_new = v(1,j) + (vchange(1) * h0)
v2_new = v(2,j) + (vchange(2) * h0)
v3_new = v(3,j) + (vchange(3) * h0)
v(1,j) = v(1,j) + v1_new
v(2,j) = v(2,j) + v2_new
v(3,j) = v(3,j) + v3_new"