- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
We have a fluid solver written in Fortran. I'm trying to optimise it.
In a "do loop" I have:
ue=(u(ine)*fxe+u(inp)*fxw)+
& dtime*denr*vole*arx*
& ((dpcv(ine)*fxe+dpcv(inp)*fxw)-dpe)
ve=(v(ine)*fxe+v(inp)*fxw)+
& dtime*denr*vole*ary*
& ((dpcv(ine)*fxe+dpcv(inp)*fxw)-dpe)
we=(w(ine)*fxe+w(inp)*fxw)+
& dtime*denr*vole*arz*
& ((dpcv(ine)*fxe+dpcv(inp)*fxw)-dpe)
we can see that
"dtime*denr*vole*((dpcv(ine)*fxe+dpcv(inp)*fxw)-dpe)" is computed 3 times. So I decided to replace these lignes with:
real*8 val_tmp
val_tmp=dtime*denr*vole*((dpcv(ine)*fxe+dpcv(inp)*fxw)-dpe)
ue=(u(ine)*fxe+u(inp)*fxw)+
& arx*val_tmp
ve=(v(ine)*fxe+v(inp)*fxw)+
& ary*val_tmp
we=(w(ine)*fxe+w(inp)*fxw)+
& arz*val_tmp
Then I compare the output results in the first case with the ones in the second case. With O0, O1, O2 or O3 the results are slightly different. I don't understand why ?!?
Best regards,
Guillaume
We have a fluid solver written in Fortran. I'm trying to optimise it.
In a "do loop" I have:
ue=(u(ine)*fxe+u(inp)*fxw)+
& dtime*denr*vole*arx*
& ((dpcv(ine)*fxe+dpcv(inp)*fxw)-dpe)
ve=(v(ine)*fxe+v(inp)*fxw)+
& dtime*denr*vole*ary*
& ((dpcv(ine)*fxe+dpcv(inp)*fxw)-dpe)
we=(w(ine)*fxe+w(inp)*fxw)+
& dtime*denr*vole*arz*
& ((dpcv(ine)*fxe+dpcv(inp)*fxw)-dpe)
we can see that
"dtime*denr*vole*((dpcv(ine)*fxe+dpcv(inp)*fxw)-dpe)" is computed 3 times. So I decided to replace these lignes with:
real*8 val_tmp
val_tmp=dtime*denr*vole*((dpcv(ine)*fxe+dpcv(inp)*fxw)-dpe)
ue=(u(ine)*fxe+u(inp)*fxw)+
& arx*val_tmp
ve=(v(ine)*fxe+v(inp)*fxw)+
& ary*val_tmp
we=(w(ine)*fxe+w(inp)*fxw)+
& arz*val_tmp
Then I compare the output results in the first case with the ones in the second case. With O0, O1, O2 or O3 the results are slightly different. I don't understand why ?!?
Best regards,
Guillaume
Link Copied
3 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Floating point arithmetic is not exact. There exists a small positive number \epsilon such for positive x < \epsilon that the floating point unit of the CPU is unable to distinguish 1 + x from 1. Read, for example,
http://introcs.cs.princeton.edu/91float/
and the references cited there.
Intel architecture IA32 and Intel64/AMD64 CPUs contain 80-bit x87 registers as well as 64-bit SSE2 registers. When the result of a calculation is stored is stored from an 80-bit register to 64-bit memory and reloaded later, the newly loaded value is slightly different from the original value in the register.
There are compiler options to control floating point behavior. There is no magical set of options to give you what you expect.
Sometimes, one needs to ask what "slightly different" means, since it is possible to print out numbers that only appear to be different as a result of conversion from the internal binary representation to decimal scientific notation.
http://introcs.cs.princeton.edu/91float/
and the references cited there.
Intel architecture IA32 and Intel64/AMD64 CPUs contain 80-bit x87 registers as well as 64-bit SSE2 registers. When the result of a calculation is stored is stored from an 80-bit register to 64-bit memory and reloaded later, the newly loaded value is slightly different from the original value in the register.
There are compiler options to control floating point behavior. There is no magical set of options to give you what you expect.
Sometimes, one needs to ask what "slightly different" means, since it is possible to print out numbers that only appear to be different as a result of conversion from the internal binary representation to decimal scientific notation.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thx for your answer. So there is no simple solution. :(
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Compilers don't usually look for variations in evaluation order which will expose common sub-expressions. To do so could be time-consuming. If you would enclose the common expressions in parentheses (and, with ifort, set -assume protect_parens), you should get the same result which you found with your temporary specified.

Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page