Hi,
I am running some climate models and have hit an annoying problem. If the climate model is compiled in debug mode with -O0 (using intel/12.1.9.293 and openmpi 1.4.3), the results are totally different from those of the same model compiled with the -O3 option.
Could someone please tell me if this is normal? Should a model compiled in debug mode be used in production runs?
Many thanks.
Cheers,
Lyndon.
Hi Lyndon
You should not run code compiled with -O0 in production. You need to find the cause of the difference. Very likely it is caused by inconsistencies in floating-point operations. See http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/ for a good introduction to this topic (you can download the PDF file from the bottom of the page). However, that article covers computation on a single compute node only. Parallelization with MPI (and OpenMP) might cause numerical differences too, e.g. if the order of operations in a reduction changes with the optimization level. If you use a recent Intel compiler (13.1 or 14.0, not the 12.1 you currently use), OpenMP reductions can be forced to be deterministic by setting the environment variable KMP_DETERMINISTIC_REDUCTION=yes.
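To illustrate why the order of a reduction matters, here is a minimal Python sketch (not taken from any climate model). It assumes IEEE double precision and emulates an MPI-style reduction by summing contiguous chunks first and then combining the partial sums. The function names and the input values are made up for the illustration.

```python
def naive_sum(values):
    """Left-to-right accumulation, like a plain serial loop."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    """Emulate a reduction over n_chunks ranks: each rank sums its own
    contiguous chunk, then the partial sums are combined in rank order."""
    size = (len(values) + n_chunks - 1) // n_chunks
    partials = [naive_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return naive_sum(partials)


# Illustrative values chosen so that rounding is visible.
values = [1e16, 1.0, -1e16, 1.0]

sequential = chunked_sum(values, 1)  # one rank: plain serial sum
two_ranks = chunked_sum(values, 2)   # two ranks: two partial sums

print("sequential:", sequential)
print("two ranks: ", two_ranks)
```

The same four numbers give different totals depending only on how the sum is grouped, and neither total equals the exact result of 2. A real code shows the same effect on a smaller scale, and a chaotic model such as a climate model can amplify it over many time steps.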
So, as first steps:
1. Try to find a configuration that does not use OpenMP and MPI, if possible. This will help to exclude these parallel models as the cause.
2. Use the options "-O2 -fp-model precise" and check the results. Neither should have a large impact on performance. Next you might also try "-fp-model strict", but this will slow down the code considerably. It would tell you, though, whether the FP operations are the cause of the numerical instability.
3. Compile half of your source files with -O2 and the rest with -O0, and continue recursively until you have found the file causing the difference.
4. For the file you found in (3), copy half of the routines to a new file, compile one file with -O2 and the other with -O0, and search for the critical routine in the same way as in (3).
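The bisection in steps (3) and (4) could be scripted roughly as follows. This is only a sketch under two assumptions: that a single file is responsible for the difference, and that you supply a callback (called `differs` here, a hypothetical name) that rebuilds the model with the given files at -O2 and all others at -O0, runs it, and returns True if the result deviates from the all -O0 reference. The file names in the demo are invented.

```python
def find_culprit(files, differs):
    """Bisect a list of source files to find the one whose optimization
    changes the result.

    differs(o2_files) is a user-supplied (hypothetical) callback: build
    with o2_files at -O2 and every other file at -O0, run the model, and
    return True if the output differs from the all -O0 reference.
    Assumes exactly one file is responsible.
    """
    candidates = list(files)
    while len(candidates) > 1:
        half = candidates[:len(candidates) // 2]
        if differs(half):
            # The difference shows up with only this half optimized.
            candidates = half
        else:
            # Otherwise the culprit must be in the other half.
            candidates = candidates[len(half):]
    return candidates[0]


# Demo with a stand-in callback instead of a real build and run.
sources = ["dynamics.f90", "physics.f90", "radiation.f90",
           "ocean.f90", "io.f90"]
fake_differs = lambda o2_files: "radiation.f90" in o2_files

culprit = find_culprit(sources, fake_differs)
print("file causing the difference:", culprit)
```

With N source files this needs about log2(N) builds and runs. If several files contribute to the difference, the single-culprit assumption breaks and the search will only find one of them.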
Once you have found the routine, you may be able to see what is causing the difference. Typically it is code related to a reduction operation where, e.g., vectorization changes the order of operations.
Heinz
How different are your results? Our in-house CFD codes running with MPI always produce slightly different results, but they are still accurate to ~1% at worst. Depending on which nodes you get, how busy they are, etc., the order of MPI reductions can definitely affect the noise in your simulations. It is also possible that changing the optimization level exposes an MPI/parallel programming bug related to message passing (a race condition, etc.) that wasn't triggered in the other case.
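One way to put a number on "how different" is the largest relative difference between the two outputs. The sketch below assumes the results have already been read into two equal-length lists of floats. The function name, the sample values, and the 1% threshold are only illustrative.

```python
def max_relative_difference(reference, candidate):
    """Return the largest relative difference between paired values."""
    if len(reference) != len(candidate):
        raise ValueError("result arrays must have the same length")
    worst = 0.0
    for r, c in zip(reference, candidate):
        scale = max(abs(r), abs(c))
        if scale == 0.0:
            continue  # both values are exactly zero: no difference
        worst = max(worst, abs(r - c) / scale)
    return worst


# Made-up sample values standing in for the -O0 and -O3 outputs.
result_o0 = [288.15, 1013.25, 0.5]
result_o3 = [288.15, 1013.30, 0.5]

diff = max_relative_difference(result_o0, result_o3)
print("max relative difference:", diff)
print("within 1%:", diff < 0.01)
```

A difference near machine precision points to rounding or operation order, while a large difference would be more consistent with a real bug such as a race condition or an uninitialized variable.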
