
OpenMP version of WRF 3.6 core dump

An_Yang
Beginner

Hi all,

Both the pure-OpenMP and the hybrid MPI+OpenMP builds of WRF 3.6 core dump, but the serial and MPI-only builds work fine.

Do you think it's a bug in OpenMP?

My env:

CentOS 6.5 with full updates, 64-bit

Intel parallel_studio_xe_2013_sp1_update3

WRF 3.6 compiled with icc

OpenMPI 1.6.5 compiled with icc

NetCDF 4.1.3 compiled with icc

test case: conus12km_data_v3

Hardware: Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz, 8 GB RAM

TimP
Honored Contributor III

I'm guessing you mean a SIGSEGV abort, such as from a stack overflow. Which of the standard remedies did you investigate?

https://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors

http://www.open-mpi.org/community/lists/users/

I note that Intel instructions, e.g.

https://software.intel.com/en-us/articles/wrf-and-wps-v311-installation-bkm-with-inter-compilers-and-intelr-mpi

imply using Intel MPI with the default stack limits for both the shell stack (e.g. ulimit -s) and the thread stack (OMP_STACKSIZE), but you may need to reconsider that, particularly in view of your use of Open MPI.

I've heard suggestions from influential people that default stack limits ought to be increased, but nothing has changed, to my knowledge.

Remember that OMP_STACKSIZE defaults to 4 MB with the Intel 64-bit compilers. If you double that, the shell stack requirement increases by 4 MB times the number of threads, so don't arbitrarily use more threads than you expect to be useful.
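For example, something along these lines in the launching shell before an OpenMP-only run (the values are illustrative; size them to your own domain and thread count):

ulimit -s unlimited            # or a large explicit value; covers the master thread's stack
export OMP_STACKSIZE=8M        # per-thread stack for the OpenMP worker threads
export OMP_NUM_THREADS=4       # illustrative; use only as many threads as actually help
./wrf.exe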

MPI ranks don't inherit the stack size limit from the shell where you launch MPI, so you may need to launch a script under MPI that sets the stack size, even though you will find expert recommendations to the contrary.
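A minimal wrapper sketch, assuming Open MPI's mpirun and the usual wrf.exe executable (names and values here are illustrative, not a recommendation for your case):

#!/bin/bash
# run_wrf.sh: run under mpirun so the limits are set inside each rank,
# not just in the shell you launch from
ulimit -s unlimited
export OMP_STACKSIZE=8M
export OMP_NUM_THREADS=2
exec ./wrf.exe

and then launch it with something like mpirun -np 4 ./run_wrf.sh (after chmod +x run_wrf.sh).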

An_Yang
Beginner

(gdb) bt
#0  0x00000000017e41c3 in module_diffusion_em_mp_compute_diff_metrics_ ()
#1  0x000000000104df15 in module_first_rk_step_part2_mp_first_rk_step_part2_ ()
#2  0x00002b5020b1ded3 in L_kmp_invoke_pass_parms () from /opt/intel/composer_xe_2013_sp1.3.174/compiler/lib/intel64/libiomp5.so
#3  0x00007ffedaf13d40 in ?? ()
#4  0x00007ffedaf13d48 in ?? ()
#5  0x00007ffedaf13d50 in ?? ()
#6  0x00007ffedaf13d58 in ?? ()
#7  0x00007ffedaf13d60 in ?? ()
#8  0x00007ffedaf13d68 in ?? ()
#9  0x00007ffedaf13d70 in ?? ()
#10 0x00007ffedaf13d78 in ?? ()
#11 0x00007ffedaf13d80 in ?? ()
#12 0x00007ffedaf13d88 in ?? ()
#13 0x00007ffedaf13e80 in ?? ()
#14 0x00007ffedaf13e88 in ?? ()

An_Yang
Beginner

Thanks Tim,

OMP_STACKSIZE=8M works.
 

An_Yang
Beginner

MPI version of WRF:

grep 'Timing for main' rsl.error.0000 | tail -150 | awk '{print $9}' | awk -f stats.awk
---
    items:       150
      max:         9.808480
      min:         4.432730
      sum:       758.022770
     mean:         5.053485
 mean/max:         0.515216

OpenMP version of WRF:

grep 'Timing for main' openmp.log | tail -150 | awk '{print $9}' | awk -f stats.awk
---
    items:       150
      max:         8.808800
      min:         4.216570
      sum:       724.733490
     mean:         4.831557
 mean/max:         0.548492

724.733490 / 758.022770 = 95.6%, so the OpenMP version is a little faster than the MPI version.
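For reference, a stats.awk along these lines would produce summaries in the format above (a sketch only; the actual script used here may differ). It reads one timing value per line:

# stats.awk: summarize one numeric value per input line
{
    sum += $1; n++
    if (n == 1 || $1 > max) max = $1
    if (n == 1 || $1 < min) min = $1
}
END {
    printf "    items: %9d\n", n
    printf "      max: %16.6f\n", max
    printf "      min: %16.6f\n", min
    printf "      sum: %16.6f\n", sum
    printf "     mean: %16.6f\n", sum / n
    printf " mean/max: %16.6f\n", (sum / n) / max
}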
