Hi,
I am trying to run a 168-hour WRF simulation (2 nested domains) on a RHEL 7.4 cluster in an Intel 19u2 environment.
I prepared WRF's input files (wrfinput01, wrfinput02 and wrfbdy) using WPS (+real.exe) compiled with the Intel compilers.
When I run the simulation with WRF compiled with debug settings (./configure -D, select sm; ./compile em_real), it works fine: it completed 48 hours of simulated time without any issue, but it was very slow, so I terminated it.
Then I compiled the optimized version of WRF with Intel 2019 and faced several types of issues:
1. Simulation hanging. Initially I compiled WRF with the hybrid setting (sm+dm), but with this the simulation on the same set of inputs (as mentioned above) got stuck after simulating 10 minutes.
After removing -DMPI2_THREAD_SUPPORT and -ip and downgrading -O3 to -O2, the simulation ran for 24 hours of simulated time and hung again. I tried 3 times to check whether it was a fluke, but the simulation gets stuck at the same timestep (after 24 hours of simulated time). The same simulation with 2018r3 terminates at the 16th hour with a segfault. Note that I am now using the dm setting (instead of sm+dm) to compile WRF.
2. Simulation segfault. As the debug version was working fine, I decided to introduce optimization gradually with Intel 2019, but here the simulation fails (segfault) after 2 minutes. With -O2 I tried various combinations (-heap-arrays/-no-heap-arrays), but the simulation still does not get through.
Then I tried 3 other MPI implementations: MPICH, MVAPICH2 and Open MPI. Note that I used the input generated by the Intel-built WPS (+real.exe) for the following:
MPICH2 - slow, but the simulation completed more than 48 hours without any issue
MVAPICH2 - hung
Open MPI - successfully completed 168 hours of simulation in 21 hours of walltime.
This is definitely not an issue with the input or the simulation setup, since in debug mode and with Open MPI the simulation works fine. Because the issue varies with the compiler version and the compilation flags used, I am not sure whether a debugger (gdb) would help me in this case.
Could you please advise me on the methodology I should follow to figure out valid compilation flags for this issue?
Here is the compilation (link) line with which wrf.exe was generated in debug mode:
mpiifort -f90=ifort -o wrf.exe -ip -fp-model precise -w -ftz -align all -fno-alias -FR -convert big_endian -xHost -fp-model fast=2 -no-heap-arrays -no-prec-div -no-prec-sqrt -fno-common -xCORE-AVX2 -g -g -O0 -fno-inline -no-ip -g -traceback -ip -xHost -fp-model fast=2 -no-prec-div -no-prec-sqrt -ftz -align all -fno-alias -fno-common -xCORE-AVX2 -g wrf.o ../main/module_wrf_top.o libwrflib.a /home/puneet/MySoftwares/UTILS/WRF/3.9.1.1_intel19u2_debug/WRFV3/external/fftpack/fftpack5/libfftpack.a /home/puneet/MySoftwares/UTILS/WRF/3.9.1.1_intel19u2_debug/WRFV3/external/io_grib1/libio_grib1.a /home/puneet/MySoftwares/UTILS/WRF/3.9.1.1_intel19u2_debug/WRFV3/external/io_grib_share/libio_grib_share.a /home/puneet/MySoftwares/UTILS/WRF/3.9.1.1_intel19u2_debug/WRFV3/external/io_int/libwrfio_int.a -L/home/puneet/MySoftwares/UTILS/WRF/3.9.1.1_intel19u2_debug/WRFV3/external/esmf_time_f90 -lesmf_time /home/puneet/MySoftwares/UTILS/WRF/3.9.1.1_intel19u2_debug/WRFV3/external/RSL_LITE/librsl_lite.a /home/puneet/MySoftwares/UTILS/WRF/3.9.1.1_intel19u2_debug/WRFV3/frame/module_internal_header_util.o /home/puneet/MySoftwares/UTILS/WRF/3.9.1.1_intel19u2_debug/WRFV3/frame/pack_utils.o -L/home/puneet/MySoftwares/UTILS/WRF/3.9.1.1_intel19u2_debug/WRFV3/external/io_netcdf -lwrfio_nf -L/home/puneet/MyTempSoftwares/WRFV3.9.1.1_Deps_intelmpi2019u2//lib -lnetcdff -lnetcdf -L/home/puneet/MyTempSoftwares/WRFV3.9.1.1_Deps_intelmpi2019u2//lib -lhdf5_fortran -lhdf5 -lm -lz
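The debug line above carries both debug and optimization switches (for instance -ip next to -no-ip, and -fp-model given with more than one value), so it can be hard to see which settings are in play. A minimal sketch, assuming bash, of a way to list such pairs from a pasted command line; conflicting_flags and flag_values are hypothetical helper names, not part of WRF or the Intel tools, and the script does not decide which switch the compiler actually honors.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a compiler/link command line.

# conflicting_flags "COMMAND LINE"
# Reports every switch that is present both as -X and as -no-X.
conflicting_flags() {
    local -a tokens
    read -r -a tokens <<< "$1"
    local tok base other
    for tok in "${tokens[@]}"; do
        case "$tok" in
            -no-*)
                base="-${tok#-no-}"
                for other in "${tokens[@]}"; do
                    if [ "$other" = "$base" ]; then
                        echo "$base and $tok both present"
                        break
                    fi
                done
                ;;
        esac
    done
}

# flag_values FLAG "COMMAND LINE"
# Prints every value passed to a switch that takes a separate argument,
# e.g. flag_values -fp-model "$line".
flag_values() {
    local flag="$1"
    local -a tokens
    read -r -a tokens <<< "$2"
    local i
    for ((i = 0; i < ${#tokens[@]} - 1; i++)); do
        if [ "${tokens[i]}" = "$flag" ]; then
            echo "${tokens[i+1]}"
        fi
    done
}
```

Calling conflicting_flags with the debug line as a single quoted string, and flag_values -fp-model with the same string, gives a quick inventory to compare against the optimized build before changing one switch at a time.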
Here is the compilation line for Open MPI:
time mpif90 -DMPI2_SUPPORT -o wrf.exe -fopenmp -O2 -ftree-vectorize -funroll-loops -w -ffree-form -ffree-line-length-none -fconvert=big-endian -frecord-marker=4 wrf.o ../main/module_wrf_top.o libwrflib.a /home/puneet/MySoftwares/UTILS/WRF/3.9.1.1_openmpi3.1.2_fullopt/WRFV3/external/fftpack/fftpack5/libfftpack.a /home/puneet/MySoftwares/UTILS/WRF/3.9.1.1_openmpi3.1.2_fullopt/WRFV3/external/io_grib1/libio_grib1.a /home/puneet/MySoftwares/UTILS/WRF/3.9.1.1_openmpi3.1.2_fullopt/WRFV3/external/io_grib_share/libio_grib_share.a /home/puneet/MySoftwares/UTILS/WRF/3.9.1.1_openmpi3.1.2_fullopt/WRFV3/external/io_int/libwrfio_int.a -L/home/puneet/MySoftwares/UTILS/WRF/3.9.1.1_openmpi3.1.2_fullopt/WRFV3/external/esmf_time_f90 -lesmf_time /home/puneet/MySoftwares/UTILS/WRF/3.9.1.1_openmpi3.1.2_fullopt/WRFV3/external/RSL_LITE/librsl_lite.a /home/puneet/MySoftwares/UTILS/WRF/3.9.1.1_openmpi3.1.2_fullopt/WRFV3/frame/module_internal_header_util.o /home/puneet/MySoftwares/UTILS/WRF/3.9.1.1_openmpi3.1.2_fullopt/WRFV3/frame/pack_utils.o -L/home/puneet/MySoftwares/UTILS/WRF/3.9.1.1_openmpi3.1.2_fullopt/WRFV3/external/io_netcdf -lwrfio_nf -L/home/puneet/MyTempSoftwares/WRFV3.9.1.1_Deps_gcc4.8.5//lib -lnetcdff -lnetcdf -L/home/puneet/MyTempSoftwares/WRFV3.9.1.1_Deps_gcc4.8.5//lib -lhdf5_fortran -lhdf5 -lm -lz
Please let me know if compilation logs or more information are required from my end.
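On the methodology question: one possible approach (a suggestion, not something confirmed in this thread) is to keep the working debug build as the baseline and binary-search over the source files that get compiled with optimization, since the failure is reported to be reproducible at the same timestep. A minimal sketch, assuming bash, that a single file triggers the failure, and that you supply your own test command that rebuilds and runs a short case; bisect_first_bad and the file names in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# bisect_first_bad TEST_CMD ITEM...
# TEST_CMD is called with a prefix of the item list. It must return 0 when a
# build with only those items optimized runs correctly, non-zero otherwise.
# Assumes: no items optimized passes, all items optimized fails, and exactly
# one item is responsible. Prints that item.
bisect_first_bad() {
    local test_cmd="$1"
    shift
    local -a items=("$@")
    if (( ${#items[@]} == 0 )); then
        echo "bisect_first_bad: no items given" >&2
        return 1
    fi
    local lo=0 hi=${#items[@]} mid
    # Invariant: optimizing the first $lo items passes,
    # optimizing the first $hi items fails.
    while (( hi - lo > 1 )); do
        mid=$(( (lo + hi) / 2 ))
        if "$test_cmd" "${items[@]:0:mid}"; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    echo "${items[hi-1]}"
}
```

A call might look like bisect_first_bad my_build_and_run file_a.F file_b.F file_c.F, where my_build_and_run is your own function that compiles the listed files at -O2, everything else at -O0, and runs the case up to the failing timestep. The number of rebuilds grows with the logarithm of the number of files, which matters when each test run takes hours.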
Two questions: Does your code do I/O that might conflict with the MPI parallelization? And did you try debug flags (all compile-time and run-time checks on, such as NaN initialization, FPE trapping and bounds checking) to see whether the hang might be a programming error?
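On whether gdb can help with a hang: one common way to see where each rank is waiting is to attach gdb to the running processes and print backtraces of all threads. A minimal sketch, assuming bash, that gdb and pgrep are available on the node, that the executable is named wrf.exe, and that the build kept -g so frames resolve to source lines; dump_backtraces is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# dump_backtraces [EXE_NAME]
# Attaches gdb in batch mode to every process of the current user whose name
# is exactly EXE_NAME (default: wrf.exe) and prints all thread backtraces.
dump_backtraces() {
    local exe="${1:-wrf.exe}"
    local pid
    for pid in $(pgrep -u "$(id -un)" -x "$exe"); do
        echo "=== PID $pid ==="
        # -batch makes gdb exit (and detach) after running the -ex command.
        gdb -p "$pid" -batch -ex 'thread apply all bt' 2>&1
    done
}
```

Running dump_backtraces on each node once the job has stopped advancing, and comparing which routine each rank is in, may show whether all ranks are waiting in the same MPI call or whether one rank is stuck elsewhere.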
Hi,
I am using open-source software: http://www2.mmm.ucar.edu/wrf/users/download/get_sources.html#WRF-ARW. Yes, the software performs I/O, i.e. after every 3 hours of simulation a .nc (NetCDF) file is generated, and I am not sure whether MPI parallelization is affected while the file writing happens. Here are the files generated during a simulation that hung:
-rw-r--r-- 1 puneet internalusers 1504737500 Mar 19 05:18 wrfout_d01_2016-03-10_00:00:00.nc
-rw-r--r-- 1 puneet internalusers 3405553220 Mar 19 05:18 wrfout_d02_2016-03-10_00:00:00.nc
-rw-r--r-- 1 puneet internalusers 1504737500 Mar 19 05:32 wrfout_d01_2016-03-10_03:00:00.nc
-rw-r--r-- 1 puneet internalusers 3405553220 Mar 19 05:32 wrfout_d02_2016-03-10_03:00:00.nc
-rw-r--r-- 1 puneet internalusers 1504737500 Mar 19 05:45 wrfout_d01_2016-03-10_06:00:00.nc
-rw-r--r-- 1 puneet internalusers 3405553220 Mar 19 05:45 wrfout_d02_2016-03-10_06:00:00.nc
-rw-r--r-- 1 puneet internalusers 1504737500 Mar 19 05:58 wrfout_d01_2016-03-10_09:00:00.nc
-rw-r--r-- 1 puneet internalusers 3405553220 Mar 19 05:58 wrfout_d02_2016-03-10_09:00:00.nc
-rw-r--r-- 1 puneet internalusers 1504737500 Mar 19 06:11 wrfout_d01_2016-03-10_12:00:00.nc
-rw-r--r-- 1 puneet internalusers 3405553220 Mar 19 06:11 wrfout_d02_2016-03-10_12:00:00.nc
-rw-r--r-- 1 puneet internalusers 1504737500 Mar 19 06:24 wrfout_d01_2016-03-10_15:00:00.nc
-rw-r--r-- 1 puneet internalusers 3405553220 Mar 19 06:25 wrfout_d02_2016-03-10_15:00:00.nc
-rw-r--r-- 1 puneet internalusers 1504737500 Mar 19 06:38 wrfout_d01_2016-03-10_18:00:00.nc
-rw-r--r-- 1 puneet internalusers 3405553220 Mar 19 06:38 wrfout_d02_2016-03-10_18:00:00.nc
-rw-r--r-- 1 puneet internalusers 1504737500 Mar 19 06:51 wrfout_d01_2016-03-10_21:00:00.nc
-rw-r--r-- 1 puneet internalusers 3405553220 Mar 19 06:51 wrfout_d02_2016-03-10_21:00:00.nc
-rw-r--r-- 1 puneet internalusers 1504737500 Mar 19 07:04 wrfout_d01_2016-03-11_00:00:00.nc
-rw-r--r-- 1 puneet internalusers 3405553220 Mar 19 07:05 wrfout_d02_2016-03-11_00:00:00.nc
-rw-r--r-- 1 puneet internalusers 1504737500 Mar 19 07:18 wrfout_d01_2016-03-11_03:00:00.nc
-rw-r--r-- 1 puneet internalusers 3405553220 Mar 19 07:18 wrfout_d02_2016-03-11_03:00:00.nc
-rw-r--r-- 1 puneet internalusers 1504737500 Mar 19 07:31 wrfout_d01_2016-03-11_06:00:00.nc
-rw-r--r-- 1 puneet internalusers 3405553220 Mar 19 07:32 wrfout_d02_2016-03-11_06:00:00.nc
-rw-r--r-- 1 puneet internalusers 1504737500 Mar 19 07:45 wrfout_d01_2016-03-11_09:00:00.nc
-rw-r--r-- 1 puneet internalusers 3405553220 Mar 19 07:45 wrfout_d02_2016-03-11_09:00:00.nc
-rw-r--r-- 1 puneet internalusers 1504737500 Mar 19 07:58 wrfout_d01_2016-03-11_12:00:00.nc
-rw-r--r-- 1 puneet internalusers 3405553220 Mar 19 07:59 wrfout_d02_2016-03-11_12:00:00.nc
-rw-r--r-- 1 puneet internalusers 1504737500 Mar 19 08:12 wrfout_d01_2016-03-11_15:00:00.nc
-rw-r--r-- 1 puneet internalusers 3405553220 Mar 19 08:12 wrfout_d02_2016-03-11_15:00:00.nc
I have only tried the debug flags mentioned in the "configure.wrf_inteldebug_intelforum.txt" file. If it were a programming error, it should have behaved the same way (hang/segfault) with Open MPI and with the Intel debug settings.
Currently I am trying to gradually add stronger optimization flags to configure.wrf_inteldebug_intelforum.txt (for example, replacing -O0 with -O2).
Also, I will try your suggestion of adding additional flags to the optimized setting (-O2 plus -xHost):
-fpe0 -check noarg_temp_created,bounds,format,output_conversion,pointers,uninit -ftrapuv -unroll0 -u
If there are any additional flags that can help me with this, please let me know.
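In the file listing above, a new pair of wrfout files appears roughly every 13 minutes of wall time, so a hang can be noticed automatically when no output file has been touched for much longer than that. A minimal sketch, assuming bash and a find that supports -maxdepth and -mmin (as GNU find does); is_stalled is a hypothetical helper, and the 30-minute threshold in the usage note is only an illustration.

```shell
#!/usr/bin/env bash
# is_stalled DIR MINUTES [PATTERN]
# Returns 0 (true) when no file in DIR matching PATTERN was modified within
# the last MINUTES minutes, and non-zero when at least one such file exists.
# PATTERN defaults to the wrfout history files shown in the listing.
is_stalled() {
    local dir="$1"
    local minutes="$2"
    local pattern="${3:-wrfout_d0*}"
    local recent
    # -mmin -N selects files modified less than N minutes ago.
    recent=$(find "$dir" -maxdepth 1 -type f -name "$pattern" -mmin "-$minutes" | head -n 1)
    [ -z "$recent" ]
}
```

For example, a loop in the job script could run is_stalled "$PWD" 30 every few minutes and, when it returns true, record the time and collect diagnostics before the job is killed. A different PATTERN can be passed if another file in the run directory is updated more often than the history output.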
Hi
I installed parallel_studio_xe_2019_update4_cluster_edition on Ubuntu. When I install WRF using the Intel compiler, it gives me this error:
"To DISABLE large file support in NetCDF, set the environment variable WRFIO_NCD_NO_LARGE_FILE_SUPPORT to 1 and run configure again. Set to any other value to avoid this message.
Testing for NetCDF, C and Fortran compiler
One of compilers testing failed!
Please check your compiler."
When I typed "dpkg --list | grep compiler" I got:
jaipur:/opt/intel/parallel_studio_xe_2019.4.070/bin> dpkg --list | grep compiler
ii g++ 4:5.3.1-1ubuntu1 amd64 GNU C++ compiler
ii g++-5 5.4.0-6ubuntu1~16.04.11 amd64 GNU C++ compiler
ii g++-5-multilib 5.4.0-6ubuntu1~16.04.11 amd64 GNU C++ compiler (multilib support)
ii g++-multilib 4:5.3.1-1ubuntu1 amd64 GNU C++ compiler (multilib files)
ii gcc 4:5.3.1-1ubuntu1 amd64 GNU C compiler
ii gcc-5 5.4.0-6ubuntu1~16.04.11 amd64 GNU C compiler
ii gcc-5-multilib 5.4.0-6ubuntu1~16.04.11 amd64 GNU C compiler (multilib support)
ii gcc-multilib 4:5.3.1-1ubuntu1 amd64 GNU C compiler (multilib files)
ii gfortran 4:5.3.1-1ubuntu1 amd64 GNU Fortran 95 compiler
ii gfortran-5 5.4.0-6ubuntu1~16.04.11 amd64 GNU Fortran compiler
ii hardening-includes 2.7ubuntu2 all Makefile for enabling compiler flags for security hardening
ii libllvm3.8:amd64 1:3.8-2ubuntu4 amd64 Modular compiler and toolchain technologies, runtime library
ii libxkbcommon0:amd64 0.5.0-1ubuntu2.1 amd64 library interface to the XKB compiler - shared library
However, I have installed the Intel compiler:
jaipur:/opt/intel/parallel_studio_xe_2019.4.070/bin> whereis icc
icc: /opt/intel/bin/icc /opt/intel/compilers_and_libraries_2019.4.243/linux/bin/intel64/icc /opt/intel/compilers_and_libraries_2019.4.243/linux/bin/intel64/icc.cfg
So I am not able to see the Intel compiler in the package list. How can I link my Intel compiler to WRF?
Please share your experience on how to install WRF using the Intel compilers rather than gfortran.
Waiting for a positive reply.
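Note that dpkg only lists packages installed through the Debian package manager, so a compiler installed by a vendor's own installer would not be expected to appear there. A more direct check is whether the compiler drivers are on PATH in the shell that runs ./configure. A minimal sketch, assuming bash; missing_tools is a hypothetical helper, and the compilervars.sh location in the comment is an assumption based on the /opt/intel/bin path in the whereis output.

```shell
#!/usr/bin/env bash
# missing_tools NAME...
# Prints each command name that cannot be found on PATH (one per line).
# Prints nothing when every command is available.
missing_tools() {
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Possible usage before running WRF's ./configure (paths are assumptions):
#   source /opt/intel/bin/compilervars.sh intel64
#   missing_tools icc ifort mpiifort mpiicc
```

If missing_tools prints ifort, the Fortran compiler is either not installed or its environment script has not been sourced in this shell, which would be consistent with a configure-time compiler test failing.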
Note that icc is the name of the Intel C/C++ compiler driver. You may or may not have installed the Intel Fortran compiler (ifort) when you installed Parallel Studio, so check that first.
It would be far easier for you to build and run WRF using a supported compiler such as Gfortran. If the build system for WRF does not contain a configuration that uses the Intel compiler, chances are slim that you can build WRF using Intel Fortran. Nor is this forum a support forum for questions specific to third party packages such as WRF.
Hi,
I installed Intel parallel_studio_xe_2019_update4_cluster_edition. I am trying to install the WRF (3.9.1.1) component of NU-WRF using the Intel compilers.
The following errors occur:
-------------------------------------------------------------------------------------------------------------------------------
libwrflib.a(module_wrf_error.o): In function `wrf_message_':
module_wrf_error.f90:(.text+0x0): multiple definition of `wrf_message_'
libwrflib.a(noahmp36_wrf_routines.o):noahmp36_wrf_routines.F90:(.text+0x0): first defined here
libwrflib.a(module_wrf_error.o): In function `wrf_error_fatal_':
module_wrf_error.f90:(.text+0xa50): multiple definition of `wrf_error_fatal_'
libwrflib.a(noahmp36_wrf_routines.o):noahmp36_wrf_routines.F90:(.text+0x1e0): first defined here
libwrflib.a(module_wrf_error.o): In function `wrf_error_fatal3_':
module_wrf_error.f90:(.text+0xf60): multiple definition of `wrf_error_fatal3_'
libwrflib.a(noahmp36_wrf_routines.o):noahmp36_wrf_routines.F90:(.text+0x80): first defined here
real_em.o: In function `med_sidata_input_':
real_em.f90:(.text+0x167a): undefined reference to `module_wps_io_arw_mp_read_wps_'
real_em.o: In function `assemble_output_':
real_em.f90:(.text+0x2d1a): undefined reference to `module_big_step_utilities_em_mp_couple_'
real_em.f90:(.text+0x2f5b): undefined reference to `module_big_step_utilities_em_mp_couple_'
real_em.f90:(.text+0x319b): undefined reference to `module_big_step_utilities_em_mp_couple_'
real_em.f90:(.text+0x33d4): undefined reference to `module_big_step_utilities_em_mp_couple_'
real_em.f90:(.text+0x3c58): undefined reference to `module_big_step_utilities_em_mp_couple_'
---------------------------------------------------------------------------------------------------------------------------------------------------
configure.wrf :
DESCRIPTION = INTEL ($SFC/$SCC)
DMPARALLEL = 1
OMPCPP = # -D_OPENMP
OMP = # -openmp -fpp -auto
OMPCC = # -openmp -fpp -auto
SFC = ifort
SCC = icc
CCOMP = icc
DM_FC = mpif90 -f90=$(SFC)
DM_CC = mpicc -cc=$(SCC) -DMPI2_SUPPORT
FC = time $(DM_FC)
CC = $(DM_CC) -DFSEEKO64_OK
LD = $(FC)
RWORDSIZE = $(NATIVE_RWORDSIZE)
PROMOTION = -real-size `expr 8 \* $(RWORDSIZE)` -i4
ARCH_LOCAL = -DNONSTANDARD_SYSTEM_FUNC -DWRF_USE_CLM
CFLAGS_LOCAL = -w -O2 -ip #-xHost -fp-model fast=2 -no-prec-div -no-prec-sqrt -ftz -no-multibyte-chars
LDFLAGS_LOCAL = -ip #-xHost -fp-model fast=2 -no-prec-div -no-prec-sqrt -ftz -align all -fno-alias -fno-common
CPLUSPLUSLIB =
ESMF_LDFLAG = $(CPLUSPLUSLIB)
FCOPTIM = -O2
FCREDUCEDOPT = $(FCOPTIM)
FCNOOPT = -O0 -fno-inline -no-ip
FCDEBUG = # -g $(FCNOOPT) -traceback # -fpe0 -check noarg_temp_created,bounds,format,output_conversion,pointers,uninit -ftrapuv -unroll0 -u
FORMAT_FIXED = -FI
FORMAT_FREE = -FR
FCSUFFIX =
BYTESWAPIO = -convert big_endian
RECORDLENGTH = -assume byterecl
FCBASEOPTS_NO_G = -ip -fp-model precise -w -ftz -align all -fno-alias $(FORMAT_FREE) $(BYTESWAPIO) #-xHost -fp-model fast=2 -no-heap-arrays -no-prec-div -no-prec-sqrt -fno-common
FCBASEOPTS = $(FCBASEOPTS_NO_G) $(FCDEBUG)
MODULE_SRCH_FLAG =
TRADFLAG = -traditional-cpp
CPP = /lib/cpp -P -nostdinc
AR = ar
ARFLAGS = ru
M4 = m4
RANLIB = ranlib
RLFLAGS =
CC_TOOLS = $(SCC)
I don't know where it is going wrong. Kindly help me solve this issue.
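The linker output above contains two different kinds of problem: symbols defined twice (wrf_message_ and friends appear in both module_wrf_error.o and noahmp36_wrf_routines.o) and module procedures that no object in the link provides (for example module_big_step_utilities_em_mp_couple_). A minimal sketch, assuming bash with sed, sort and uniq, that condenses a saved link log into one line per symbol; summarize_link_errors is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# summarize_link_errors
# Reads linker output on stdin and prints one line per distinct symbol:
#   COUNT multiple SYMBOL    for "multiple definition of" messages
#   COUNT undefined SYMBOL   for "undefined reference to" messages
summarize_link_errors() {
    sed -n \
        -e "s/.*multiple definition of \`\([^']*\)'.*/multiple \1/p" \
        -e "s/.*undefined reference to \`\([^']*\)'.*/undefined \1/p" |
        LC_ALL=C sort | uniq -c | sed 's/^ *//'
}
```

With the build output saved to a file, summarize_link_errors < compile.log gives a short list to work from. The undefined module procedures may point to an earlier compile step that failed and left an object out of libwrflib.a, so searching the same log for the first compiler error is a reasonable next step; nm -A libwrflib.a piped through grep for one of the symbols can confirm which archive members define it.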