Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Ubuntu 20.04 WRF CHEM segmentation fault

jsrikish
Beginner

I was able to compile WRF 4.2 using the oneapi/compiler/2023.1.0 suite of compilers.

./configure (I chose option 15, dmpar)

export DM_FC="mpiifort -f90=ifort"
export DM_CC="mpiicc -cc=icc -DMPI2_SUPPORT"

export FLEX_LIB_DIR=/usr/lib/x86_64-linux-gnu/
export EM_CORE=1
export WRF_CHEM=1
export WRF_KPP=1
export YACC="/usr/bin/bison -d"

export NETCDF=/home/apps
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/apps/lib

ulimit -s unlimited

I ran a successful simulation on a single node using 38 of the 40 processors available on each node, by logging into the node and running:

mpirun -np 38 ./wrf.exe

machines1 lists only one node (node12).
From the head node, I issue the following command:

mpirun -np 38 -f machines1 ./wrf.exe 

Two-way passwordless login is set up between the head and compute nodes.
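
One thing worth ruling out before a multi-node run: `ulimit -s unlimited` applies only to the shell session where it is typed, so ranks started on node12 through the remote launcher may inherit that node's default stack limit instead. This is an assumption about a possible cause, not something this thread confirms. A minimal sketch for checking what a launched process actually sees follows; `show_stack_limit` is a hypothetical helper name, and the commented `mpirun` lines reuse the hostfile and rank count from the post.

```shell
# Print the stack limit seen by a process started through an optional
# launcher prefix (for example: mpirun -np 1 -f machines1).
show_stack_limit() {
    "$@" sh -c 'echo "$(uname -n): stack limit = $(ulimit -s)"'
}

# Local check, no launcher: shows the limit of the current shell session.
show_stack_limit

# Remote check through the launcher (commented out; needs the cluster):
# show_stack_limit mpirun -np 1 -f machines1

# If the remote value is small, one possible workaround is to raise the limit
# inside each rank just before WRF starts. This only works if the hard limit
# on the compute node allows it.
# mpirun -np 38 -f machines1 sh -c 'ulimit -s unlimited && exec ./wrf.exe'
```

If the local and remote values differ, the limit configured on the compute node itself is the one to adjust.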

The run fails right after it starts:

d01 2023-12-04_06:00:00 Input data is acceptable to use: wrfbdy_d01
Timing for processing lateral boundary for domain 1: 0.82989 elapsed seconds
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line     Source
libpthread-2.31.s  00007F7CAA465420  Unknown            Unknown  Unknown
wrf.exe            0000000001FE246A  Unknown            Unknown  Unknown
wrf.exe            0000000001DB4198  Unknown            Unknown  Unknown
wrf.exe            000000000060075F  Unknown            Unknown  Unknown
wrf.exe            0000000000418361  Unknown            Unknown  Unknown
wrf.exe            000000000041831F  Unknown            Unknown  Unknown
wrf.exe            00000000004182BD  Unknown            Unknown  Unknown
libc-2.31.so       00007F7CAA27B083  __libc_start_main  Unknown  Unknown
wrf.exe            00000000004181DE  Unknown            Unknown  Unknown

Setting OMP_STACKSIZE to 10240000000 didn't help. Do I need to set or tweak any other MPI environment variables to run across nodes?
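
For context on that variable: OMP_STACKSIZE sets the stack size of threads created by the OpenMP runtime, and a bare number is read as kilobytes, so 10240000000 asks for roughly 10 TB. In an MPI-only (dmpar) build there may be no OpenMP threads for it to act on, which could explain why it made no difference; that reading is an assumption, not something confirmed here. The sketch below shows the unit-suffix form and how variables can be handed explicitly to every rank with Intel MPI's `-genv` option. The values `1G` and `5` are illustrative assumptions.

```shell
# OMP_STACKSIZE takes an optional unit suffix (B, K, M, G); a bare number
# is read as kilobytes. 1G is an illustrative value, not a recommendation.
export OMP_STACKSIZE=1G

# I_MPI_DEBUG makes Intel MPI print startup details (level 5 is an assumption).
export I_MPI_DEBUG=5

# Passing the variables explicitly to every rank (commented out; needs the cluster):
# mpirun -genv OMP_STACKSIZE 1G -genv I_MPI_DEBUG 5 -np 38 -f machines1 ./wrf.exe

echo "OMP_STACKSIZE=$OMP_STACKSIZE I_MPI_DEBUG=$I_MPI_DEBUG"
```

Passing the values with `-genv` removes any doubt about whether the settings made on the head node reach the ranks on the compute node.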


which mpirun

/ssd/intel/oneapi/mpi/2021.10.0//bin/mpirun

I also tried with 2021.9.0.

Could you please let us know how to run the code across nodes?
Thank you!

ShivaniK_Intel
Moderator

Hi,


Thanks for posting in the Intel forums.


Could you please provide us with a sample reproducer and the steps to reproduce the issue at our end?

Could you also please provide the processor details by running the command below?

Command: lscpu


Thanks & Regards

Shivani
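
A minimal sketch of collecting the requested lscpu output into a file that can be attached to a reply, assuming it is run on the node of interest. The file name pattern is a hypothetical choice, and the commented line assumes the compute node is reachable as node12, as in the original post.

```shell
# Save the processor details of the current node to a file named after the host.
out_file="lscpu_$(uname -n).txt"
if command -v lscpu >/dev/null 2>&1; then
    lscpu > "$out_file"
else
    # Leave a note instead of an empty file when lscpu is not installed.
    echo "lscpu not found on $(uname -n)" > "$out_file"
fi
echo "wrote $out_file"

# Same for the compute node, run from the head node (commented out; needs the cluster):
# ssh node12 lscpu > lscpu_node12.txt
```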


jsrikish
Beginner

Thank you for your prompt reply.

Do you need the executable wrf.exe along with all the input files and namelist.input to run at your end?

I will send you the output of lscpu.

Cheers

ShivaniK_Intel
Moderator

Hi,


>>>"Do you need the executable wrf.exe along with all the input files and namelist.input to run at your end?"


Yes, we need a sample reproducer along with the required input files and steps to reproduce the issue at our end.


Thanks & Regards

Shivani


jsrikish
Beginner

The executable and binary input files are large. Where do I upload them?

ShivaniK_Intel
Moderator

Hi,


We have shared the upload details for the executable and input files with you internally. Please share the files through that link.


Thanks & Regards

Shivani


jsrikish
Beginner

Thanks, Shivani.

I will upload the files tomorrow.

jsrikish
Beginner

I uploaded the wrf_fdda, wrf_biochem, and wrfbdy files, but when I tried to upload wrfinput_d01 and wrfinput_d02, the site gave me an error message and kicked me out. I tried several times.

Secure Connection Failed

An error occurred during a connection to secureftp.intel.com. PR_CONNECT_RESET_ERROR

Error code: PR_CONNECT_RESET_ERROR

The page you are trying to view cannot be shown because the authenticity of the received data could not be verified.
Please contact the website owners to inform them of this problem.

How do I get those files across?

ShivaniK_Intel
Moderator

Hi,

Sorry for the inconvenience.


We have again shared the upload details for the executable and input files with you internally. Please share the files through that link.


Thanks & Regards

Shivani


ShivaniK_Intel
Moderator

Hi,


As we have not heard back from you, could you please share the details through the link that was provided to you internally? The link might expire soon, so we request that you share the details as early as possible.


Thanks & Regards

Shivani


ShivaniK_Intel
Moderator

Hi,

We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.


Thanks & Regards

Shivani

