
WRF and DAPL with Intel Cluster Studio?

Cory_Davis
Beginner
Hi There,
I am using Intel Cluster Studio for Linux (v2011.0.013) to build and run WRF (v3.1.1) on an InfiniBand cluster. The switch is an HP BLc QLogic 4X QDR IB switch (PN: 505958-B21), and the HCAs are HP BLc QLogic 4X QDR IB Mezz HCAs (PN: 583210-B21). We are using the QLogic InfiniBand software stack, which includes OFED.
I have built WRF with the Intel compiler options and DM_PARALLEL. I have successfully run this WRF build using shm:tcp and shm:tmi, but every one of the dozens of ways I have tried shm:dapl has failed with a message like this:
INPUT LandUse = "USGS"
WRF NUMBER OF TILES = 1
WRF NUMBER OF TILES = 1
[14] rtc_invalidate error 1114112
Assertion failed in file ../../i_rtc_hook.c at line 190: 0
internal ABORT - process 14
Digging around the internet suggests this might be a problem with MKL spawning threads, so I have tried the -mt_mpi compiler switch for thread safety. I have also tried passing "-genv OMP_NUM_THREADS 1 -genv I_MPI_PIN_DOMAIN omp" to mpiexec, and I have tried compiling WRF without linking the MKL libraries at all. Everything produces the same result. Maybe the DM_PARALLEL WRF is multi-threading on its own?
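For reference, a typical invocation looked roughly like this (the rank count and executable path are illustrative rather than my exact values; I am selecting the fabric via I_MPI_FABRICS):

mpiexec -genv I_MPI_FABRICS shm:dapl -genv OMP_NUM_THREADS 1 -genv I_MPI_PIN_DOMAIN omp -n 64 ./wrf.exe

The same command with shm:tcp or shm:tmi in place of shm:dapl runs fine.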
The relevant line in /etc/dat.conf is
ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
The DAPL 1.2 entry, OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" "", also gives the same result.
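My understanding is that a specific provider from dat.conf can also be requested explicitly, along these lines (I_MPI_DAPL_PROVIDER is my assumption of the relevant variable; the value is the entry name from dat.conf above):

mpiexec -genv I_MPI_FABRICS shm:dapl -genv I_MPI_DAPL_PROVIDER ofa-v2-ib0 -n 64 ./wrf.exe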
So, my questions are:
1. Is it possible to run WRF with DAPL on such a system, and what do I need to do to make it work?
2. If I could make it work, would I expect much better performance than shm:tmi?
Thanks for reading!
Cheers,
Cory.
TimP
Honored Contributor III
In the absence of an expert reply, I'll mention that DAPL is used with certain applications in an effort to scale to hundreds of ranks. With WRF, however, I've heard that the hybrid MPI/OpenMP mode (MPI_THREAD_FUNNELED) is used to get a modest increase in scaling (beyond 256 cores?). I don't know from my own experience, as WRF runs best at 32 ranks on my current hardware.
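If you want to experiment with that, my understanding is that it means building WRF with the dm+sm (MPI plus OpenMP) configure option and launching with fewer ranks per node, roughly like this (the rank and thread counts are only illustrative):

mpiexec -genv OMP_NUM_THREADS 4 -genv I_MPI_PIN_DOMAIN omp -ppn 4 -n 32 ./wrf.exe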
Dmitry_K_Intel2
Employee
Hi Cory,

For QLogic HCAs, shm:tmi is the best option.
The DAPL implementation for QLogic cards is not mature enough yet: it is unstable and will give you worse performance.

So, just use shm:tmi with the Intel MPI Library for now.

If you are going to use the MKL library, you need to add the '-mt_mpi' compiler option and should not set the OMP_NUM_THREADS environment variable. The MKL and MPI libraries are aware of each other, and MKL will not create more threads than the number of cores available on a node. (You can also try different -ppn values, by the way.)
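To make that concrete, the idea is roughly the following (the object files, rank counts, and executable name are placeholders):

mpiifort -mt_mpi ... -o wrf.exe     # link WRF against the thread-safe Intel MPI
mpiexec -genv I_MPI_FABRICS shm:tmi -ppn 8 -n 64 ./wrf.exe

and leave OMP_NUM_THREADS unset.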

Regards!
Dmitry
Cory_Davis
Beginner
Thanks very much for the help!