Floating Point Exception Overflow and Open MPI equivalent tuning?


I am working with star ccm+ 2019.1.1 Build 14.02.012
CentOS 7.6 kernel 3.10.0-957.21.3.el7.x86_64
Intel MPI Version 2018 Update 5 Build 20190404 (this is version shipped with star ccm+)
Cisco UCS cluster using usNIC fabric over 10GbE
Intel(R) Xeon(R) CPU E5-2698
7 nodes, 280 cores

enic RPM version kmod-enic- installed
usnic RPM kmod-usnic_verbs- installed
enic modinfo version:
enic loaded module version:
usnic_verbs modinfo version:
usnic_verbs loaded module version:
libdaplusnic RPM version 2.0.39cisco3.2.112.8 installed
libfabric RPM version 1.6.0cisco3.2.112.9.rhel7u6 installed

On runs less than 5 hours, everything works flawlessly and is quite fast.

However, when running on 280 cores, longer jobs die with the floating point exception at or around 5 hours into the run.
The same job completes fine on 140 cores, but takes about 14 hours to finish.
I am also using PBS Pro with a 99 hour wall time.

Turbulent viscosity limited on 56 cells in Region
A floating point exception has occurred: floating point exception [Overflow].  The specific cause cannot be identified.  Please refer to the troubleshooting section of the User's Guide.
Context: star.coupledflow.CoupledImplicitSolver
Command: Automation.Run
   error: Server Error

I have been doing some reading, and some say that other MPI implementations are more stable with Star CCM.

I have not ruled out that I am missing some parameters or tuning with Intel MPI as this is a new cluster.

I am also trying to make Open MPI work.  I have Open MPI compiled and it runs, but only with a very small number of cores.  With anything over about 2 cores per node it hangs indefinitely.

I have compiled Open MPI 3.1.3 from because this is what Star CCM version I am running supports.  I am telling star to use the open mpi that I installed so it can support the Cisco USNIC fabric, which I can verify using Cisco native tools.  Note that star also ships with openmpi however 

I am thinking that I need to tune Open MPI, which was also required with Intel MPI.

With Intel MPI, jobs with more than about 100 cores would hang until I added these parameters:



After adding these parameters I can scale to 280 cores and it runs very fast, up until the point where it hits the floating point exception.

I am struggling to find equivalent tuning parameters for Open MPI, or to resolve the floating point overflow.

I have listed all the MCA parameters available with Open MPI, and have tried setting the following ones with no success.

btl_max_send_size = 4096
btl_usnic_eager_limit = 2147483647
btl_usnic_rndv_eager_limit = 2147483647
btl_usnic_sd_num = 8208
btl_usnic_rd_num = 8208
btl_usnic_prio_sd_num = 8704
btl_usnic_prio_rd_num = 8704
btl_usnic_pack_lazy_threshold = -1
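
For anyone wanting to reproduce this, here is a minimal sketch of how values like the ones above can be handed to Open MPI, either as "--mca name value" arguments to mpirun or as OMPI_MCA_<name> environment variables. The helper names mca_cli_args and mca_env are made up for illustration, and how Star CCM passes extra options on to its MPI launcher is not covered here.

```python
# Sketch only: builds Open MPI MCA settings from the values listed in the post.
# mca_cli_args and mca_env are hypothetical helper names, not part of any library.

mca_params = {
    "btl_max_send_size": 4096,
    "btl_usnic_eager_limit": 2147483647,
    "btl_usnic_rndv_eager_limit": 2147483647,
    "btl_usnic_sd_num": 8208,
    "btl_usnic_rd_num": 8208,
    "btl_usnic_prio_sd_num": 8704,
    "btl_usnic_prio_rd_num": 8704,
    "btl_usnic_pack_lazy_threshold": -1,
}


def mca_cli_args(params):
    """Return a flat list of '--mca <name> <value>' arguments for mpirun."""
    args = []
    for name, value in params.items():
        args += ["--mca", name, str(value)]
    return args


def mca_env(params):
    """Return the same settings as OMPI_MCA_<name> environment variables."""
    return {f"OMPI_MCA_{name}": str(value) for name, value in params.items()}


if __name__ == "__main__":
    # Print the argument string that could be appended to an mpirun command line.
    print(" ".join(mca_cli_args(mca_params)))
    # Print export lines that could go into a job script instead.
    for key, value in mca_env(mca_params).items():
        print(f"export {key}={value}")
```

Keeping the values in one place makes it easier to change a single parameter between runs and to record exactly which combination was used when a job hangs.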

Does anyone have any advice or ideas for:

1.) The floating point overflow issue
2.) Equivalent tuning parameters for Open MPI

Many thanks in advance

