Thanks for getting in touch. We've done HPL runs with Intel MPI and MKL up to 150K ranks. At that scale, there are a few things you need to set: some of them relate to internal thresholds in the Intel MPI Library, and others are timeout and retry values for the DAPL stack. Here's a recent set of environment settings used to scale out:
export DAPL_UCM_REP_TIME=2000 # REQUEST timer, waiting for REPLY in millisecs
export DAPL_UCM_RTU_TIME=2000 # REPLY timer, waiting for RTU in millisecs
export DAPL_UCM_RETRY=10 # REQUEST and REPLY retries
export DAPL_ACK_RETRY=10 # IB RC Ack retry count
export DAPL_ACK_TIMER=20 # IB RC Ack retry timer
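In practice these exports go at the top of your job script, before the MPI launch. Below is a minimal sketch; the `mpirun` line, rank count, and `./xhpl` binary path are placeholders you'd adapt to your scheduler and build, not part of the settings above.

```shell
#!/bin/bash
# Sketch of a job script applying the DAPL settings before launching HPL.
# The launch line at the bottom is a placeholder -- adjust for your cluster.

# DAPL UD connection-setup timeouts and retries (values from above)
export DAPL_UCM_REP_TIME=2000   # REQUEST timer, waiting for REPLY (ms)
export DAPL_UCM_RTU_TIME=2000   # REPLY timer, waiting for RTU (ms)
export DAPL_UCM_RETRY=10        # REQUEST and REPLY retries
export DAPL_ACK_RETRY=10        # IB RC ack retry count
export DAPL_ACK_TIMER=20        # IB RC ack retry timer

# Placeholder launch (rank count and binary path are examples only):
# mpirun -np 150000 ./xhpl
echo "DAPL retry budget: ${DAPL_UCM_RETRY} retries, REP timer ${DAPL_UCM_REP_TIME} ms"
```

The idea is simply that the exports must be in the environment of every rank at launch time, so they belong in the script (or your MPI launcher's env-forwarding mechanism) rather than in an interactive shell.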
Give these a try and let me know how it goes. I'd also recommend making sure you're running the latest dapl package from the OpenFabrics repository.
My colleague Vipin posted a nice overview of running HPL with the Intel tools. In there, you'll see a link to download Intel's optimized LINPACK benchmark. I hope this helps.
In our network setup we have two InfiniBand leaf switches, which seem to be causing this issue. Do you have any recommendations on how to run HPL on a network with two leaf switches?