If we are using an evaluation version of Intel MPI/compiler/MKL (cluster tools), how many cores/processes can we run HPL on?
hpl.log file from running HPL on more than 2880 nodes
Hey Nihir,
Thanks for getting in touch. We've done HPL runs with Intel MPI and MKL up to 150K ranks. At that scale, there are a few things you need to set: some relate to internal thresholds for the Intel MPI Library, and others are timeout and retry values for the DAPL stack. Here's a recent set of env settings used to scale out:
export I_MPI_FABRICS=shm:dapl
export I_MPI_DAPL_UD=on
export I_MPI_DAPL_UD_SEND_BUFFER_NUM=4096
export I_MPI_DAPL_UD_RECV_BUFFER_NUM=4096
export I_MPI_DAPL_UD_ACK_SEND_POOL_SIZE=4096
export I_MPI_DAPL_UD_ACK_RECV_POOL_SIZE=4096
export I_MPI_DAPL_UD_RNDV_EP_NUM=2
export I_MPI_DAPL_UD_REQ_EVD_SIZE=2000
export DAPL_UCM_REP_TIME=2000 # REQUEST timer, waiting for REPLY in millisecs
export DAPL_UCM_RTU_TIME=2000 # REPLY timer, waiting for RTU in millisecs
export DAPL_UCM_RETRY=10 # REQUEST and REPLY retries
export DAPL_ACK_RETRY=10 # IB RC Ack retry count
export DAPL_ACK_TIMER=20 # IB RC Ack retry timer
Give these a try and let me know how it goes. I would recommend you ensure you're running the latest dapl package from the OpenFabrics repository.
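One way to apply the settings listed above is to keep them in a small shell function that the job script calls before launching HPL. This is only a sketch: the function names `set_hpl_scale_env` and `show_hpl_scale_env` are made up for illustration, and the commented-out launch line (the `NP` variable and the `xhpl` binary name) is an assumption about your setup, not something taken from this thread.

```shell
#!/bin/sh
# Sketch: gather the scale-out settings from the post above in one place.
# Function names are hypothetical; the values are copied from the post.

set_hpl_scale_env() {
    # Intel MPI fabric selection and DAPL UD settings
    export I_MPI_FABRICS=shm:dapl
    export I_MPI_DAPL_UD=on
    export I_MPI_DAPL_UD_SEND_BUFFER_NUM=4096
    export I_MPI_DAPL_UD_RECV_BUFFER_NUM=4096
    export I_MPI_DAPL_UD_ACK_SEND_POOL_SIZE=4096
    export I_MPI_DAPL_UD_ACK_RECV_POOL_SIZE=4096
    export I_MPI_DAPL_UD_RNDV_EP_NUM=2
    export I_MPI_DAPL_UD_REQ_EVD_SIZE=2000

    # DAPL timeout and retry values
    export DAPL_UCM_REP_TIME=2000   # REQUEST timer, waiting for REPLY in millisecs
    export DAPL_UCM_RTU_TIME=2000   # REPLY timer, waiting for RTU in millisecs
    export DAPL_UCM_RETRY=10        # REQUEST and REPLY retries
    export DAPL_ACK_RETRY=10        # IB RC Ack retry count
    export DAPL_ACK_TIMER=20        # IB RC Ack retry timer
}

# Print every exported I_MPI_* and DAPL_* variable, sorted, so the values
# actually in effect can be recorded in the job log.
show_hpl_scale_env() {
    env | grep -E '^(I_MPI_|DAPL_)' | sort
}

set_hpl_scale_env
show_hpl_scale_env

# Hypothetical launch line (NP and ./xhpl are assumptions, adjust to your job):
# mpirun -n "$NP" ./xhpl
```

Printing the variables before the launch makes it easier to confirm later which values a given hpl.log was produced with.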
My colleague Vipin posted a nice overview of running HPL with the Intel tools. In there, you'll see a link to download Intel's optimized LINPACK benchmark. I hope this helps.
Regards,
~Gergana
Gergana,
Thanks for the input. I will try it out and give you feedback.
Gergana,
Could you please help delete the hpl.log files?
Done!
Gergana,
In our network setup we have two InfiniBand leaf switches, which are causing this issue. Do you have any recommendations on how to run HPL on a network that has two leaf switches?
