How many processes can we run HPL on with Intel MPI?

If we are using the evaluation version of Intel MPI/compiler/MKL (cluster tools), how many cores/processes can we run HPL on?


This is the hpl.log file from running HPL on more than 2880 nodes.


Hey Nihir,

Thanks for getting in touch.  We've done HPL runs with Intel MPI and MKL up to 150K ranks.  At that scale, there are a few things you need to set: some relate to internal thresholds for the Intel MPI Library, and others are timeout and retry values for the DAPL stack.  Here's a recent set of env settings used to scale out:

export I_MPI_FABRICS=shm:dapl
export I_MPI_DAPL_UD=on

export I_MPI_DAPL_UD_SEND_BUFFER_NUM=4096
export I_MPI_DAPL_UD_RECV_BUFFER_NUM=4096
export I_MPI_DAPL_UD_ACK_SEND_POOL_SIZE=4096
export I_MPI_DAPL_UD_ACK_RECV_POOL_SIZE=4096
export I_MPI_DAPL_UD_RNDV_EP_NUM=2
export I_MPI_DAPL_UD_REQ_EVD_SIZE=2000

export DAPL_UCM_REP_TIME=2000  #  REQUEST timer, waiting for REPLY in millisecs
export DAPL_UCM_RTU_TIME=2000  #  REPLY timer, waiting for RTU in millisecs
export DAPL_UCM_RETRY=10         #  REQUEST and REPLY retries
export DAPL_ACK_RETRY=10         #  IB RC Ack retry count
export DAPL_ACK_TIMER=20       #  IB RC Ack retry timer

Give these a try and let me know how it goes.  I would recommend you ensure you're running the latest dapl package from the OpenFabrics repository.
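One way to try the settings above is to collect them in a small wrapper script and source it before launching. The following is only a sketch: the function name `set_dapl_ud_env`, the `NP` and `HPL_BIN` variables, and their default values are assumptions made for illustration and are not from the post. The script prints the launch line instead of running it, so it can be checked on a login node first.

```shell
#!/bin/sh
# Hypothetical wrapper around the settings from the post above.
# set_dapl_ud_env, NP and HPL_BIN are illustrative names, not Intel MPI features.

set_dapl_ud_env() {
    # Shared memory inside a node, DAPL UD between nodes
    export I_MPI_FABRICS=shm:dapl
    export I_MPI_DAPL_UD=on

    # Intel MPI Library internal thresholds for DAPL UD
    export I_MPI_DAPL_UD_SEND_BUFFER_NUM=4096
    export I_MPI_DAPL_UD_RECV_BUFFER_NUM=4096
    export I_MPI_DAPL_UD_ACK_SEND_POOL_SIZE=4096
    export I_MPI_DAPL_UD_ACK_RECV_POOL_SIZE=4096
    export I_MPI_DAPL_UD_RNDV_EP_NUM=2
    export I_MPI_DAPL_UD_REQ_EVD_SIZE=2000

    # DAPL stack timeout and retry values
    export DAPL_UCM_REP_TIME=2000   # REQUEST timer, waiting for REPLY, in ms
    export DAPL_UCM_RTU_TIME=2000   # REPLY timer, waiting for RTU, in ms
    export DAPL_UCM_RETRY=10        # REQUEST and REPLY retries
    export DAPL_ACK_RETRY=10        # IB RC Ack retry count
    export DAPL_ACK_TIMER=20        # IB RC Ack retry timer
}

set_dapl_ud_env

# Assumed defaults; the rank count has to cover the P x Q grid in HPL.dat.
NP="${NP:-16}"
HPL_BIN="${HPL_BIN:-./xhpl}"

# Dry run: print the launch line rather than executing it.
LAUNCH_CMD="mpirun -n $NP $HPL_BIN"
echo "$LAUNCH_CMD"
```

Once the printed line looks right for your cluster, replace the `echo` with the actual launch, adding whatever host file or scheduler options your site requires.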

My colleague Vipin posted a nice overview of running HPL with the Intel tools.  In there, you'll see a link to download Intel's optimized LINPACK benchmark. I hope this helps.

Regards,
~Gergana

gss

Gergana,

Thanks for the input. I will try it out and let you know how it goes.


Gergana,

Could you please delete the hpl.log files?


Done!

gss

Gergana,

In our network setup we have two InfiniBand leaf switches, which is causing this issue. Do you have any recommendations on how to run HPL on a network that has two leaf switches?
