
Low performance on MIC

girish_b_
Beginner
475 Views

The HPL benchmark performance I get on a host plus one MIC card is only 154 GFLOPS. The host system has 102 GB of memory. The theoretical peak is 1.2 TF + 256 GFLOPS = 1.4 TF. How can I optimize the HPL performance? I used offload execution with the executable xhpl_offload_intel64. When I run the HPL benchmark on the host alone, I achieve 92% efficiency. I am attaching all the files I am using. Awaiting your quick reply.
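For scale, the figures in the post above can be turned into an efficiency number. This is only a sketch: the 1.2 TF and 256 GFLOPS terms are taken as stated in the post, and the function name `hpl_efficiency` is made up for illustration.

```python
def hpl_efficiency(measured_gflops, peak_gflops):
    """Return measured performance as a fraction of theoretical peak."""
    return measured_gflops / peak_gflops

# Peak terms as quoted in the post: 1.2 TF + 256 GFLOPS.
peak_gflops = 1200.0 + 256.0
efficiency = hpl_efficiency(154.0, peak_gflops)
print(f"Combined run: {efficiency:.1%} of theoretical peak")
```

The combined run lands at roughly a tenth of the quoted peak, compared with the 92% reported for the host alone.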

4 Replies
Sunny_G_Intel
Employee

Hello Girish,

Optimized HPL performance depends on many parameters, including the problem size (Ns). In the compressed folder you attached, the problem size is set to 4000. To investigate the issue further, can you please let me know what changes when you increase that number to something like 16K or 64K?
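A common rule of thumb for choosing the problem size is to let the N x N double-precision matrix fill about 80% of available memory. The sketch below applies that rule to the 102 GB host memory mentioned in the thread. The 80% fraction, the reading of 102 GB as 102e9 bytes, the block size of 256, and the function name `suggest_problem_size` are all assumptions for illustration, and the result is an upper bound to work toward rather than a tested setting for offload runs.

```python
import math

def suggest_problem_size(mem_bytes, fraction=0.8, nb=256):
    """Largest N (a multiple of nb) whose N x N matrix of 8-byte
    doubles fits in `fraction` of `mem_bytes`."""
    n = math.isqrt(int(fraction * mem_bytes / 8))
    return (n // nb) * nb

host_mem_bytes = 102e9  # assumption: 102 GB taken as 102e9 bytes
n_max = suggest_problem_size(host_mem_bytes)
print(f"Upper bound on N: {n_max}")
```

Sizes such as 16K or 64K sit well below this bound, so memory capacity alone should not prevent testing them.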

Thanks

 

girish_b_
Beginner

Hello Sunny,

I increased the problem size to 45k. With that setting HPL runs and the performance is 154 GF. Above 45k, HPL terminates with the following error: error in scifi_send 0 : success

 

Sunny_G_Intel
Employee

Hi Girish,

Sorry for the delayed reply. I was out of office on Monday.

Regarding the SCIF error you are getting, can you please confirm that the host is able to reach the coprocessor? What do you see in your HPL output for "Number of Intel(R) Xeon Phi(TM) coprocessors : "? If you see anything less than 1, I suggest you restart the MPSS service on your host and verify that the host can reach the coprocessor. The MPSS service can be restarted as follows:

sudo service mpss restart

I see that in your HPL_Offload.dat file, P and Q are set to 4,4. Would it be possible to try a different decomposition, such as 1,1 or 1,2, and set MPI_PROC_NUM to PxQ accordingly? Currently you have PxQ = 16, which might not be the optimal setting for your configuration. Also, I see you have MPI_PER_NODE set to 2, which should correspond to the number of sockets on your host for better performance.
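Since the number of MPI processes has to equal PxQ, the candidate grids for a given process count are just its factor pairs. A minimal sketch follows; the function name `process_grids` is hypothetical, and listing only pairs with P <= Q reflects the usual HPL tuning advice rather than anything specific to the offload build.

```python
import math

def process_grids(num_procs):
    """Return all (P, Q) pairs with P * Q == num_procs and P <= Q."""
    grids = []
    for p in range(1, math.isqrt(num_procs) + 1):
        if num_procs % p == 0:
            grids.append((p, num_procs // p))
    return grids

for procs in (1, 2, 16):
    print(procs, process_grids(procs))
```

With 2 processes, one per socket, the only choice under this convention is 1,2; with 16 processes the options include the 4,4 grid currently in the file.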

Let me know if this works.

Thanks,

girish_b_
Beginner

Hi Sunny,

I was able to run HPL with the settings you suggested.

With problem size 65536, block size 256, and PxQ = 1x2, the run completes, but the performance is 519.2 GF.
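As a sanity check on a result like this, the reported rate can be converted back to wall time using the operation count HPL uses when it reports Gflops, 2/3 N^3 + 3/2 N^2. This is a sketch only; the function names are made up for illustration, and the inputs are the N and GF figures quoted above.

```python
def hpl_flop_count(n):
    """Floating-point operations HPL credits for an N x N solve."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2

def implied_runtime_seconds(n, gflops):
    """Wall time implied by a reported HPL rate for problem size n."""
    return hpl_flop_count(n) / (gflops * 1e9)

runtime = implied_runtime_seconds(65536, 519.2)
print(f"Implied wall time: {runtime:.0f} s")
```

At this problem size the run takes on the order of six minutes, so the measurement is long enough that startup overhead should not dominate.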

With PxQ = 1x1 the performance is lower. There are two sockets on the board.

Could you please suggest further optimizations, and let me know the optimized performance you have achieved on a MIC card?
