girish_b_ · ‎01-14-2016

The HPL benchmark on a host plus one MIC card reaches only 154 GFLOPS. The host system has 102 GB of memory. The theoretical peak is 1.2 TFLOPS + 256 GFLOPS, or about 1.4 TFLOPS. How can I optimize HPL performance? I used offload execution with the executable xhpl_offload_intel64. When I run HPL on the host alone, I achieve 92% of peak performance. I am attaching all the files that I am using. Awaiting your quick reply.

Sunny_G_Intel · ‎01-14-2016

Hello Girish,

Your optimized HPL performance will depend on many parameters, including the problem size (Ns). The input file in your compressed folder sets it to 4000. To investigate the issue further, can you please let me know what change you see when you raise that number to something like 16K or 64K?
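For a rough sense of how large Ns can go, a common HPL rule of thumb (an assumption here, not something stated in this thread) sizes the N x N double-precision matrix to fill about 80% of available memory, rounded down to a multiple of the block size NB. A minimal Python sketch, where the function name and the 80% fraction are illustrative and the 102 GB of host memory is treated as GiB:

```python
import math

def suggest_problem_size(mem_gib, nb, fraction=0.8):
    """Estimate an HPL problem size N for a given amount of memory.

    Assumes the N x N matrix of 8-byte doubles should occupy about
    `fraction` of memory (a rule of thumb, not an Intel recommendation).
    """
    usable_bytes = fraction * mem_gib * 1024**3
    n_max = int(math.sqrt(usable_bytes / 8))  # 8 bytes per double
    return (n_max // nb) * nb                 # round down to a multiple of NB

# Host memory from the original post, with a block size of 256
print(suggest_problem_size(102, 256))
```

The result is only an upper bound on what fits in host memory; the practical limit for an offload run may be lower.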

Thanks

girish_b_ · ‎01-14-2016

Hello Sunny,

I have increased the problem size to 45K. HPL runs and the performance is 154 GFLOPS. With anything above 45K, HPL terminates with the following error: error in scifi_send 0 : success

Sunny_G_Intel · ‎01-19-2016

Hi Girish,

Sorry for the delayed reply. I was out of office on Monday.

Regarding the SCIF error you are getting, can you please check that the host is able to reach the coprocessor? What do you see in your HPL output for "Number of Intel(R) Xeon Phi(TM) coprocessors : "? If you see anything less than 1, I suggest you restart the MPSS service on your host and verify that the host can reach the coprocessor. The MPSS service can be restarted as follows:

sudo service mpss restart

I see that in your HPL_Offload.dat file, P and Q are set to 4 and 4. Would it be possible to try a different decomposition, such as 1,1 or 1,2, and set MPI_PROC_NUM to P x Q accordingly? Currently you have P x Q = 16, which might not be the best setting for your configuration. Also, I see you have MPI_PER_NODE set to 2; for better performance, this should correspond to the number of sockets on your host.
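To illustrate the relationship between the process count and the grid, here is a small Python sketch that lists the P x Q decompositions available for a given number of MPI processes. The function name is hypothetical, and keeping P <= Q is a common HPL convention assumed here rather than a requirement from this thread:

```python
def grid_candidates(num_procs):
    """Return all (P, Q) pairs with P * Q == num_procs and P <= Q."""
    pairs = []
    p = 1
    while p * p <= num_procs:
        if num_procs % p == 0:
            pairs.append((p, num_procs // p))
        p += 1
    return pairs

# Process counts discussed in this thread: 1, 2, and the current 16
for n in (1, 2, 16):
    print(n, grid_candidates(n))
```

Each pair is a candidate to benchmark; which one performs best depends on the hardware and has to be measured.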

Let me know if this works.

Thanks,

girish_b_ · ‎01-20-2016

Hi Sunny,

I am able to run HPL as you specified.

With problem size 65536, block size 256, and P x Q = 1 x 2, the performance is only 519.2 GFLOPS.
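To put the measured numbers in context, HPL efficiency is the measured rate divided by the theoretical peak. A minimal Python sketch using the figures quoted in this thread; the peak is taken as the sum 1.2 TFLOPS + 256 GFLOPS from the first post, and the function name is illustrative:

```python
def hpl_efficiency(measured_gflops, peak_gflops):
    """Fraction of theoretical peak achieved by an HPL run."""
    return measured_gflops / peak_gflops

# Combined theoretical peak from the first post, in GFLOPS
peak = 1200.0 + 256.0

for label, gflops in (("initial run", 154.0), ("P x Q = 1 x 2", 519.2)):
    print(f"{label}: {hpl_efficiency(gflops, peak):.1%} of peak")
```

By this measure the 1 x 2 run is a large improvement over the initial result but still well below the 92% reported for the host alone.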

With P x Q = 1 x 1, the performance is lower. There are two sockets on the board.

Kindly suggest further optimizations, and please let me know what optimized performance you have achieved on a MIC card.

Less performance on mic