I am running HPL on a MIC node, but I am getting the following error and the run stops:
Error in scif_send 0: Success.
Hi Girish,
Could you provide details on how to reproduce the problem you saw? For example: what OS are you running, what MPSS and compiler versions are you using, where did you download the source code, how did you compile it, and how did you run the executable?
Thanks
Hi Loc,
I am using the precompiled xhpl_offload binary from Intel Cluster Suite 2015. The OS is CentOS 6.5 and the MPSS version is 3.4.3. I am running it with the script in the mkl-benchmarks directory.
I am now able to run HPL, but the performance is very low: 540 GF against a theoretical peak of 1.2 TF, with the following settings:
problem size N = 64000, block size NB = 256, P x Q = 1 x 2, and MPI_PER_NODE=2, since the host has 2 sockets and 102 GB of memory.
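The relevant lines of the HPL.dat I am using look roughly like this (a trimmed sketch of the standard input file; the trailing text on each line is the usual descriptive label, and the real file contains further tuning lines):

```
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
64000        Ns
1            # of NBs
256          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
2            Qs
```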
Kindly help me obtain optimized performance.
Hi Girish,
Sorry for the delay. I asked an MKL expert about your question. The answer is that good HPL performance can only be achieved with the latest version of the benchmark. The MKL benchmarks are available in a package at https://software.intel.com/en-us/articles/intel-mkl-benchmarks-suite
HPL is among the benchmarks contained in the package (navigate to the mp_linpack directory). Read the README and TUNING files carefully in order to get a top performance measurement.
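A rough sketch of getting to those files (the archive and directory names below are hypothetical placeholders; use whatever the download page actually gives you):

```
# Hypothetical archive name -- substitute the file you actually download.
tar -xzf l_mklb_p_11.x.y.zzz.tgz
cd l_mklb_p_11.x.y.zzz/benchmarks/mp_linpack   # exact path may differ by version
less README TUNING                             # read both before running
```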
Please note that in the latest version of the package, runme_offload_intel64 no longer exists; it has been absorbed into runme_intel64 and runme_intel64_dynamic. The usage model is “host only”, “native”, or “hybrid offload”, so for “hybrid offload” both the host and the coprocessor(s) are used. Sometimes people see lower performance with “hybrid offload” than with “native” or “host only”; this is typically the result of benchmark configuration problems, such as improper problem sizes, work distribution problems, etc.
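As a hedged sketch of driving a hybrid run on a 2-socket host (MPI_PER_NODE follows your description of the older script; MPI_PROC_NUM is an assumed knob of the current scripts, so check the README of the version you download for the exact settings):

```
cd mp_linpack
# 1) Edit HPL.dat: problem size N, block size NB, and the P x Q grid.
# 2) Lay out the MPI ranks, e.g. 2 ranks on a 2-socket host.
export MPI_PER_NODE=2   # ranks per node, as in your run
export MPI_PROC_NUM=2   # total ranks (assumed variable name; see README)
# 3) Launch. Offload to the coprocessor(s) is configured inside
#    runme_intel64 / runme_intel64_dynamic; there is no separate
#    runme_offload_intel64 anymore.
./runme_intel64_dynamic
```

On problem size: a common rule of thumb is to choose N so the matrix fills about 80% of available memory, i.e. N ≈ sqrt(0.8 × memory_in_bytes / 8), which is roughly 100000 for 102 GB, so N = 64000 may simply be too small to approach peak.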
Hope this helps.
