I want to get the source code or binaries of the benchmarks for the CPU and the MIC, as shown at http://software.intel.com/en-us/intel-mkl under Benchmarks -> Intel Xeon Phi Coprocessor.
I want to run those benchmarks on the MIC.
Can anyone tell me where to get them?
I know there were benchmark/source code packages available via Premier (even prior to the launch). However, the High Performance Linpack (HPL) benchmark can be downloaded from http://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download. The Linux archive contains the binaries for Intel Xeon Phi.
Regarding the other benchmarks, you are probably referring to http://software.intel.com/en-us/intel-mkl#pid-12768-1295 ("BENCHMARKS" tab, "Intel® Xeon Phi™ Coprocessor" tab). As far as I know, the source code of the benchmarks used to generate those graphs/charts is not available for download.
The aforementioned benchmarks are meant to be straightforward calls into Intel MKL. If you look at the footnote of each chart, you will find information on how to reproduce the numbers, the system setup, etc. It should be fairly easy to reproduce them. I think the latest MPSS makes it even easier, because some of the usual adjustments (huge pages, etc.) are supposed to happen automatically.
As you already know, a good place to look for developer information is http://software.intel.com/mic-developer/. Once you have your benchmark code ready, please have a look at thread affinitization. In particular, KMP_PLACE_THREADS makes your life easier (Compiler 13 Update 2; see also here). Feel free to share your numbers, and ask for help with Intel MKL if you need it.
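As a minimal sketch, the affinitization mentioned above can be set entirely from the environment before launching the benchmark. The 60-core/4-thread placement below is an assumed example; adjust it to the core count of your particular card:

```shell
# Sketch (assumed values): pin OpenMP threads before launching an MKL benchmark
export KMP_PLACE_THREADS=60c,4t              # use 60 cores with 4 threads per core
export KMP_AFFINITY=compact,granularity=fine # keep neighboring threads on the same core
export OMP_NUM_THREADS=240                   # 60 cores x 4 threads
```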
Thank you very much! I will try these approaches, and then I will publish the results. :)
As you probably know, you have multiple options to run code on Intel Xeon Phi. In the case of Intel MKL, you have the automatic offload (AO) option in addition to offloading the code explicitly, or running the application natively on the coprocessor. With AO, you use the host system and the coprocessor heterogeneously. In the case of GEMM (and probably soon in other cases as well), AO can even utilize multiple coprocessors. Anyhow, I guess you are interested in pure Xeon Phi performance, probably even without the data transfer?
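As a sketch, automatic offload can be enabled entirely from the environment, without changing the application. The work-division value below is an assumed example; by default Intel MKL decides the host/coprocessor split itself:

```shell
# Sketch: enable Intel MKL automatic offload via environment variables
export MKL_MIC_ENABLE=1          # turn on automatic offload
export MKL_MIC_WORKDIVISION=0.8  # assumed example: send 80% of the work to the card
```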
For example with GEMM, you can offload the GEMM call and measure the time within the offloaded region. Pseudocode:
#pragma offload target(mic) out(time)
{
    double start = dsecnd();  /* Intel MKL wall-clock timer */
    DGEMM(...);               /* the actual Intel MKL call */
    time = dsecnd() - start;
}
As you can see, you can offload an entire call chain of arbitrary code to the coprocessor. In the above case, the timing is done inside the offloaded region; hence you omit the time of the data transfer. Of course, you can also compile the application for "native MIC". Anyhow, the code inside an offload region is no less native than code cross-compiled with "-mmic"; hence "native" is sometimes better called manycore-hosted. As an update, note that the aforementioned Update 2 of the Intel Compiler made OpenMP 4.0-based pragmas/directives available as well, in addition to the Language Extensions for Offload ("LEO").
Regarding the measured performance, I think thread affinitization and memory alignment are the main factors. Let me know if you have further questions.
Thanks Hans.