<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Using MKL with Xeon Phi MICS in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Using-MKL-with-Xeon-Phi-MICS/m-p/1083166#M22880</link>
    <description>&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;I have to write a program to process a large amount of data. Since most of the processing involves matrix and vector operations I wanted to use MKL to take advantage of the optimized library. I created a toy example in C++ and OpenMP that runs relatively fast on my desktop computer with an Intel core i7, 8 threads (it takes about 10 minutes to do all the computations using MKL, specifically, function&amp;nbsp;cblas_dgemv()).&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.008px;"&gt;I was given access to a Linux node with a 12-core Intel Xeon processor and 2 Intel Xeon Phi coprocessors (MICs) with 61 cores each to run the program once it is ready.&amp;nbsp;&lt;SPAN style="font-size: 1em;"&gt;When I moved this program to the Xeon node, it took about an hour to complete, whereas I expected the increased processing power to make quick work of the problem.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;I think I read somewhere that MKL would transparently use the MICs when available. Is this true? Am I missing a compiler directive to make the compiler generate code or &lt;/SPAN&gt;&lt;SPAN style="font-size: 13.008px;"&gt;some other set up&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;&amp;nbsp;to run MKL functions in the MICs? What could be making my code run significantly slower, even if it is not using the MICs?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Finally, if MKL does not use the MICs transparently, what do I have to do to make MKL computations use them?&lt;/P&gt;

&lt;P&gt;All help is appreciated. Thanks.&lt;/P&gt;</description>
    <pubDate>Tue, 20 Sep 2016 04:51:58 GMT</pubDate>
    <dc:creator>Ernesto_Z_</dc:creator>
    <dc:date>2016-09-20T04:51:58Z</dc:date>
    <item>
      <title>Using MKL with Xeon Phi MICS</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Using-MKL-with-Xeon-Phi-MICS/m-p/1083166#M22880</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;I have to write a program to process a large amount of data. Since most of the processing involves matrix and vector operations I wanted to use MKL to take advantage of the optimized library. I created a toy example in C++ and OpenMP that runs relatively fast on my desktop computer with an Intel core i7, 8 threads (it takes about 10 minutes to do all the computations using MKL, specifically, function&amp;nbsp;cblas_dgemv()).&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.008px;"&gt;I was given access to a Linux node with a 12-core Intel Xeon processor and 2 Intel Xeon Phi coprocessors (MICs) with 61 cores each to run the program once it is ready.&amp;nbsp;&lt;SPAN style="font-size: 1em;"&gt;When I moved this program to the Xeon node, it took about an hour to complete, whereas I expected the increased processing power to make quick work of the problem.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;I think I read somewhere that MKL would transparently use the MICs when available. Is this true? Am I missing a compiler directive to make the compiler generate code or &lt;/SPAN&gt;&lt;SPAN style="font-size: 13.008px;"&gt;some other set up&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;&amp;nbsp;to run MKL functions in the MICs? What could be making my code run significantly slower, even if it is not using the MICs?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Finally, if MKL does not use the MICs transparently, what do I have to do to make MKL computations use them?&lt;/P&gt;

&lt;P&gt;All help is appreciated. Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 20 Sep 2016 04:51:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Using-MKL-with-Xeon-Phi-MICS/m-p/1083166#M22880</guid>
      <dc:creator>Ernesto_Z_</dc:creator>
      <dc:date>2016-09-20T04:51:58Z</dc:date>
    </item>
    <item>
      <title>Hi Ernesto, </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Using-MKL-with-Xeon-Phi-MICS/m-p/1083167#M22881</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;A href="https://software.intel.com/en-us/user/1483984" style="font-size: 11px; background-color: rgb(238, 238, 238);"&gt;Ernesto&lt;/A&gt;,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Let's go through your questions one by one.&lt;/P&gt;

&lt;P&gt;1. &lt;SPAN style="font-size: 12px;"&gt;Intel Core i7 with 8 threads (10 minutes) vs. 12-core Intel Xeon processor (1 hour)&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px;"&gt;You mentioned a "toy example in C++ and OpenMP that runs relatively fast on my desktop computer with an Intel core i7, 8 threads (it takes about 10 minutes to do all the computations using MKL, specifically, function&amp;nbsp;cblas_dgemv())."&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px;"&gt;Could you please share more details about your example, such as the MKL version, compiler, problem size, and OS?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px;"&gt;Alternatively, here is an MKL tutorial with code samples:&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;&lt;A href="https://software.intel.com/sites/default/files/mkl_c_samples_09072016.tgz" target="_blank"&gt;https://software.intel.com/sites/default/files/mkl_c_samples_09072016.tgz&lt;/A&gt;. Could you run one of them and let us know the result? (Please note: use a large workload so that the power of the Xeon processor is actually utilized.)&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;2. MKL on the CPU or on the MIC&lt;/P&gt;

&lt;P&gt;MKL supports three usage models on MIC:&lt;/P&gt;

&lt;P&gt;1. Native: the whole executable runs on the MIC.&lt;/P&gt;

&lt;P&gt;2. Automatic Offload (AO): the executable runs on the Xeon host; part of the work stays on the Xeon and part is done on the MIC. In this model, MKL automatically detects the presence of Xeon Phi coprocessors based on the Intel MIC architecture and automatically offloads computations that may benefit from them. The only change needed to enable AO is either setting an environment variable (MKL_MIC_ENABLE=1) or making a single function call (mkl_mic_enable()).&lt;/P&gt;

&lt;P&gt;3.&amp;nbsp;Compiler Assisted Offload (CAO): this usage model relies on the Intel Compiler and its offload pragmas to offload computations to the coprocessors. Within the offload section, you have to specify the input and output data for the Intel MKL functions to be offloaded. The compiler-provided run-time libraries transfer the functions with their data to the coprocessor to do the computations.&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="color: black; font-size: 1em;"&gt;Please follow the link below for related articles and videos on how to use MKL on Intel Xeon Phi.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;B&gt;&lt;SPAN style="color: rgb(8, 109, 182);"&gt;&lt;A href="http://software.intel.com/en-us/articles/intel-mkl-on-the-intel-xeon-phi-coprocessors"&gt;Intel® Math Kernel Library (Intel® MKL) on Intel® Many Integrated Core Architecture (Intel® MIC Architecture)&lt;/A&gt; .&lt;/SPAN&gt;&lt;/B&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px;"&gt;From your description ("Intel Xeon processor and 2 Intel Xeon Phi coprocessors (MICs) with 61 cores") and your question about MKL transparently using the MICs, I suppose you expected the AO model. However, as the article below explains, the coprocessors are intended for highly compute-intensive workloads, and a BLAS level 2 function such as dgemv is not in the list of AO-enabled functions.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;A href="http://software.intel.com/en-us/articles/intel-mkl-automatic-offload-enabled-functions-for-intel-xeon-phi-coprocessors" target="_blank"&gt;http://software.intel.com/en-us/articles/intel-mkl-automatic-offload-enabled-functions-for-intel-xeon-phi-coprocessors&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;The example you built therefore most likely runs only on the Xeon and not on the MIC. Also, dgemv is a BLAS level 2 function with O(n^2) work, so I recommend trying dgemm, at least for a performance test.&lt;/P&gt;
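
&lt;P&gt;The O(n^2) remark above can be made concrete with a rough arithmetic-intensity estimate (flops per byte of memory traffic). The sketch below is only a back-of-the-envelope model, not a measurement: it assumes square n x n double-precision operands and leading-order flop and memory-traffic counts, and the helper names dgemv_intensity and dgemm_intensity are hypothetical (they are not MKL functions).&lt;/P&gt;

```python
# Rough arithmetic-intensity model (flops per byte moved) for square n x n
# double-precision problems. All counts are leading-order assumptions,
# not measurements, and these helpers are illustrative, not part of MKL.

BYTES_PER_DOUBLE = 8


def dgemv_intensity(n):
    """y = alpha*A*x + beta*y: about 2*n^2 flops over n^2 + 3*n doubles."""
    flops = 2 * n * n
    doubles_moved = n * n + 3 * n  # read A, x and y; write y
    return flops / (BYTES_PER_DOUBLE * doubles_moved)


def dgemm_intensity(n):
    """C = alpha*A*B + beta*C: about 2*n^3 flops over 4*n^2 doubles."""
    flops = 2 * n ** 3
    doubles_moved = 4 * n * n  # read A, B and C; write C
    return flops / (BYTES_PER_DOUBLE * doubles_moved)


# Print n, the dgemv estimate and the dgemm estimate for a few sizes.
for n in (1000, 4000, 16000):
    print(n, round(dgemv_intensity(n), 3), round(dgemm_intensity(n), 1))
```

&lt;P&gt;Under these assumptions dgemv stays just below 0.25 flop per byte regardless of n, while the dgemm estimate grows as n/16. That is consistent with dgemv being limited by memory bandwidth and with dgemm having enough computation per byte to amortize a transfer to a coprocessor.&lt;/P&gt;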

&lt;P&gt;Best Regards,&lt;BR /&gt;
	Ying&amp;nbsp; &amp;nbsp;&lt;/P&gt;

</description>
      <pubDate>Fri, 23 Sep 2016 02:35:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Using-MKL-with-Xeon-Phi-MICS/m-p/1083167#M22881</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2016-09-23T02:35:00Z</dc:date>
    </item>
  </channel>
</rss>

