<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic MKL and the Parallel option in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-and-the-Parallel-option/m-p/794549#M2556</link>
    <description>&lt;DIV id="tiny_quote"&gt;
                &lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;(duplicated)&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Mon, 07 Mar 2011 18:53:19 GMT</pubDate>
    <dc:creator>cppcoder</dc:creator>
    <dc:date>2011-03-07T18:53:19Z</dc:date>
    <item>
      <title>MKL and the Parallel option</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-and-the-Parallel-option/m-p/794542#M2549</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm using Visual Studio 2008, Intel compiler v11.1, and the MKL library that comes with it. I started my project using the Sequential option for MKL, but now I want to use the Parallel option. However, when I switch to Parallel and recompile (release build), I see neither any performance improvement nor the executable using more CPUs than the sequential version does (one). I have &lt;B&gt;8&lt;/B&gt; &lt;B&gt;cores&lt;/B&gt; (EDITED) and more than 5 GB of RAM, I'm running Windows 7 x64 and building an x64 executable, and the floating-point model is precise.&lt;/P&gt;&lt;P&gt;In my case, I'm generating about 800k random numbers with VSL functions and then taking their logarithm with another MKL vector function. I would expect that volume of computation to benefit from parallelism. What am I doing wrong?&lt;/P&gt;&lt;P&gt;The only thing I change is the MKL option, from Sequential to Parallel.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;EDIT: Setting the environment variable MKL_NUM_THREADS=4 before running my program from the command line does not change anything I described above.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Feb 2011 18:44:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-and-the-Parallel-option/m-p/794542#M2549</guid>
      <dc:creator>cppcoder</dc:creator>
      <dc:date>2011-02-24T18:44:27Z</dc:date>
    </item>
    <item>
      <title>MKL and the Parallel option</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-and-the-Parallel-option/m-p/794543#M2550</link>
      <description>That's unexpected behaviour, and we need to check it on our side. Did you compare the execution times of the sequential and threaded versions?</description>
      <pubDate>Thu, 24 Feb 2011 19:15:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-and-the-Parallel-option/m-p/794543#M2550</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2011-02-24T19:15:50Z</dc:date>
    </item>
    <item>
      <title>MKL and the Parallel option</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-and-the-Parallel-option/m-p/794544#M2551</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Yes, it's approximately the same time (which I don't find surprising, given that no additional CPUs appear to be used)&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm sorry, but that's not the case. I was timing other parts of my program together with the VSL functions. Now that I have isolated the time the VSL routines take, I have noticed the following (all times were measured with pairs of GetTickCount() calls):&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;SPAN style="line-height: normal;"&gt;The sequential version runs much faster (15-32 ms over several runs) than the parallel version (~1000-2000 ms over several runs).&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN style="line-height: normal;"&gt;CPU usage never goes above 20%, even when I set MKL_NUM_THREADS to the maximum number of processors (8).&lt;/SPAN&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;I guess MKL is using several cores after all, but the computations I do (generating random numbers and taking their logarithm) are not demanding enough to make a difference a human would notice, or to benefit from parallelism.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If you have a different take, please let me know.&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Feb 2011 19:22:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-and-the-Parallel-option/m-p/794544#M2551</guid>
      <dc:creator>cppcoder</dc:creator>
      <dc:date>2011-02-24T19:22:46Z</dc:date>
    </item>
    <item>
      <title>MKL and the Parallel option</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-and-the-Parallel-option/m-p/794545#M2552</link>
      <description>cppcoder, can you please name the exact routines (and methods) you use for random number generation, and which log function you are using?&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;It would also help if you could provide your link line.&lt;/DIV&gt;</description>
      <pubDate>Fri, 25 Feb 2011 07:57:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-and-the-Parallel-option/m-p/794545#M2552</guid>
      <dc:creator>Ilya_B_Intel</dc:creator>
      <dc:date>2011-02-25T07:57:58Z</dc:date>
    </item>
    <item>
      <title>MKL and the Parallel option</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-and-the-Parallel-option/m-p/794546#M2553</link>
      <description>I use MKL and IPP to compute FFTs on two computers, one an E8200 (2 cores, 2 GB of memory, Windows XP) and the other a dual Xeon X5670 (24 logical cores with HT, 64 GB of memory, Windows 7 x64), but performance shows no significant difference between them. CPU usage on the Xeon X5670 machine never goes above 10%, and I am also confused.</description>
      <pubDate>Fri, 25 Feb 2011 14:37:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-and-the-Parallel-option/m-p/794546#M2553</guid>
      <dc:creator>Seth_Sampson</dc:creator>
      <dc:date>2011-02-25T14:37:12Z</dc:date>
    </item>
    <item>
      <title>MKL and the Parallel option</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-and-the-Parallel-option/m-p/794547#M2554</link>
      <description>If your objective is to keep the hyperthreads busy in Windows Task Manager without regard to performance, have you read the discussions about MKL_DYNAMIC? You may be spending much of your time in MKL functions that can't use that many threads, so you would need to answer the questions about specifics before you could get expert comments.</description>
      <pubDate>Fri, 25 Feb 2011 17:04:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-and-the-Parallel-option/m-p/794547#M2554</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2011-02-25T17:04:22Z</dc:date>
    </item>
    <item>
      <title>MKL and the Parallel option</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-and-the-Parallel-option/m-p/794548#M2555</link>
      <description>&lt;DIV id="tiny_quote"&gt;
                &lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A href="https://community.intel.com/en-us/profile/404361/" class="basic" rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=404361"&gt;Ilya Burylov (Intel)&lt;/A&gt;&lt;/DIV&gt;
                &lt;DIV style="background-color: #e5e5e5; padding: 5px; border: 1px; border-style: inset; margin-left: 2px; margin-right: 2px;"&gt;&lt;I&gt;cppcoder, can you please name exact routines you were using for RNG generation with method used, and which log function are you using?&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;it would be also beneficial if you can provide your linking line.&lt;/DIV&gt;&lt;/I&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Yes, I use these functions in the order specified below:&lt;/P&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV id="_mcePaste"&gt;vdRngGamma&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;vdLog10&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;vdRngUniform&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;vdLog10&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 07 Mar 2011 18:48:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-and-the-Parallel-option/m-p/794548#M2555</guid>
      <dc:creator>cppcoder</dc:creator>
      <dc:date>2011-03-07T18:48:29Z</dc:date>
    </item>
    <item>
      <title>MKL and the Parallel option</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-and-the-Parallel-option/m-p/794549#M2556</link>
      <description>&lt;DIV id="tiny_quote"&gt;
                &lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;(duplicated)&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 07 Mar 2011 18:53:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-and-the-Parallel-option/m-p/794549#M2556</guid>
      <dc:creator>cppcoder</dc:creator>
      <dc:date>2011-03-07T18:53:19Z</dc:date>
    </item>
    <item>
      <title>MKL and the Parallel option</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-and-the-Parallel-option/m-p/794550#M2557</link>
      <description>cppcoder,&lt;BR /&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;The vdRngGamma and vdRngUniform functions are not threaded in MKL. The threaded function, vdLog10, takes only around 10-15% of the overall time in this call sequence, so the benefit of parallelizing it is not visible.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;In general, threading a sequence of VML and VSL function calls is more efficient at a higher level, in the application, than function by function. Threading at a higher level helps minimize threading overhead and cache issues.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;To thread VSL random number generation, you can use one of these techniques:&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN style="line-height: normal;"&gt;Creating independent streams&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN style="line-height: normal;"&gt;Splitting a stream into blocks with the vslSkipAheadStream function&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN style="line-height: normal;"&gt;Splitting a stream into several disjoint subsequences with the vslLeapfrogStream function&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;DIV&gt;See the &lt;A href="http://software.intel.com/sites/products/documentation/hpc/mkl/vsl/vslnotes.pdf"&gt;Intel Math Kernel Library Vector Statistical Library Note&lt;/A&gt; for details (section 7.3.5).&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 09 Mar 2011 08:36:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-and-the-Parallel-option/m-p/794550#M2557</guid>
      <dc:creator>Ilya_B_Intel</dc:creator>
      <dc:date>2011-03-09T08:36:08Z</dc:date>
    </item>
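    <!--
A minimal sketch, in C, of the block-splitting (skip-ahead) and leapfrog techniques listed in the reply above. It deliberately does not call MKL. The 64-bit linear congruential generator (Knuth's MMIX constants), the types affine64 and lcg_stream, and the functions affine_pow, skip_ahead_stream and leapfrog_stream are all hypothetical. They were chosen only to show how each thread can own a disjoint, reproducible slice of one random sequence. For the actual semantics of vslSkipAheadStream and vslLeapfrogStream, consult the VSL Notes linked above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit LCG, x -> A*x + C (mod 2^64), for illustration only;
   it is not an MKL basic generator. Constants are Knuth's MMIX values. */
#define LCG_A UINT64_C(6364136223846793005)
#define LCG_C UINT64_C(1442695040888963407)

/* Affine map x -> mult*x + plus (mod 2^64); unsigned overflow wraps. */
typedef struct { uint64_t mult, plus; } affine64;

/* A stream is a current state plus the map applied between outputs. */
typedef struct { uint64_t x; affine64 step; } lcg_stream;

/* Return the current state, then advance it by one step of the stream. */
static uint64_t lcg_draw(lcg_stream *s) {
    uint64_t out = s->x;
    s->x = s->step.mult * s->x + s->step.plus;
    return out;
}

/* Map equal to applying f exactly k times, via repeated squaring: O(log k). */
static affine64 affine_pow(affine64 f, uint64_t k) {
    affine64 acc = {1, 0}; /* identity map */
    while (k > 0) {
        if (k & 1) {
            /* acc := f applied after acc */
            acc.plus = acc.plus * f.mult + f.plus;
            acc.mult = acc.mult * f.mult;
        }
        /* f := f applied after itself */
        f.plus = (f.mult + 1) * f.plus;
        f.mult = f.mult * f.mult;
        k >>= 1;
    }
    return acc;
}

/* Block splitting (skip-ahead): stream i yields outputs
   i*block, i*block + 1, ..., (i+1)*block - 1 of the single sequence. */
static lcg_stream skip_ahead_stream(uint64_t seed, uint64_t i, uint64_t block) {
    affine64 one = {LCG_A, LCG_C};
    affine64 jump = affine_pow(one, i * block);
    lcg_stream s = {jump.mult * seed + jump.plus, one};
    return s;
}

/* Leapfrogging: stream i of n yields outputs i, i + n, i + 2n, ... */
static lcg_stream leapfrog_stream(uint64_t seed, uint64_t i, uint64_t n) {
    assert(n > 0 && i < n);
    affine64 one = {LCG_A, LCG_C};
    affine64 start = affine_pow(one, i);
    lcg_stream s = {start.mult * seed + start.plus, affine_pow(one, n)};
    return s;
}
```

In a threaded program, each thread would build its own lcg_stream with one of these functions, so drawing needs no shared state. Block splitting gives each thread a contiguous range of the sequence, while leapfrogging interleaves the threads' outputs. Because affine_pow jumps k positions in O(log k) steps, setting up either kind of stream stays cheap even for large offsets.
    -->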
  </channel>
</rss>

