<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic No multithreading on small matrices? in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/no-multithreading-on-small-matrices/m-p/798990#M2913</link>
    <description>&lt;DIV id="tiny_quote"&gt;&lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A jquery1340407889062="58" rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=312233" href="https://community.intel.com/en-us/profile/312233/" class="basic"&gt;yuriisig&lt;/A&gt;&lt;/DIV&gt;&lt;DIV style="background-color: #e5e5e5; margin-left: 2px; margin-right: 2px; border: 1px inset; padding: 5px;"&gt;&lt;I&gt;&lt;SPAN style="font-size: small;"&gt;&lt;P&gt;In my tests, multithreading fast matrix multiplication algorithms pays off only for matrices of at least 1500 x 1500: &lt;A href="http://software.intel.com/ru-ru/forums/showthread.php?t=75835&amp;amp;o=a&amp;amp;s=lr"&gt;http://software.intel.com/ru-ru/forums/showthread.php?t=75835&amp;amp;o=a&amp;amp;s=lr&lt;/A&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/I&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&lt;BR /&gt;Absolutely agree: modern CPUs are very fast, and it looks pointless to do anything else when multiplying small matrices. Thank you for the link; I'll take a look.&lt;BR /&gt;&lt;BR /&gt;The &lt;STRONG&gt;Strassen HBC&lt;/STRONG&gt; algorithm which I used for comparison is a single-threaded algorithm designed and tuned for embedded real-time systems.&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;Sergey&lt;/P&gt;</description>
    <pubDate>Fri, 22 Jun 2012 23:37:43 GMT</pubDate>
    <dc:creator>SergeyKostrov</dc:creator>
    <dc:date>2012-06-22T23:37:43Z</dc:date>
    <item>
      <title>No multithreading on small matrices?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/no-multithreading-on-small-matrices/m-p/798985#M2908</link>
      <description>I observed that multithreading kicks in only for large matrices. The function below, compiled as shown, is called in a loop of 10000 iterations, once with small matrices and once with large ones. In the first case only one of my cores is used; in the second, both cores work.&lt;BR /&gt;With 100x10 and 10x10 matrices, no multithreading is engaged. With 200x10 and 10x10, multithreading is engaged.&lt;BR /&gt;&lt;BR /&gt;Are there any rules of thumb, also for procedures other than gemm (dcopy, dsctr, dsyrk, dpotri, dsymm, dgthr, daxpy)?&lt;BR /&gt;&lt;BR /&gt;As an aside, what is the difference between "cblas_dcopy()" and "dcopy()"?&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;T&lt;BR /&gt;&lt;BR /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;BR /&gt;#include &amp;lt;stdlib.h&amp;gt;&lt;BR /&gt;#include &amp;lt;stdbool.h&amp;gt;&lt;BR /&gt;#include &amp;lt;math.h&amp;gt;&lt;BR /&gt;#include &amp;lt;mkl.h&amp;gt; //geom p q n&lt;BR /&gt;&lt;BR /&gt;void mttest(double *a, double *b, int *geom, double *c) {&lt;BR /&gt; double one = 1.0; double zero = 0;&lt;BR /&gt; dgemm("n","n",&amp;amp;geom[0],&amp;amp;geom[3],&amp;amp;geom[1],&amp;amp;one,a,&amp;amp;geom[0],b,&amp;amp;geom[2],&amp;amp;zero,c,&amp;amp;geom[0]);&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;gcc -std=gnu99 -fpic -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -c mttest.c -o mttest.o&lt;BR /&gt;gcc -std=gnu99 -shared -L/opt/intel/composer_xe_2011_sp1.9.293/mkl/lib/intel64 -L/opt/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm mttest.o -o mttest.so</description>
      <pubDate>Tue, 19 Jun 2012 09:58:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/no-multithreading-on-small-matrices/m-p/798985#M2908</guid>
      <dc:creator>tletni</dc:creator>
      <dc:date>2012-06-19T09:58:31Z</dc:date>
    </item>
    <item>
      <title>No multithreading on small matrices?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/no-multithreading-on-small-matrices/m-p/798986#M2909</link>
      <description>MKL threaded functions contain the equivalent of an OpenMP if() clause to avoid the performance degradation that threading causes on cases which are too small.&lt;BR /&gt;The cblas_ wrappers accept value operands where appropriate and convert them to the Fortran default (pass by reference). They are open source code; look for yourself. Most C compilers know how to compile data moves written in plain C code or with &amp;lt;string.h&amp;gt;, so dcopy() would rarely be used.</description>
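      <!--
The OpenMP if() clause mentioned in this reply can be sketched as follows. This is only an illustration of the mechanism, not MKL source code: the function name scaled_add and the cutoff PAR_THRESHOLD are hypothetical, and the thresholds MKL actually uses are internal to the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff: below this many elements the loop stays serial.
   The value is an assumption for illustration, not an MKL constant. */
#define PAR_THRESHOLD 10000

/* y := alpha * x + y, threaded only when n is large enough.
   When the if() condition is false, the parallel region runs on a
   single thread. Without OpenMP support (no -fopenmp) the pragma is
   ignored and the loop is plain serial code. */
void scaled_add(size_t n, double alpha, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    #pragma omp parallel for if(n >= PAR_THRESHOLD)
    for (long i = 0; i < (long)n; i++) {
        y[i] += alpha * x[i];
    }
}
```

Calling scaled_add(4, 2.0, x, y) therefore runs on one thread, while a call with a million elements may be split across cores when the file is compiled with OpenMP enabled.
      -->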
      <pubDate>Tue, 19 Jun 2012 11:41:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/no-multithreading-on-small-matrices/m-p/798986#M2909</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2012-06-19T11:41:29Z</dc:date>
    </item>
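    <!--
On the cblas_dcopy() versus dcopy() question: the Fortran-style entry point takes every argument by reference, while the CBLAS one takes the scalars by value. A minimal sketch of such a wrapper, assuming a stand-in routine ref_dcopy in place of the real library symbol (both ref_dcopy and wrap_dcopy are hypothetical names, and this sketch only handles positive strides):

```c
#include <assert.h>

/* Stand-in for a Fortran-style BLAS copy: every argument is a pointer,
   because the Fortran convention passes arguments by reference.
   The name ref_dcopy is hypothetical; it is not the MKL symbol. */
void ref_dcopy(const int *n, const double *x, const int *incx,
               double *y, const int *incy)
{
    for (int i = 0; i < *n; i++) {
        y[i * (*incy)] = x[i * (*incx)];
    }
}

/* C-style wrapper in the spirit of the cblas_ layer: scalars arrive by
   value and are forwarded by address. */
void wrap_dcopy(int n, const double *x, int incx, double *y, int incy)
{
    assert(n >= 0 && incx > 0 && incy > 0); /* negative strides not handled here */
    ref_dcopy(&n, x, &incx, y, &incy);
}
```

With this shape, wrap_dcopy(3, src, 2, dst, 1) copies every second element of src into dst without the caller having to create temporaries for n and the strides.
    -->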
    <item>
      <title>No multithreading on small matrices?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/no-multithreading-on-small-matrices/m-p/798987#M2910</link>
      <description>&lt;DIV id="tiny_quote"&gt;&lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A jquery1340194244343="58" rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=566696" href="https://community.intel.com/en-us/profile/566696/" class="basic"&gt;tletni&lt;/A&gt;&lt;/DIV&gt;&lt;DIV style="background-color: #e5e5e5; margin-left: 2px; margin-right: 2px; border: 1px inset; padding: 5px;"&gt;&lt;I&gt;...when I have 100x10 and 10x10 matrices, no multithreading is engaged. with 200x10 and 10x10, multithreading is engaged...&lt;/I&gt;&lt;/DIV&gt;&lt;BR /&gt;Multithreading has a negative impact on overall performance if the matrices are too small (less than 128x128) because of the overhead of creating threads. For example, if two matrices are multiplied using the Strassen and Classic algorithms, real performance improvements appear only for sizes greater than 128x128. The Strassen algorithm is faster even when one thread is used. I could provide some real data if needed.&lt;/DIV&gt;</description>
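      <!--
The idea of using Strassen only above a size cutoff and the classic algorithm below it can be sketched as follows. This is not the poster's implementation, which is not shown in the thread: the names block_add, classic_mm and strassen_mm and the CUTOFF value of 64 are assumptions, matrices are square and row-major, and odd sizes simply fall back to the classic loop.

```c
#include <assert.h>
#include <stdlib.h>

#ifndef CUTOFF
#define CUTOFF 64 /* hypothetical size at or below which classic is used */
#endif

/* c = a + sign * b for n x n blocks with row strides lda, ldb, ldc. */
void block_add(int n, const double *a, int lda, const double *b, int ldb,
               double sign, double *c, int ldc)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            c[i * ldc + j] = a[i * lda + j] + sign * b[i * ldb + j];
}

/* Classic triple loop: c = a * b. */
void classic_mm(int n, const double *a, int lda, const double *b, int ldb,
                double *c, int ldc)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) c[i * ldc + j] = 0.0;
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                c[i * ldc + j] += a[i * lda + k] * b[k * ldb + j];
    }
}

/* Strassen recursion with a cutoff; c must not overlap a or b. */
void strassen_mm(int n, const double *a, int lda, const double *b, int ldb,
                 double *c, int ldc)
{
    if (n <= CUTOFF || n % 2 != 0) {
        classic_mm(n, a, lda, b, ldb, c, ldc);
        return;
    }
    int h = n / 2;
    const double *a11 = a, *a12 = a + h, *a21 = a + (size_t)h * lda, *a22 = a21 + h;
    const double *b11 = b, *b12 = b + h, *b21 = b + (size_t)h * ldb, *b22 = b21 + h;
    double *c11 = c, *c12 = c + h, *c21 = c + (size_t)h * ldc, *c22 = c21 + h;

    size_t blk = (size_t)h * h;
    double *buf = malloc(9 * blk * sizeof *buf); /* 2 temporaries + 7 products */
    assert(buf != NULL);
    double *s = buf, *t = buf + blk, *m[7];
    for (int i = 0; i < 7; i++) m[i] = buf + (2 + i) * blk;

    block_add(h, a11, lda, a22, lda, 1.0, s, h);
    block_add(h, b11, ldb, b22, ldb, 1.0, t, h);
    strassen_mm(h, s, h, t, h, m[0], h);        /* M1 = (A11+A22)(B11+B22) */
    block_add(h, a21, lda, a22, lda, 1.0, s, h);
    strassen_mm(h, s, h, b11, ldb, m[1], h);    /* M2 = (A21+A22) B11 */
    block_add(h, b12, ldb, b22, ldb, -1.0, t, h);
    strassen_mm(h, a11, lda, t, h, m[2], h);    /* M3 = A11 (B12 minus B22) */
    block_add(h, b21, ldb, b11, ldb, -1.0, t, h);
    strassen_mm(h, a22, lda, t, h, m[3], h);    /* M4 = A22 (B21 minus B11) */
    block_add(h, a11, lda, a12, lda, 1.0, s, h);
    strassen_mm(h, s, h, b22, ldb, m[4], h);    /* M5 = (A11+A12) B22 */
    block_add(h, a21, lda, a11, lda, -1.0, s, h);
    block_add(h, b11, ldb, b12, ldb, 1.0, t, h);
    strassen_mm(h, s, h, t, h, m[5], h);        /* M6 = (A21 minus A11)(B11+B12) */
    block_add(h, a12, lda, a22, lda, -1.0, s, h);
    block_add(h, b21, ldb, b22, ldb, 1.0, t, h);
    strassen_mm(h, s, h, t, h, m[6], h);        /* M7 = (A12 minus A22)(B21+B22) */

    block_add(h, m[0], h, m[3], h, 1.0, c11, ldc);   /* C11 = M1 + M4 ... */
    block_add(h, c11, ldc, m[4], h, -1.0, c11, ldc); /* ... minus M5 ... */
    block_add(h, c11, ldc, m[6], h, 1.0, c11, ldc);  /* ... + M7 */
    block_add(h, m[2], h, m[4], h, 1.0, c12, ldc);   /* C12 = M3 + M5 */
    block_add(h, m[1], h, m[3], h, 1.0, c21, ldc);   /* C21 = M2 + M4 */
    block_add(h, m[0], h, m[1], h, -1.0, c22, ldc);  /* C22 = M1 minus M2 ... */
    block_add(h, c22, ldc, m[2], h, 1.0, c22, ldc);  /* ... + M3 ... */
    block_add(h, c22, ldc, m[5], h, 1.0, c22, ldc);  /* ... + M6 */
    free(buf);
}
```

Each level replaces eight half-size products with seven at the cost of extra additions and temporary storage, which is why a cutoff is needed: below some size the bookkeeping outweighs the saved multiplication. The right cutoff depends on the machine and has to be measured.
      -->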
      <pubDate>Wed, 20 Jun 2012 12:34:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/no-multithreading-on-small-matrices/m-p/798987#M2910</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-06-20T12:34:53Z</dc:date>
    </item>
    <item>
      <title>No multithreading on small matrices?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/no-multithreading-on-small-matrices/m-p/798988#M2911</link>
      <description>&lt;DIV id="tiny_quote"&gt;&lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A jquery1340325050859="59" rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=353541" href="https://community.intel.com/en-us/profile/353541/" class="basic"&gt;Sergey Kostrov&lt;/A&gt;&lt;/DIV&gt;&lt;DIV style="background-color: #e5e5e5; margin-left: 2px; margin-right: 2px; border: 1px inset; padding: 5px;"&gt;&lt;I&gt;&lt;DIV id="tiny_quote"&gt;&lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A jquery1340325050859="60" jquery1340194244343="58" rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=566696" href="https://community.intel.com/en-us/profile/566696/" class="basic"&gt;tletni&lt;/A&gt;&lt;/DIV&gt;&lt;DIV style="background-color: #e5e5e5; margin-left: 2px; margin-right: 2px; border: 1px inset; padding: 5px;"&gt;&lt;I&gt;...when I have 100x10 and 10x10 matrices, no multithreading is engaged. with 200x10 and 10x10, multithreading is engaged...&lt;/I&gt;&lt;/DIV&gt;&lt;BR /&gt;...Strassen algorithm does calculations faster even when one thread is used. I could provide some real data if needed.&lt;/DIV&gt;&lt;/I&gt;&lt;/DIV&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;Here are the performance results (operation: matrix multiplication).&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Size of both matrices: 128x128&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt; Matrix Size : 128 x 128&lt;BR /&gt; Matrix Size Threshold: N/A&lt;BR /&gt; Matrix Partitions : N/A&lt;BR /&gt; ResultSets Reflection: N/A&lt;BR /&gt; Calculating...&lt;BR /&gt; Classic A - Pass 1 - Completed: 0.03100 secs&lt;BR /&gt; Classic A - Pass 2 - Completed: 0.03100 secs&lt;BR /&gt; Classic A - Pass 3 - Completed: 0.01600 secs&lt;BR /&gt; Classic A - Pass 4 - Completed: 0.03100 secs&lt;BR /&gt; Classic A - Pass 5 - Completed: 0.01600 secs&lt;/P&gt;
&lt;P&gt; Strassen HBI&lt;BR /&gt; Matrix Size : 128 x 128&lt;BR /&gt; Matrix Size Threshold: 64 x 64&lt;BR /&gt; Matrix Partitions : 1&lt;BR /&gt; ResultSets Reflection: N/A&lt;BR /&gt; Calculating...&lt;BR /&gt; Strassen HBI - Pass 1 - Completed: 0.01500 secs&lt;BR /&gt; Strassen HBI - Pass 2 - Completed: 0.03100 secs&lt;BR /&gt; Strassen HBI - Pass 3 - Completed: 0.01600 secs&lt;BR /&gt; Strassen HBI - Pass 4 - Completed: 0.01600 secs&lt;BR /&gt; Strassen HBI - Pass 5 - Completed: 0.03100 secs&lt;/P&gt;
&lt;P&gt; Strassen HBC&lt;BR /&gt; Matrix Size : 128 x 128&lt;BR /&gt; Matrix Size Threshold: 8 x 8&lt;BR /&gt; Matrix Partitions : 2801&lt;BR /&gt; ResultSets Reflection: Enabled&lt;BR /&gt; Calculating...&lt;BR /&gt; Strassen HBC - Pass 1 - Completed: 0.12500 secs&lt;BR /&gt; Strassen HBC - Pass 2 - Completed: 0.03100 secs&lt;BR /&gt; Strassen HBC - Pass 3 - Completed: 0.03100 secs&lt;BR /&gt; Strassen HBC - Pass 4 - Completed: 0.03200 secs&lt;BR /&gt; Strassen HBC - Pass 5 - Completed: 0.01500 secs&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Size of both matrices: 256x256&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt; Matrix Size : 256 x 256&lt;BR /&gt; Matrix Size Threshold: N/A&lt;BR /&gt; Matrix Partitions : N/A&lt;BR /&gt; ResultSets Reflection: N/A&lt;BR /&gt; Calculating...&lt;BR /&gt; Classic A - Pass 1 - Completed: 0.59400 secs&lt;BR /&gt; Classic A - Pass 2 - Completed: 0.60900 secs&lt;BR /&gt; Classic A - Pass 3 - Completed: 0.59400 secs&lt;BR /&gt; Classic A - Pass 4 - Completed: 0.59400 secs&lt;BR /&gt; Classic A - Pass 5 - Completed: 0.60900 secs&lt;/P&gt;
&lt;P&gt; Strassen HBI&lt;BR /&gt; Matrix Size : 256 x 256&lt;BR /&gt; Matrix Size Threshold: 128 x 128&lt;BR /&gt; Matrix Partitions : 1&lt;BR /&gt; ResultSets Reflection: N/A&lt;BR /&gt; Calculating...&lt;BR /&gt; Strassen HBI - Pass 1 - Completed: 0.17200 secs&lt;BR /&gt; Strassen HBI - Pass 2 - Completed: 0.17200 secs&lt;BR /&gt; Strassen HBI - Pass 3 - Completed: 0.15600 secs&lt;BR /&gt; Strassen HBI - Pass 4 - Completed: 0.17200 secs&lt;BR /&gt; Strassen HBI - Pass 5 - Completed: 0.17200 secs&lt;/P&gt;
&lt;P&gt; Strassen HBC&lt;BR /&gt; Matrix Size : 256 x 256&lt;BR /&gt; Matrix Size Threshold: 16 x 16&lt;BR /&gt; Matrix Partitions : 2801&lt;BR /&gt; ResultSets Reflection: Enabled&lt;BR /&gt; Calculating...&lt;BR /&gt; Strassen HBC - Pass 1 - Completed: 0.37500 secs&lt;BR /&gt; Strassen HBC - Pass 2 - Completed: 0.17200 secs&lt;BR /&gt; Strassen HBC - Pass 3 - Completed: 0.17200 secs&lt;BR /&gt; Strassen HBC - Pass 4 - Completed: 0.17200 secs&lt;BR /&gt; Strassen HBC - Pass 5 - Completed: 0.17200 secs&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Size of both matrices: 512x512&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt; Matrix Size : 512 x 512&lt;BR /&gt; Matrix Size Threshold: N/A&lt;BR /&gt; Matrix Partitions : N/A&lt;BR /&gt; ResultSets Reflection: N/A&lt;BR /&gt; Calculating...&lt;BR /&gt; Classic A - Pass 1 - Completed: 10.81200 secs&lt;BR /&gt; Classic A - Pass 2 - Completed: 10.84400 secs&lt;BR /&gt; Classic A - Pass 3 - Completed: 10.82800 secs&lt;BR /&gt; Classic A - Pass 4 - Completed: 10.82800 secs&lt;BR /&gt; Classic A - Pass 5 - Completed: 10.82800 secs&lt;/P&gt;
&lt;P&gt; Strassen HBI&lt;BR /&gt; Matrix Size : 512 x 512&lt;BR /&gt; Matrix Size Threshold: 256 x 256&lt;BR /&gt; Matrix Partitions : 1&lt;BR /&gt; ResultSets Reflection: N/A&lt;BR /&gt; Calculating...&lt;BR /&gt; Strassen HBI - Pass 1 - Completed: 1.39100 secs&lt;BR /&gt; Strassen HBI - Pass 2 - Completed: 1.37500 secs&lt;BR /&gt; Strassen HBI - Pass 3 - Completed: 1.35900 secs&lt;BR /&gt; Strassen HBI - Pass 4 - Completed: 1.37500 secs&lt;BR /&gt; Strassen HBI - Pass 5 - Completed: 1.37500 secs&lt;/P&gt;
&lt;P&gt; Strassen HBC&lt;BR /&gt; Matrix Size : 512 x 512&lt;BR /&gt; Matrix Size Threshold: 32 x 32&lt;BR /&gt; Matrix Partitions : 2801&lt;BR /&gt; ResultSets Reflection: Enabled&lt;BR /&gt; Calculating...&lt;BR /&gt; Strassen HBC - Pass 1 - Completed: 1.12500 secs&lt;BR /&gt; Strassen HBC - Pass 2 - Completed: 0.65600 secs&lt;BR /&gt; Strassen HBC - Pass 3 - Completed: 0.64100 secs&lt;BR /&gt; Strassen HBC - Pass 4 - Completed: 0.65600 secs&lt;BR /&gt; Strassen HBC - Pass 5 - Completed: 0.65600 secs&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Notes:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt; Strassen HBI - Strassen's Heap Based Incomplete algorithm for matrix multiplication&lt;BR /&gt; Strassen HBC - Strassen's Heap Based Complete algorithm for matrix multiplication&lt;/P&gt;</description>
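      <!--
To compare the three algorithms, the five pass times can be averaged and turned into a ratio. The sketch below does this for the 512 x 512 case using the timings from the listing above; the helper names mean_secs and speedup and the array names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Pass timings in seconds for the 512 x 512 case, copied from the listing. */
const double classic_512[5] = {10.812, 10.844, 10.828, 10.828, 10.828};
const double hbi_512[5]     = {1.391, 1.375, 1.359, 1.375, 1.375};
const double hbc_512[5]     = {1.125, 0.656, 0.641, 0.656, 0.656};

/* Arithmetic mean of n timings. */
double mean_secs(const double *t, size_t n)
{
    assert(t != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += t[i];
    return sum / (double)n;
}

/* Ratio of mean classic time to mean time of the other algorithm;
   values above 1 mean the other algorithm was faster. */
double speedup(const double *classic, const double *other, size_t n)
{
    return mean_secs(classic, n) / mean_secs(other, n);
}
```

For these numbers speedup(classic_512, hbi_512, 5) comes out near 7.9 and speedup(classic_512, hbc_512, 5) near 14.5. The ratios compare the poster's own single-threaded routines with each other on his machine; they say nothing about MKL dgemm, which was not part of this measurement.
      -->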
      <pubDate>Fri, 22 Jun 2012 00:30:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/no-multithreading-on-small-matrices/m-p/798988#M2911</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-06-22T00:30:47Z</dc:date>
    </item>
    <item>
      <title>No multithreading on small matrices?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/no-multithreading-on-small-matrices/m-p/798989#M2912</link>
      <description>&lt;SPAN style="font-size: small;"&gt;&lt;P&gt;In my tests, multithreading fast matrix multiplication algorithms pays off only for matrices of at least 1500 x 1500: &lt;A href="http://software.intel.com/ru-ru/forums/showthread.php?t=75835&amp;amp;o=a&amp;amp;s=lr"&gt;http://software.intel.com/ru-ru/forums/showthread.php?t=75835&amp;amp;o=a&amp;amp;s=lr&lt;/A&gt;&lt;/P&gt;&lt;/SPAN&gt;</description>
      <pubDate>Fri, 22 Jun 2012 14:57:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/no-multithreading-on-small-matrices/m-p/798989#M2912</guid>
      <dc:creator>yuriisig</dc:creator>
      <dc:date>2012-06-22T14:57:54Z</dc:date>
    </item>
    <item>
      <title>No multithreading on small matrices?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/no-multithreading-on-small-matrices/m-p/798990#M2913</link>
      <description>&lt;DIV id="tiny_quote"&gt;&lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A jquery1340407889062="58" rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=312233" href="https://community.intel.com/en-us/profile/312233/" class="basic"&gt;yuriisig&lt;/A&gt;&lt;/DIV&gt;&lt;DIV style="background-color: #e5e5e5; margin-left: 2px; margin-right: 2px; border: 1px inset; padding: 5px;"&gt;&lt;I&gt;&lt;SPAN style="font-size: small;"&gt;&lt;P&gt;In my tests, multithreading fast matrix multiplication algorithms pays off only for matrices of at least 1500 x 1500: &lt;A href="http://software.intel.com/ru-ru/forums/showthread.php?t=75835&amp;amp;o=a&amp;amp;s=lr"&gt;http://software.intel.com/ru-ru/forums/showthread.php?t=75835&amp;amp;o=a&amp;amp;s=lr&lt;/A&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/I&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&lt;BR /&gt;Absolutely agree: modern CPUs are very fast, and it looks pointless to do anything else when multiplying small matrices. Thank you for the link; I'll take a look.&lt;BR /&gt;&lt;BR /&gt;The &lt;STRONG&gt;Strassen HBC&lt;/STRONG&gt; algorithm which I used for comparison is a single-threaded algorithm designed and tuned for embedded real-time systems.&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;Sergey&lt;/P&gt;</description>
      <pubDate>Fri, 22 Jun 2012 23:37:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/no-multithreading-on-small-matrices/m-p/798990#M2913</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-06-22T23:37:43Z</dc:date>
    </item>
  </channel>
</rss>

