<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic IPP Matrix functions in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Matrix-functions/m-p/792600#M2528</link>
    <description>&lt;P&gt;&lt;BR /&gt;Hello, &lt;/P&gt;&lt;P&gt;I think Viktor wants to check whether all CPU features are enabled in VirtualBox, to make sure the optimized CPU-specific code is actually being used. You can check this with the ippsGetLibVersion() function; please see "Example 3-1 Using the Function ippsGetLibVersion" in the ippsman.pdf manual for details on this function. If it reports the PX code, the optimized functions are not being used. &lt;/P&gt;&lt;P&gt;A 3x3 matrix computation is a very fast operation compared with the function call overhead. If possible, you can combine the matrix multiplications into one function call using a "matrix array operation". &lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Chao&lt;/P&gt;</description>
    <pubDate>Fri, 25 Jun 2010 02:34:19 GMT</pubDate>
    <dc:creator>Chao_Y_Intel</dc:creator>
    <dc:date>2010-06-25T02:34:19Z</dc:date>
    <item>
      <title>IPP Matrix functions</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Matrix-functions/m-p/792595#M2523</link>
      <description>Hello,&lt;BR /&gt;&lt;BR /&gt;I'd like to speed up my code, which uses small (3x3) matrices. I tried using IPP functions, but the result seems to be slower than the original code. My program runs on 32-bit Ubuntu in a VM (VirtualBox) that can only use 1 processor on this computer. Here is the function I've tried to optimize:&lt;BR /&gt;&lt;BR /&gt;&lt;PRE&gt;[cpp]/* Compute an updated rotation matrix given the initial rotation &lt;BR /&gt; * and the correction (w) */&lt;BR /&gt;void rot_update(double *R, double *w, double *Rnew) &lt;BR /&gt;{&lt;BR /&gt;    double theta, sinth, costh, n[3];&lt;BR /&gt;    double nx[9], nxsq[9], Rnew2[9];&lt;BR /&gt;    double term2[9], term3[9];&lt;BR /&gt;    double tmp[9], dR[9];&lt;BR /&gt;&lt;BR /&gt;    double ident[9] = &lt;BR /&gt;	{ 1.0, 0.0, 0.0,&lt;BR /&gt;	  0.0, 1.0, 0.0,&lt;BR /&gt;	  0.0, 0.0, 1.0 };&lt;BR /&gt;&lt;BR /&gt;    theta = sqrt(w[0] * w[0] + w[1] * w[1] + w[2] * w[2]);&lt;BR /&gt;&lt;BR /&gt;    if (theta == 0.0) {&lt;BR /&gt;	memcpy(Rnew, R, sizeof(double) * 9);&lt;BR /&gt;	return;&lt;BR /&gt;    }&lt;BR /&gt;&lt;BR /&gt;    n[0] = w[0] / theta;&lt;BR /&gt;    n[1] = w[1] / theta;&lt;BR /&gt;    n[2] = w[2] / theta;&lt;BR /&gt;&lt;BR /&gt;    nx[0] = 0.0;   nx[1] = -n[2];  nx[2] = n[1];&lt;BR /&gt;    nx[3] = n[2];  nx[4] = 0.0;    nx[5] = -n[0];&lt;BR /&gt;    nx[6] = -n[1]; nx[7] = n[0];   nx[8] = 0.0;&lt;BR /&gt;&lt;BR /&gt;    matrix_product33(nx, nx, nxsq);&lt;BR /&gt;&lt;BR /&gt;    sinth = sin(theta);&lt;BR /&gt;    costh = cos(theta);&lt;BR /&gt;&lt;BR /&gt;    matrix_scale(3, 3, nx, sinth, term2);&lt;BR /&gt;    matrix_scale(3, 3, nxsq, 1.0 - costh, term3);&lt;BR /&gt;&lt;BR /&gt;    matrix_sum(3, 3, 3, 3, ident, term2, tmp);&lt;BR /&gt;    matrix_sum(3, 3, 3, 3, tmp, term3, dR);&lt;BR /&gt;&lt;BR /&gt;    matrix_product33(dR, R, Rnew2);&lt;BR /&gt;&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;[/cpp]&lt;/PRE&gt; 
______________________________________________________&lt;BR /&gt;&lt;BR /&gt;&lt;PRE&gt;[cpp]/* Compute an updated rotation matrix given the initial rotation 
 * and the correction (w) */
void rot_update(double *R, double *w, double *Rnew) 
{
    int i, stride0, stride1;
    Ipp32f mtheta, msinth, mcosth, mn[3];
    Ipp32f mat[9], matsq[9], matterm2[9], matterm3[9], mattmp[9], matdR[9], matR[9], matRNew[9];

    Ipp32f matident[9] =
    { 1.0, 0.0, 0.0,
      0.0, 1.0, 0.0,
      0.0, 0.0, 1.0 };

    mtheta = sqrt(w[0] * w[0] + w[1] * w[1] + w[2] * w[2]);

    if (mtheta == 0.0) {
        memcpy(Rnew, R, sizeof(double) * 9);
        return;
    }

    mn[0] = w[0] / mtheta;
    mn[1] = w[1] / mtheta;
    mn[2] = w[2] / mtheta;

    mat[0] = 0.0;   mat[1] = -mn[2];  mat[2] = mn[1];
    mat[3] = mn[2];  mat[4] = 0.0;    mat[5] = -mn[0];
    mat[6] = -mn[1]; mat[7] = mn[0];   mat[8] = 0.0;
    stride0 = 3*sizeof(Ipp32f);
    stride1 = sizeof(Ipp32f);

    ippmMul_mm_32f(mat, stride0, stride1, 3, 3, mat, stride0, stride1, 3, 3, matsq, stride0, stride1);

    msinth = sin(mtheta);
    mcosth = cos(mtheta);

    ippmMul_mc_32f(mat, stride0, stride1,msinth, matterm2, stride0, stride1, 3, 3);

    ippmMul_mc_32f(matsq, stride0, stride1, 1.0-mcosth, matterm3, stride0, stride1, 3, 3);

    ippmAdd_mm_32f(matident, stride0, stride1, matterm2, stride0, stride1, mattmp, stride0, stride1, 3, 3);

    ippmAdd_mm_32f(mattmp, stride0, stride1, matterm3, stride0, stride1, matdR, stride0, stride1, 3, 3);

    for(i=0; i&amp;lt;9; ++i)
         matR[i] = R[i];

    ippmMul_mm_32f(matdR, stride0, stride1, 3, 3, matR, stride0, stride1, 3, 3, matRNew, stride0, stride1);

    for(i=0; i&amp;lt;9; ++i){
         Rnew[i] = matRNew[i];
     }
}&lt;BR /&gt;[/cpp]&lt;/PRE&gt;&lt;BR /&gt;I haven't tested it yet on other computers or systems with more processors or cores, but I would like to know whether I am misusing the library, or whether something else in my code is wrong or could be improved.&lt;BR /&gt;</description>
      <pubDate>Wed, 23 Jun 2010 08:41:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Matrix-functions/m-p/792595#M2523</guid>
      <dc:creator>shivany</dc:creator>
      <dc:date>2010-06-23T08:41:28Z</dc:date>
    </item>
    <item>
      <title>IPP Matrix functions</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Matrix-functions/m-p/792596#M2524</link>
      <description>Have you allowed VirtualBox to pass through all CPU features and cores?&lt;BR /&gt;&lt;BR /&gt;Regards&lt;BR /&gt;Viktor</description>
      <pubDate>Wed, 23 Jun 2010 10:25:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Matrix-functions/m-p/792596#M2524</guid>
      <dc:creator>vrennert</dc:creator>
      <dc:date>2010-06-23T10:25:44Z</dc:date>
    </item>
    <item>
      <title>IPP Matrix functions</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Matrix-functions/m-p/792597#M2525</link>
      <description>Apparently my processor doesn't support virtualization (Pentium Dual-Core T4200).</description>
      <pubDate>Wed, 23 Jun 2010 10:55:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Matrix-functions/m-p/792597#M2525</guid>
      <dc:creator>shivany</dc:creator>
      <dc:date>2010-06-23T10:55:41Z</dc:date>
    </item>
    <item>
      <title>IPP Matrix functions</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Matrix-functions/m-p/792598#M2526</link>
      <description>What is the basis of your conclusion that the IPP version is slower? Your example involves such a small number of calculations that other factors (loading a DLL / shared library, for example) may control speed.</description>
      <pubDate>Wed, 23 Jun 2010 12:51:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Matrix-functions/m-p/792598#M2526</guid>
      <dc:creator>mecej4</dc:creator>
      <dc:date>2010-06-23T12:51:52Z</dc:date>
    </item>
    <item>
      <title>IPP Matrix functions</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Matrix-functions/m-p/792599#M2527</link>
      <description>It's not a function that is called only once or twice. I ran valgrind with the callgrind tool to generate a call graph, and it showed that this function (rot_update) accounts for a larger share of the cost with the IPP code than with the original code, even though there are fewer calls to it.&lt;BR /&gt;&lt;BR /&gt;Here are the call graphs; the first one is without IPP and the second one is with IPP:&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;IMG src="http://img145.imageshack.us/img145/1612/matrices.jpg" /&gt;</description>
      <pubDate>Wed, 23 Jun 2010 13:09:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Matrix-functions/m-p/792599#M2527</guid>
      <dc:creator>shivany</dc:creator>
      <dc:date>2010-06-23T13:09:46Z</dc:date>
    </item>
    <item>
      <title>IPP Matrix functions</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Matrix-functions/m-p/792600#M2528</link>
      <description>&lt;P&gt;&lt;BR /&gt;Hello, &lt;/P&gt;&lt;P&gt;I think Viktor wants to check whether all CPU features are enabled in VirtualBox, to make sure the optimized CPU-specific code is actually being used. You can check this with the ippsGetLibVersion() function; please see "Example 3-1 Using the Function ippsGetLibVersion" in the ippsman.pdf manual for details on this function. If it reports the PX code, the optimized functions are not being used. &lt;/P&gt;&lt;P&gt;A 3x3 matrix computation is a very fast operation compared with the function call overhead. If possible, you can combine the matrix multiplications into one function call using a "matrix array operation". &lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Chao&lt;/P&gt;</description>
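      <!-- A minimal sketch of the batching idea in the reply above, assuming
           plain C rather than the IPP matrix-array functions themselves: one
           call handles many 3x3 products, so the call overhead is paid once
           per batch instead of once per matrix. The name mul33_batch and its
           layout (row-major matrices stored back to back, 9 doubles each) are
           hypothetical, and the output must not overlap the inputs.

```c
#include <stddef.h>

/* Hypothetical helper: C[k] = A[k] * B[k] for k = 0 .. count-1.
 * Each matrix is a row-major 3x3 block of 9 doubles. */
void mul33_batch(const double *A, const double *B, double *C, size_t count)
{
    size_t k;
    int i, j;

    for (k = 0; k < count; ++k) {
        const double *a = A + 9 * k;
        const double *b = B + 9 * k;
        double *c = C + 9 * k;

        for (i = 0; i < 3; ++i)
            for (j = 0; j < 3; ++j)
                c[3 * i + j] = a[3 * i] * b[j]
                             + a[3 * i + 1] * b[3 + j]
                             + a[3 * i + 2] * b[6 + j];
    }
}
```

           Whether batching helps in a given program depends on how many
           independent products are available at once; a single rot_update
           call has only two dependent products, so the gain there may be
           small. -->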
      <pubDate>Fri, 25 Jun 2010 02:34:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Matrix-functions/m-p/792600#M2528</guid>
      <dc:creator>Chao_Y_Intel</dc:creator>
      <dc:date>2010-06-25T02:34:19Z</dc:date>
    </item>
    <item>
      <title>IPP Matrix functions</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Matrix-functions/m-p/792601#M2529</link>
      <description>Hello,&lt;BR /&gt;&lt;BR /&gt;Here is the output of the printf shown in Example 3-1:&lt;BR /&gt;&lt;BR /&gt;libippsv8.so.6.1 6.1 build 137.36 6.1.137.827&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 25 Jun 2010 12:25:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Matrix-functions/m-p/792601#M2529</guid>
      <dc:creator>shivany</dc:creator>
      <dc:date>2010-06-25T12:25:32Z</dc:date>
    </item>
    <item>
      <title>IPP Matrix functions</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Matrix-functions/m-p/792602#M2530</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;I haven't managed to solve the problem yet. Does anyone have an idea?&lt;BR /&gt;&lt;BR /&gt;Allan</description>
      <pubDate>Wed, 07 Jul 2010 10:12:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Matrix-functions/m-p/792602#M2530</guid>
      <dc:creator>shivany</dc:creator>
      <dc:date>2010-07-07T10:12:39Z</dc:date>
    </item>
    <item>
      <title>IPP Matrix functions</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Matrix-functions/m-p/792603#M2531</link>
      <description>&lt;P&gt;Allan, &lt;/P&gt;&lt;P&gt;Do you have your code implementation? We can run a benchmark here to check the performance (please reply by private message if you do not want to publish the code). &lt;/P&gt;&lt;P&gt;Looking at the code, calling the IPP functions adds extra work for data type conversion: it converts the data to single-precision float first, then converts the result back. &lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;for(i=0; i&amp;lt;9; ++i) &lt;/P&gt;&lt;P&gt;matR[i] = R[i]; &lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;for(i=0; i&amp;lt;9; ++i){ &lt;/P&gt;&lt;P&gt;Rnew[i] = matRNew[i]; &lt;/P&gt;&lt;P&gt;} &lt;/P&gt;&lt;P&gt;If the matrix is small (so the function call itself is fast), such overhead may outweigh the benefit of calling IPP functions. &lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Chao&lt;/P&gt;</description>
      <pubDate>Thu, 08 Jul 2010 02:07:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Matrix-functions/m-p/792603#M2531</guid>
      <dc:creator>Chao_Y_Intel</dc:creator>
      <dc:date>2010-07-08T02:07:17Z</dc:date>
    </item>
  </channel>
</rss>

