<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic AffineTransform Variant in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/AffineTransform-Variant/m-p/829695#M5440</link>
    <description>Never mind ... found a way to force full vectorisation anyway. Works well.</description>
    <pubDate>Thu, 05 May 2011 21:05:12 GMT</pubDate>
    <dc:creator>kramulous</dc:creator>
    <dc:date>2011-05-05T21:05:12Z</dc:date>
    <item>
      <title>AffineTransform Variant</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/AffineTransform-Variant/m-p/829694#M5439</link>
      <description>Hi,&lt;DIV&gt; I've been speeding up some rendering code of mine and have narrowed the problem down to a function that takes 98% of the run time: &lt;PRE&gt;[cpp]    // temp1 = a*x + b*y + c*z + d   (X' before the divide by W)
    ippsMulC_32f(dataset.xpoints, a, temp1, n);
    ippsAddProductC_32f(dataset.ypoints, b, temp1, n);
    ippsAddProductC_32f(dataset.zpoints, c, temp1, n);
    ippsAddC_32f_I(d, temp1, n);
    
    // temp2 = e*x + f*y + g*z + h   (Y' before the divide by W)
    ippsMulC_32f(dataset.xpoints, e, temp2, n);
    ippsAddProductC_32f(dataset.ypoints, f, temp2, n);
    ippsAddProductC_32f(dataset.zpoints, g, temp2, n);
    ippsAddC_32f_I(h, temp2, n);
    
    // temp3 = i*x + j*y + k*z + l   (W)
    ippsMulC_32f(dataset.xpoints, i, temp3, n);
    ippsAddProductC_32f(dataset.ypoints, j, temp3, n);
    ippsAddProductC_32f(dataset.zpoints, k, temp3, n);
    ippsAddC_32f_I(l, temp3, n);
    ippsDivCRev_32f_I(1.0f, temp3, n);   // temp3 = 1/W, in place
    
    
    for (int p = 0; p &amp;lt; n; ++p) {
        xtrans[p] = ca + ca*temp1[p]*vd*temp3[p];
        ytrans[p] = cb - cb*temp2[p]*vda*temp3[p];
    }[/cpp]&lt;/PRE&gt; &lt;/DIV&gt;&lt;DIV&gt;Eventually I will have to get rid of the temporary buffers (temp1, temp2, temp3); memory requirements won't allow them for much longer.&lt;/DIV&gt;&lt;DIV&gt;Now, the closest contender I can see for what I need is "ippmAffineTransform3DH_mva_32f", but it computes:&lt;/DIV&gt;&lt;DIV&gt;&lt;P&gt;X' = t&lt;SUB&gt;11&lt;/SUB&gt;*X + t&lt;SUB&gt;12&lt;/SUB&gt;*Y + t&lt;SUB&gt;13&lt;/SUB&gt;*Z + t&lt;SUB&gt;14&lt;/SUB&gt;&lt;/P&gt;&lt;P&gt;Y' = t&lt;SUB&gt;21&lt;/SUB&gt;*X + t&lt;SUB&gt;22&lt;/SUB&gt;*Y + t&lt;SUB&gt;23&lt;/SUB&gt;*Z + t&lt;SUB&gt;24&lt;/SUB&gt;&lt;/P&gt;&lt;P&gt;Z' = t&lt;SUB&gt;31&lt;/SUB&gt;*X + t&lt;SUB&gt;32&lt;/SUB&gt;*Y + t&lt;SUB&gt;33&lt;/SUB&gt;*Z + t&lt;SUB&gt;34&lt;/SUB&gt;&lt;/P&gt;&lt;P&gt;W = t&lt;SUB&gt;41&lt;/SUB&gt;*X + t&lt;SUB&gt;42&lt;/SUB&gt;*Y + t&lt;SUB&gt;43&lt;/SUB&gt;*Z + t&lt;SUB&gt;44&lt;/SUB&gt;&lt;/P&gt;&lt;P&gt;X' = X'/W&lt;/P&gt;&lt;P&gt;Y' = Y'/W&lt;/P&gt;&lt;P&gt;Z' = Z'/W&lt;/P&gt;&lt;P&gt;But I need:&lt;/P&gt;&lt;P&gt;X' = t&lt;SUB&gt;11&lt;/SUB&gt;*X + t&lt;SUB&gt;12&lt;/SUB&gt;*Y + t&lt;SUB&gt;13&lt;/SUB&gt;*Z + t&lt;SUB&gt;14&lt;/SUB&gt;&lt;/P&gt;&lt;P&gt;Y' = t&lt;SUB&gt;21&lt;/SUB&gt;*X + t&lt;SUB&gt;22&lt;/SUB&gt;*Y + t&lt;SUB&gt;23&lt;/SUB&gt;*Z + t&lt;SUB&gt;24&lt;/SUB&gt;&lt;/P&gt;&lt;P&gt;W = t&lt;SUB&gt;41&lt;/SUB&gt;*X + t&lt;SUB&gt;42&lt;/SUB&gt;*Y + t&lt;SUB&gt;43&lt;/SUB&gt;*Z + 
t&lt;SUB&gt;44&lt;/SUB&gt;,&lt;/P&gt;&lt;P&gt;and then the following vector transformation:&lt;/P&gt;&lt;P&gt;X' = X'/W&lt;/P&gt;&lt;P&gt;Y' = Y'/W&lt;/P&gt;&lt;P&gt;So my question is: are there any internal optimisations that detect that t&lt;SUB&gt;31&lt;/SUB&gt;, t&lt;SUB&gt;32&lt;/SUB&gt;, t&lt;SUB&gt;33&lt;/SUB&gt; and t&lt;SUB&gt;34&lt;/SUB&gt; are zero and leave them out of the calculation?&lt;/P&gt;&lt;P&gt;It would be really, really nice to sort out this last bit so I can get balanced parallelism (multi-socket/multi-core) and full vectorisation.&lt;/P&gt;&lt;P&gt;I'm currently rendering 2 billion points at 20 frames per second at 9600x4800 resolution (20x1920x1200). The interaction is just a fraction too slow for fluid manipulation. By the end of the year the resolution is expected to increase a further 6x, so milking every bit of performance is essential.&lt;/P&gt;&lt;P&gt;Pointers from the experts?&lt;/P&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 04 May 2011 12:41:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/AffineTransform-Variant/m-p/829694#M5439</guid>
      <dc:creator>kramulous</dc:creator>
      <dc:date>2011-05-04T12:41:42Z</dc:date>
    </item>
    <item>
      <title>AffineTransform Variant</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/AffineTransform-Variant/m-p/829695#M5440</link>
      <description>Never mind ... found a way to force full vectorisation anyway. Works well.</description>
      <pubDate>Thu, 05 May 2011 21:05:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/AffineTransform-Variant/m-p/829695#M5440</guid>
      <dc:creator>kramulous</dc:creator>
      <dc:date>2011-05-05T21:05:12Z</dc:date>
    </item>
  </channel>
</rss>

