<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic AVX + Triangle8 slow in Intel® Embree Ray Tracing Kernels</title>
    <link>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/AVX-Triangle8-slow/m-p/934606#M199</link>
    <description>&lt;P&gt;Hi All&lt;/P&gt;
&lt;P&gt;Unfortunately, I have no machine with AVX and can't debug this myself. Users tell me that with AVX the render time is about 3 times slower when bvh4 + triangle8 are used. If I set bvh4 + triangle4, the render time with AVX is approximately the same as with SSSE3. Any hints or advice would be much appreciated.&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Thu, 27 Jun 2013 05:03:06 GMT</pubDate>
    <dc:creator>theigors</dc:creator>
    <dc:date>2013-06-27T05:03:06Z</dc:date>
    <item>
      <title>AVX + Triangle8 slow</title>
      <link>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/AVX-Triangle8-slow/m-p/934606#M199</link>
      <description>&lt;P&gt;Hi All&lt;/P&gt;
&lt;P&gt;Unfortunately, I have no machine with AVX and can't debug this myself. Users tell me that with AVX the render time is about 3 times slower when bvh4 + triangle8 are used. If I set bvh4 + triangle4, the render time with AVX is approximately the same as with SSSE3. Any hints or advice would be much appreciated.&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jun 2013 05:03:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/AVX-Triangle8-slow/m-p/934606#M199</guid>
      <dc:creator>theigors</dc:creator>
      <dc:date>2013-06-27T05:03:06Z</dc:date>
    </item>
    <item>
      <title>Are you using Embree as is,</title>
      <link>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/AVX-Triangle8-slow/m-p/934607#M200</link>
      <description>&lt;P&gt;Are you using Embree as is, or did you extract the ray traversal kernels to your application? If the latter is the case, you have to be carefull that the __mm256_zeroupper() intrinsic is active at the beginning and end of the Embree traversal kernel. Otherwise there will be a performance penalty, if the user code it NOT compiled with AVX enabled.&lt;/P&gt;
&lt;P&gt;Furthermore, bvh4.triangle8 will give you only a small performance benefit (if any) over bvh4.triangle4 anyway, so the best workaround is simply to use bvh4.triangle4.&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jun 2013 12:12:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/AVX-Triangle8-slow/m-p/934607#M200</guid>
      <dc:creator>SvenW_Intel</dc:creator>
      <dc:date>2013-06-27T12:12:33Z</dc:date>
    </item>
    <item>
      <title>Hi, Sven</title>
      <link>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/AVX-Triangle8-slow/m-p/934608#M201</link>
      <description>&lt;P&gt;Hi, Sven&lt;/P&gt;
&lt;P&gt;&amp;gt;&amp;gt; Are you using Embree as is, or did you extract the ray traversal kernels into your application? If the latter, you have to be careful that the _mm256_zeroupper() intrinsic is active at the beginning and end of the Embree traversal kernel. Otherwise there will be a performance penalty if the user code is NOT compiled with AVX enabled.&amp;lt;&amp;lt;&lt;/P&gt;
&lt;P&gt;Yes, I've extracted the kernels (btw, it was much easier than I expected). I built with __AVX__ (and __SSE4_2__) defined, so _mm256_zeroupper() should be active. I can use Triangle8 with SSSE3 etc.: no time penalty, but no speedup either. I'll debug on the user's side (it will take time) and let you know.&lt;/P&gt;
&lt;P&gt;&amp;gt;&amp;gt; Furthermore, bvh4.triangle8 will give you only a small performance benefit (if any) over bvh4.triangle4 anyway, so the best workaround is simply to use bvh4.triangle4.&amp;lt;&amp;lt;&lt;/P&gt;
&lt;P&gt;Oops! That's really surprising; I expected something like a 1.5x speedup. If possible, could you tell me why?&lt;/P&gt;
&lt;P&gt;Thanks for your help&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jun 2013 14:01:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/AVX-Triangle8-slow/m-p/934608#M201</guid>
      <dc:creator>theigors</dc:creator>
      <dc:date>2013-06-28T14:01:34Z</dc:date>
    </item>
    <item>
      <title>Hi, Sven</title>
      <link>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/AVX-Triangle8-slow/m-p/934609#M202</link>
      <description>&lt;P&gt;Hi, Sven&lt;/P&gt;
&lt;P&gt;1) Yes, I've extracted the ray tracing kernels and built with __AVX__ (and __SSE4_2__) defined, so _mm256_zeroupper() should be active. If I use Triangle8 with SSSE3 etc., there is no time penalty: the speed is approximately the same as for Triangle4. I'll debug on the user's side (it will take time) and let you know.&lt;/P&gt;
&lt;P&gt;2) I was surprised that Triangle8 is not significantly faster! If possible, please explain why.&lt;/P&gt;
&lt;P&gt;Thanks for your help&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jun 2013 14:09:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/AVX-Triangle8-slow/m-p/934609#M202</guid>
      <dc:creator>theigors</dc:creator>
      <dc:date>2013-06-28T14:09:12Z</dc:date>
    </item>
  </channel>
</rss>

