<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic BVH4 traverser "optimization" in Intel® Embree Ray Tracing Kernels</title>
    <link>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/BVH4-traverser-quot-optimization-quot/m-p/803375#M133</link>
    <description>I've noticed an unnecessary "else" for leaf nodes, at bvh4_traverser.cpp:122. Interestingly, Composer 2011 on Windows seems to have a problem optimizing it: I get a 2-3% speed increase on my i7-920 after removing that "else". I tried something similar for the occlusion rays, but it doesn't seem to make a difference:&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;[bash]while (true) {
      if (__builtin_expect(stackPtr == 0, false)) break;
      stackPtr--;

      cur = stack[stackPtr];
next:
      /*! this is an inner node */
      if (__builtin_expect(cur &amp;gt;= 0, true))
      {
        /*! single ray intersection with 4 boxes */
        const BVH4&amp;lt;Triangle4&amp;gt;::Node&amp;amp; node = bvh-&amp;gt;node(nodes,cur);
        ssef tNearX = (norg.x + *(ssef*)((const char*)nodes+BVH4&amp;lt;Triangle4&amp;gt;::offsetFactor*size_t(cur)+nearX)) * rdir.x;
        ssef tNearY = (norg.y + *(ssef*)((const char*)nodes+BVH4&amp;lt;Triangle4&amp;gt;::offsetFactor*size_t(cur)+nearY)) * rdir.y;
        ssef tNearZ = (norg.z + *(ssef*)((const char*)nodes+BVH4&amp;lt;Triangle4&amp;gt;::offsetFactor*size_t(cur)+nearZ)) * rdir.z;
        ssef tNear = max(tNearX,tNearY,tNearZ,rayNear);
        ssef tFarX = (norg.x + *(ssef*)((const char*)nodes+BVH4&amp;lt;Triangle4&amp;gt;::offsetFactor*size_t(cur)+farX)) * rdir.x;
        ssef tFarY = (norg.y + *(ssef*)((const char*)nodes+BVH4&amp;lt;Triangle4&amp;gt;::offsetFactor*size_t(cur)+farY)) * rdir.y;
        ssef tFarZ = (norg.z + *(ssef*)((const char*)nodes+BVH4&amp;lt;Triangle4&amp;gt;::offsetFactor*size_t(cur)+farZ)) * rdir.z;
        ssef tFar = min(tFarX,tFarY,tFarZ,rayFar);
        size_t _hit = movemask(tNear &amp;lt;= tFar);

        /*! push hit nodes onto stack */
        if (__builtin_expect(_hit == 0, true)) continue;
        size_t r = __bsf(_hit); _hit = __btc(_hit,r);
        stack[stackPtr] = cur = node.child[r];
        if (__builtin_expect(_hit == 0, true)) goto next;
        r = __bsf(_hit); _hit = __btc(_hit,r);
        stack[++stackPtr] = cur = node.child[r];
        if (__builtin_expect(_hit == 0, true)) goto next;
        r = __bsf(_hit); _hit = __btc(_hit,r);
        stack[++stackPtr] = cur = node.child[r];
        if (__builtin_expect(_hit == 0, true)) goto next;
        r = __bsf(_hit); _hit = __btc(_hit,r);
        stack[++stackPtr] = cur = node.child[r];
        goto next;
      }

      /*! this is a leaf node */
      {
        /*! strip the leaf flag bit, then decode triangle offset and count */
        cur ^= 0x80000000;
        const size_t ofs = size_t(cur) &amp;gt;&amp;gt; 5;
        const size_t num = size_t(cur) &amp;amp; 0x1F;
        for (size_t i=ofs; i&amp;lt;ofs+num; i++)
          if (triangles[i].occluded(ray))
            return true;
      }
    }[/bash]&lt;/DIV&gt;</description>
    <pubDate>Wed, 22 Feb 2012 23:14:36 GMT</pubDate>
    <dc:creator>P_V__Hariprasad</dc:creator>
    <dc:date>2012-02-22T23:14:36Z</dc:date>
    <item>
      <title>BVH4 traverser "optimization"</title>
      <link>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/BVH4-traverser-quot-optimization-quot/m-p/803375#M133</link>
      <description>I've noticed an unnecessary "else" for leaf nodes, at bvh4_traverser.cpp:122. Interestingly, Composer 2011 on Windows seems to have a problem optimizing it: I get a 2-3% speed increase on my i7-920 after removing that "else". I tried something similar for the occlusion rays, but it doesn't seem to make a difference:&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;[bash]while (true) {
      if (__builtin_expect(stackPtr == 0, false)) break;
      stackPtr--;

      cur = stack[stackPtr];
next:
      /*! this is an inner node */
      if (__builtin_expect(cur &amp;gt;= 0, true))
      {
        /*! single ray intersection with 4 boxes */
        const BVH4&amp;lt;Triangle4&amp;gt;::Node&amp;amp; node = bvh-&amp;gt;node(nodes,cur);
        ssef tNearX = (norg.x + *(ssef*)((const char*)nodes+BVH4&amp;lt;Triangle4&amp;gt;::offsetFactor*size_t(cur)+nearX)) * rdir.x;
        ssef tNearY = (norg.y + *(ssef*)((const char*)nodes+BVH4&amp;lt;Triangle4&amp;gt;::offsetFactor*size_t(cur)+nearY)) * rdir.y;
        ssef tNearZ = (norg.z + *(ssef*)((const char*)nodes+BVH4&amp;lt;Triangle4&amp;gt;::offsetFactor*size_t(cur)+nearZ)) * rdir.z;
        ssef tNear = max(tNearX,tNearY,tNearZ,rayNear);
        ssef tFarX = (norg.x + *(ssef*)((const char*)nodes+BVH4&amp;lt;Triangle4&amp;gt;::offsetFactor*size_t(cur)+farX)) * rdir.x;
        ssef tFarY = (norg.y + *(ssef*)((const char*)nodes+BVH4&amp;lt;Triangle4&amp;gt;::offsetFactor*size_t(cur)+farY)) * rdir.y;
        ssef tFarZ = (norg.z + *(ssef*)((const char*)nodes+BVH4&amp;lt;Triangle4&amp;gt;::offsetFactor*size_t(cur)+farZ)) * rdir.z;
        ssef tFar = min(tFarX,tFarY,tFarZ,rayFar);
        size_t _hit = movemask(tNear &amp;lt;= tFar);

        /*! push hit nodes onto stack */
        if (__builtin_expect(_hit == 0, true)) continue;
        size_t r = __bsf(_hit); _hit = __btc(_hit,r);
        stack[stackPtr] = cur = node.child[r];
        if (__builtin_expect(_hit == 0, true)) goto next;
        r = __bsf(_hit); _hit = __btc(_hit,r);
        stack[++stackPtr] = cur = node.child[r];
        if (__builtin_expect(_hit == 0, true)) goto next;
        r = __bsf(_hit); _hit = __btc(_hit,r);
        stack[++stackPtr] = cur = node.child[r];
        if (__builtin_expect(_hit == 0, true)) goto next;
        r = __bsf(_hit); _hit = __btc(_hit,r);
        stack[++stackPtr] = cur = node.child[r];
        goto next;
      }

      /*! this is a leaf node */
      {
        /*! strip the leaf flag bit, then decode triangle offset and count */
        cur ^= 0x80000000;
        const size_t ofs = size_t(cur) &amp;gt;&amp;gt; 5;
        const size_t num = size_t(cur) &amp;amp; 0x1F;
        for (size_t i=ofs; i&amp;lt;ofs+num; i++)
          if (triangles[i].occluded(ray))
            return true;
      }
    }[/bash]&lt;/DIV&gt;</description>
      <pubDate>Wed, 22 Feb 2012 23:14:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/BVH4-traverser-quot-optimization-quot/m-p/803375#M133</guid>
      <dc:creator>P_V__Hariprasad</dc:creator>
      <dc:date>2012-02-22T23:14:36Z</dc:date>
    </item>
  </channel>
</rss>

