Intel® Embree Ray Tracing Kernels
Discussion forum on the open source ray tracing kernels for fast photo-realistic rendering on Intel® CPU(s)

Need help with an optimization

P_V__Hariprasad
Beginner
430 Views
I've changed the node index to be 64-bit, and this caused a change in StackItem, which relied on some magic code that I didn't quite understand. This is what I came up with, but someone can probably do this better:
[bash]
struct StackItem
{
  /*! Copy operator */
  StackItem& operator=(const StackItem& other) { all = other.all; return *this; }

  /*! Sort a stack item by distance. */
  static __forceinline void sort(StackItem& a, StackItem& b) { if (a.dist < b.dist) std::swap(a,b); }

  union {
    struct { float dist; float pad; int64 ofs; };
    __m128 all;
  };
};

/*! Sort 3 stack items. */
__forceinline void sort(StackItem& a, StackItem& b, StackItem& c)
{
  __m128 s1 = a.all;
  __m128 s2 = b.all;
  __m128 s3 = c.all;
  if (_mm_comilt_ss(s2, s1)) std::swap(s2, s1);
  if (_mm_comilt_ss(s3, s2)) std::swap(s3, s2);
  if (_mm_comilt_ss(s2, s1)) std::swap(s2, s1);
  a.all = s1;
  b.all = s2;
  c.all = s3;
}

/*! Sort 4 stack items. */
__forceinline void sort(StackItem& a, StackItem& b, StackItem& c, StackItem& d)
{
  __m128 s1 = a.all;
  __m128 s2 = b.all;
  __m128 s3 = c.all;
  __m128 s4 = d.all;
  if (_mm_comilt_ss(s2, s1)) std::swap(s2,s1);
  if (_mm_comilt_ss(s4, s3)) std::swap(s4,s3);
  if (_mm_comilt_ss(s3, s1)) std::swap(s3,s1);
  if (_mm_comilt_ss(s4, s2)) std::swap(s4,s2);
  if (_mm_comilt_ss(s3, s2)) std::swap(s3,s2);
  a.all = s1;
  b.all = s2;
  c.all = s3;
  d.all = s4;
}
[/bash]
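The five compare-and-swap steps in the 4-item sort above follow the pattern of a small sorting network. As a rough check, here is a scalar sketch that replays the same comparator sequence. The `Item` struct, `compareSwap`, `sort4` and `sort4HandlesAllPermutations` are hypothetical names for illustration only; plain `float` comparisons stand in for `_mm_comilt_ss`, which compares the lowest float lane (here, `dist`).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for StackItem: a distance plus a 64-bit node offset.
struct Item {
  float dist;
  std::int64_t ofs;
};

// Swap so that lo.dist <= hi.dist afterwards.
inline void compareSwap(Item& lo, Item& hi) {
  if (hi.dist < lo.dist) std::swap(lo, hi);
}

// Same comparator sequence as the 4-item sort in the post (ascending by dist).
inline void sort4(Item& a, Item& b, Item& c, Item& d) {
  compareSwap(a, b);
  compareSwap(c, d);
  compareSwap(a, c);
  compareSwap(b, d);
  compareSwap(b, c);
}

// Runs the network over all 24 orderings of four distinct distances.
inline bool sort4HandlesAllPermutations() {
  std::array<float, 4> d = {1.0f, 2.0f, 3.0f, 4.0f};
  do {
    Item a{d[0], 0}, b{d[1], 1}, c{d[2], 2}, e{d[3], 3};
    sort4(a, b, c, e);
    if (!(a.dist <= b.dist && b.dist <= c.dist && c.dist <= e.dist)) return false;
  } while (std::next_permutation(d.begin(), d.end()));
  return true;
}
```

Calling `sort4HandlesAllPermutations()` exercises every input order, which is a cheap way to convince yourself that a hand-written comparator sequence is complete before moving it back to SSE.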
2 Replies
Max_Liani
Beginner
Very late reply, I know, but why do you want a full 64-bit ID for the internal nodes? A full 32-bit ID already addresses 4 billion nodes. Considering each node has 4 pointers to leaves, that's already 16 billion leaves, and each of them could contain 4 triangles or more. That's an awful lot of stuff that no modern workstation can hold.
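To put numbers on that estimate, here is a small compile-time sketch. It assumes exactly 4 children per node and 4 triangles per leaf, as quoted above. The 36 bytes per triangle (3 vertices of 3 floats each) is my own assumption for illustration, not a figure from the original code, and all constant names are made up.

```cpp
#include <cassert>
#include <cstdint>

// Figures quoted in the reply: 32-bit node ID, 4 children per node,
// 4 triangles per leaf.
constexpr std::uint64_t kMaxNodes = std::uint64_t{1} << 32;
constexpr std::uint64_t kMaxLeaves = kMaxNodes * 4;
constexpr std::uint64_t kMaxTriangles = kMaxLeaves * 4;

// ASSUMPTION (not from the thread): 3 vertices x 3 floats x 4 bytes.
constexpr std::uint64_t kBytesPerTriangle = 36;

// Raw triangle storage in GiB (1 GiB = 2^30 bytes).
constexpr std::uint64_t kTriangleGiB = (kMaxTriangles * kBytesPerTriangle) >> 30;

static_assert(kMaxNodes == 4294967296ULL, "2^32 nodes");
static_assert(kMaxLeaves == 17179869184ULL, "about 16 billion leaves");
static_assert(kMaxTriangles == 68719476736ULL, "about 64 billion triangles");
static_assert(kTriangleGiB == 2304, "about 2.25 TiB of vertex data");
```

Under those assumptions the triangle data alone would come to roughly 2.25 TiB, before counting the nodes themselves.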

The original code packs other data into the upper bits of the 32-bit "id", which leaves too few bits for the ID itself. But if you change Node::child[4] to a 64-bit type, you can use the full lower 32 bits for the ID and the upper 32 bits to store the extra data (node type, triangle count, and more).
I made changes similar to yours, but along the lines I described above. I use the number as an actual ID instead of a memory offset. Once your nodes are aligned to cache lines, the offset trick seems to lose most of its advantage, and dropping it gains back a few precious bits (and makes the code simpler).
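A minimal sketch of that kind of packing, to make the idea concrete. The exact bit layout here (8 bits of node type at bit 32, 8 bits of triangle count at bit 40), the 64-byte node size, and all names are assumptions for illustration; they are not taken from the Embree sources or from my actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit child reference:
//   bits  0..31  node ID (an index, not a byte offset)
//   bits 32..39  node type        (assumed width)
//   bits 40..47  triangle count   (assumed width)
using ChildRef = std::uint64_t;

constexpr ChildRef makeChild(std::uint32_t id, std::uint8_t type, std::uint8_t triCount) {
  return static_cast<ChildRef>(id)
       | (static_cast<ChildRef>(type) << 32)
       | (static_cast<ChildRef>(triCount) << 40);
}

constexpr std::uint32_t childId(ChildRef c) {
  return static_cast<std::uint32_t>(c & 0xFFFFFFFFULL);
}

constexpr std::uint8_t childType(ChildRef c) {
  return static_cast<std::uint8_t>((c >> 32) & 0xFF);
}

constexpr std::uint8_t childTriCount(ChildRef c) {
  return static_cast<std::uint8_t>((c >> 40) & 0xFF);
}

// With cache-line-aligned nodes, the byte offset is recovered from the ID.
// ASSUMPTION: one node occupies exactly 64 bytes.
constexpr std::uint64_t kNodeBytes = 64;
constexpr std::uint64_t nodeByteOffset(ChildRef c) {
  return static_cast<std::uint64_t>(childId(c)) * kNodeBytes;
}
```

Because the ID is an index, all 32 bits stay usable for addressing, and the byte offset is one multiply (or shift) away when the node is actually fetched.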

I have something along these lines now:

typedef uint32 OcclStackItem;
  struct StackItem
  {
    /*! Copy operator */
    StackItem& operator=(const StackItem& other) { all = other.all; return *this; }

    /*! Sort a stack item by distance. */
    static __forceinline void sort(StackItem& a, StackItem& b) { if (a.all < b.all) std::swap(a,b); }

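    union {
      /* On little-endian x86, dist sits in the high 32 bits of 'all', so comparing
         'all' as an integer orders items by dist (for non-negative dist), then by ofs. */
      struct { uint32 ofs; float dist; };
      int64 all;
    };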
  };

  /*! Sort 3 stack items. */
  __forceinline void sort(StackItem& a, StackItem& b, StackItem& c)
  {
    int64 s1 = a.all;
    int64 s2 = b.all;
    int64 s3 = c.all;
    if (s2 < s1) std::swap(s2,s1);
    if (s3 < s2) std::swap(s3,s2);
    if (s2 < s1) std::swap(s2,s1);
    a.all = s1;
    b.all = s2;
    c.all = s3;
  }

  /*! Sort 4 stack items. */
  __forceinline void sort(StackItem& a, StackItem& b, StackItem& c, StackItem& d)
  {
    int64 s1 = a.all;
    int64 s2 = b.all;
    int64 s3 = c.all;
    int64 s4 = d.all;
    if (s2 < s1) std::swap(s2,s1);
    if (s4 < s3) std::swap(s4,s3);
    if (s3 < s1) std::swap(s3,s1);
    if (s4 < s2) std::swap(s4,s2);
    if (s3 < s2) std::swap(s3,s2);
    a.all = s1;
    b.all = s2;
    c.all = s3;
    d.all = s4;
  }
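
The reason the plain integer comparisons work: for non-negative IEEE 754 floats, the bit pattern grows with the value, so a 64-bit key with the distance bits in the upper half sorts by distance. Here is a small sketch of that property. `packKey` is a hypothetical helper that builds the same key explicitly with `std::memcpy` instead of going through the union, so it does not depend on byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Builds a 64-bit sort key: float bits of dist in the upper 32 bits,
// ofs in the lower 32 bits. Mirrors {uint32 ofs; float dist;} read
// through 'int64 all' on a little-endian machine.
inline std::int64_t packKey(std::uint32_t ofs, float dist) {
  std::uint32_t distBits;
  std::memcpy(&distBits, &dist, sizeof distBits);  // well-defined type pun
  const std::uint64_t key = (static_cast<std::uint64_t>(distBits) << 32) | ofs;
  return static_cast<std::int64_t>(key);
}

// True when the integer ordering of the keys agrees with the float ordering.
inline bool keyOrderMatches(float nearDist, float farDist) {
  return packKey(0, nearDist) < packKey(0, farDist);
}
```

Two things to keep in mind: ties in distance are broken by `ofs`, and the trick only holds for non-negative distances. Negative floats set the sign bit, which makes the key negative and reverses the order among them.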
P_V__Hariprasad
Beginner
I'm using the same encoding as the original code, just with more space for nodes. It's probably huge by current standards, but what other use could these bits have? They are not enough for even 1 bit per triangle, so if I assign some kind of "property", it will work only if all triangles in the node have it. It's probably safe to assume that the chance of this is pretty high, but still...