Intel® Embree Ray Tracing Kernels
Discussion forum on the open source ray tracing kernels for fast photo-realistic rendering on Intel® CPU(s)

AVX crash

Igor_Igor
Beginner
587 Views

Hello

OSX 9.5, Embree 2.3.3 built with ICC 13. It crashes on a machine with AVX, while the same scene renders OK with SSSE3, SSE4.1, etc. It's my partner's machine; mine has no AVX. So I added logging and see that the BVH tree/nodes are OK, but an invalid node is popped from the stack. I've fixed it in file Bvh8_intersector1.cpp, line 98:

#if defined(__AVX2__)
          const avxf tNear = maxi(maxi(tNearX,tNearY),maxi(tNearZ,ray_near));
          const avxf tFar  = mini(mini(tFarX ,tFarY ),mini(tFarZ ,ray_far ));
          const avxb vmask = cast(tNear) > cast(tFar);
          size_t mask = movemask(vmask)^0xff;
#else
          const avxf tNear = max(tNearX,tNearY,tNearZ,ray_near);
          const avxf tFar  = min(tFarX ,tFarY ,tFarZ ,ray_far);
          const avxb vmask = tNear <= tFar;
//II          size_t mask = movemask(vmask);
          size_t mask = movemask(vmask) & (BVH8::N - 1);
#endif

Without it, the mask is invalid: its upper 2 bytes are non-zero. It happens randomly, about once per 100K rays. Please advise a better solution.

Thx

SvenW_Intel
Moderator

That sounds weird. Only the lower 8 bits can be set in the mask. First, your fix is not correct: BVH8::N - 1 is 7, so it also clears the valid child bits 3 to 7. You have to AND the mask with 0xFF to clear out only the invalid bits. Could you try this and tell us if the problem goes away?

Could you also send us a ray stream log, including the geometry of the model, if possible? To do so, enable RTCORE_ENABLE_RAYSTREAM_LOGGER in CMake and run your application. This will store the geometry and the traversed rays on disk. Best do this with your fix enabled, so that the application does not crash yet. Then disable RTCORE_ENABLE_RAYSTREAM_LOGGER again in CMake and try to replay the logs using the retrace application. If the error occurs during replay, please send us the logs for further debugging.

Igor_Igor
Beginner

Hello, and sorry for the delay (I had no AVX machine at hand).

1) I ANDed the mask with 0xFF and the crash came back. I checked again: movemask works as it should and sets bits in the low byte only.

2) After more logging, I see the problem is here:

// stack1 0xf55e460, 0xf55a460, 0xf5559a0, 0xf554dc0
sort(stackPtr[-1],stackPtr[-2],stackPtr[-3],stackPtr[-4]);
// stack2 0xf554dc0, 0x0, 0xf55e460, 0xf5559a0

Then I found this FIXME:

  template<typename T>
    struct StackItemInt32
  {
    __forceinline static void swap2(StackItemInt32<T>& a, StackItemInt32<T>& b) {
//II #if defined(__AVX__) && defined(__INTEL_COMPILER) // FIXME: works only if sizeof(T) is 8 bytes large
#if 0
      /* use sse registers to copy stack items */
      ssef sse_a = load4f(&a);
      ssef sse_b = load4f(&b);
      store4f(&a,sse_b);
      store4f(&b,sse_a);
#else
      StackItemInt32<T> t = b; b = a; a = t;
#endif
    }

This cannot work in 32-bit mode, because the SSE path copies 16 bytes per item, which is more than sizeof(StackItemInt32<T>) there. With #if 0, the scene renders fine and there is no crash.

Thx

SvenW_Intel
Moderator

Hi Igor,

some questions to better reproduce the issue:

Does the problem occur in Debug mode, Release mode, or both?

Does the problem occur in 32-bit mode or 64-bit mode?

Do the Embree regression tests pass (run ./regression)?

We looked into this code sequence again, and in 64-bit mode the StackItemInt32<NodeRef> struct is 16 bytes large. Consequently, the code sequence using the SSE loads should be correct.

I could imagine that you ran into some ICC compiler bug. Can you easily upgrade to the latest ICC and rerun the experiment?

Regards,

Sven

Igor_Igor
Beginner

Hi Sven

1) Happens in both: Debug and Release

2) I've tested in 32-bit mode only; with the fix, both 32-bit and 64-bit builds work fine.

3) Sorry, I don't know what the regression test is.

4) I use ICC 13.0 for OS X (registered user). I'm not very happy with this version, but I can't upgrade to the latest one right now (no budget).

Thx for your support

SvenW_Intel
Moderator

OK, the issue occurring only in the 32-bit version makes sense. We will fix this in the next release.
