I am building an application on Windows targeting arch:AVX2, but how can I be sure Embree is using its AVX2 code paths? Will rtcIntersect8 throw some kind of error if used without AVX2 enabled? Which is better for performance: the rtcIntersect8 methods or the stream methods? Switching between 1-, 4- or 8-ray packets does not seem to change my performance, hence my questions.
Thanks for any help.
Embree will internally scan the CPU flags to determine which ISA sets are supported. If Embree finds the AVX2 flag it will automatically enable the relevant AVX2-optimized code paths. The user/application does not need to do anything.
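To illustrate what "scanning the CPU flags" means, here is a minimal sketch of a runtime AVX2 check done in application code. It is not Embree's own detection code; the function name cpu_has_avx2 is hypothetical, and the MSVC branch only reads the CPUID feature bit (it does not verify that the OS saves the AVX register state, which a production check should also do).

```cpp
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hypothetical helper: returns true if the CPU reports AVX2 support.
// This is an illustration only, not the code Embree runs internally.
bool cpu_has_avx2() {
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 0);                  // leaf 0: EAX holds the highest supported leaf
    if (regs[0] < 7) return false;
    __cpuidex(regs, 7, 0);             // leaf 7, subleaf 0: extended feature flags
    return (regs[1] & (1 << 5)) != 0;  // EBX bit 5 is the AVX2 flag
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();              // make sure the feature data is initialized
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;                      // non-x86 target: AVX2 cannot be present
#endif
}
```

Printing the result of cpu_has_avx2() at startup tells you whether the machine itself offers AVX2. This is separate from the arch:AVX2 compiler switch, which only affects how your own code is compiled.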
Which rtcIntersect variant to use depends on the application, and in particular on the underlying rendering framework. If the framework operates on ray packets, then rtcIntersect4/8 are the preferred choice. If it is built for tracing just one ray at a time, the single-ray rtcIntersect call is the preferred choice. If the framework supports either streams of single rays or packets, you can give the stream rtcIntersect variants a try to get some more performance. For streams of coherent rays (e.g. for primary visibility or hard shadows) you can even set the stream COHERENT flag to get maximum performance for these kinds of coherent ray distributions.
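For a framework that produces single rays but wants to try the packet interface, the rays first have to be gathered into a structure-of-arrays packet with a valid mask. The sketch below shows that packing step only. Ray, RayPacket8 and pack_rays are hypothetical names for illustration, and the field layout is an assumption, so use the packet type from the Embree headers of your version instead. The one convention taken from Embree is the mask encoding: -1 marks an active lane, 0 an inactive one.

```cpp
#include <cstddef>

// Hypothetical single ray in array-of-structures form.
struct Ray {
    float org[3];
    float dir[3];
    float tnear;
    float tfar;
};

// Hypothetical SoA packet of 8 rays; every member array is 32 bytes,
// so aligning the struct to 32 keeps each array AVX-aligned.
struct alignas(32) RayPacket8 {
    float org_x[8], org_y[8], org_z[8];
    float dir_x[8], dir_y[8], dir_z[8];
    float tnear[8], tfar[8];
    int   valid[8];  // -1 = active lane, 0 = inactive lane
};

// Packs up to 8 rays into one packet; unused lanes stay zeroed and inactive.
RayPacket8 pack_rays(const Ray* rays, std::size_t count) {
    RayPacket8 p{};  // value-initialization zeroes all lanes, including valid[]
    const std::size_t n = count < 8 ? count : 8;
    for (std::size_t i = 0; i < n; ++i) {
        p.org_x[i] = rays[i].org[0];
        p.org_y[i] = rays[i].org[1];
        p.org_z[i] = rays[i].org[2];
        p.dir_x[i] = rays[i].dir[0];
        p.dir_y[i] = rays[i].dir[1];
        p.dir_z[i] = rays[i].dir[2];
        p.tnear[i] = rays[i].tnear;
        p.tfar[i]  = rays[i].tfar;
        p.valid[i] = -1;  // mark this lane as active
    }
    return p;
}
```

The valid array is what you would pass as the mask argument of a packet call, so a batch with fewer than 8 rays can still be traced as one packet.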
Thank you for your reply. The documentation isn't clear - should I only be using rtcIntersect4/8 if the rays are coherent? If I am calculating ambient occlusion, say, should I simply use a stream of single rays? I can generate my rays into a collection of packets of 8, or a stream of single rays - does it matter which? In the case of ambient occlusion the rays certainly are *not* coherent. This is the same as doing global illumination calculations using indirect sampling.
Sorry about the doc not being clear. rtcIntersect4/8/16 can be used for all kinds of rays (coherent and incoherent). Embree will detect internally whether the rays are incoherent and automatically use a different code path to optimize performance.
If you really have vastly incoherent ambient occlusion rays (without a length restriction) in a somewhat complex scene (>10M primitives), it mostly doesn't matter which interface you use to send the rays to Embree. There might be a +/- 10-15% performance difference, but that's it.
I personally would make the interface choice depend on the number of AO rays you can shoot in one batch in your rendering framework:
- N <= 8 rays per batch => generate and trace a packet by rtcIntersect8/rtcOccluded8
- N between 16 and 64 or higher => trace a stream of single rays by rtcIntersect1M/rtcOccluded1M
Between the two choose whatever is easier to integrate in your framework.
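The rule of thumb above can be written down as a small helper. This is only a sketch of the heuristic as stated in this reply: TraceInterface and choose_interface are hypothetical names, and because the reply gives no recommendation for batches of 9 to 15 rays, that range is reported as Either rather than guessed.

```cpp
#include <cstddef>

// Hypothetical enum naming the interface families discussed above.
enum class TraceInterface {
    Packet8,   // rtcIntersect8 / rtcOccluded8
    Stream1M,  // rtcIntersect1M / rtcOccluded1M
    Either     // no recommendation given for this batch size
};

// Maps the number of AO rays per batch to the suggested interface.
TraceInterface choose_interface(std::size_t rays_per_batch) {
    if (rays_per_batch <= 8)  return TraceInterface::Packet8;
    if (rays_per_batch >= 16) return TraceInterface::Stream1M;
    return TraceInterface::Either;  // 9..15: not covered by the rule of thumb
}
```

Given the +/- 10-15% difference mentioned earlier, ease of integration is a reasonable tie-breaker whenever the helper returns Either.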
If your framework can use "any-hit" kernels for AO rays then I would definitely go for rtcOccluded* instead of rtcIntersect*.
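To show why an "any-hit" query suits AO rays, here is a toy comparison on a brute-force list of spheres. It is a stand-in for illustration and does not use Embree or a BVH; Vec3, Sphere, Ray, hit_sphere, closest_hit and any_hit are all hypothetical names, and ray directions are assumed to be normalized. The closest-hit query has to consider every candidate, while the any-hit query can stop at the first blocker, which is all an occlusion test needs.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3   { float x, y, z; };
struct Sphere { Vec3 center; float radius; };
struct Ray    { Vec3 org; Vec3 dir; float tfar; };  // dir assumed normalized

static float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Returns the hit distance along the ray, or -1 if the sphere is missed.
float hit_sphere(const Ray& ray, const Sphere& s) {
    const Vec3 oc = {ray.org.x - s.center.x, ray.org.y - s.center.y,
                     ray.org.z - s.center.z};
    const float b = dot(oc, ray.dir);
    const float k = dot(oc, oc) - s.radius * s.radius;
    const float disc = b * b - k;
    if (disc < 0.0f) return -1.0f;
    const float root = std::sqrt(disc);
    float t = -b - root;            // nearer root first
    if (t < 0.0f) t = -b + root;    // ray origin is inside the sphere
    return (t >= 0.0f && t <= ray.tfar) ? t : -1.0f;
}

// Closest-hit query: must test every sphere to find the nearest one.
float closest_hit(const Ray& ray, const std::vector<Sphere>& scene,
                  std::size_t& tests) {
    float best = -1.0f;
    tests = 0;
    for (const Sphere& s : scene) {
        ++tests;
        const float t = hit_sphere(ray, s);
        if (t >= 0.0f && (best < 0.0f || t < best)) best = t;
    }
    return best;
}

// Any-hit (occlusion) query: returns as soon as one blocker is found.
bool any_hit(const Ray& ray, const std::vector<Sphere>& scene,
             std::size_t& tests) {
    tests = 0;
    for (const Sphere& s : scene) {
        ++tests;
        if (hit_sphere(ray, s) >= 0.0f) return true;
    }
    return false;
}
```

For an occluded ray, any_hit typically performs fewer tests than closest_hit, and it never needs to return hit details. An AO estimate is then just the fraction of sample rays for which any_hit returns false.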
Hope this helps.