Scene flags having no effect on memory consumption

Michael_C_9 · 05-30-2016

The documentation mentions that the acceleration structure scene flags (e.g. RTC_SCENE_COMPACT, RTC_SCENE_HIGH_QUALITY) may be ignored by the implementation. I need to reduce the memory consumed by the bounding volume hierarchy, but these flags have no effect at all on memory consumption. What exactly determines whether or not they are used?

BenthinC_Intel · 05-30-2016

Could you give us some additional info here? RTC_SCENE_COMPACT should definitely reduce memory consumption.

Could you pass "verbose=2" in the Embree initialization flags and test whether RTC_SCENE_COMPACT reduces the memory consumption (in the command-line output, look for "used = ... MB")?
Also, could the difference lie between the virtual address space that is allocated and the address space Embree actually uses?
How much memory does your app take in relation to the internal data structures generated by Embree?

A few other ways to further reduce memory consumption are to share the vertex arrays with the application, to use quads instead of triangles (if the geometry is mostly made of quads anyway), etc.

Hope this helps.

Michael_C_9 · 05-30-2016

I'm using user-defined geometries. Below I've tested with 1600 objects. Looking at the data below, the statistics are identical regardless of the scene flags, which shouldn't be the case, right?

RTC_SCENE_HIGH_QUALITY

building BVH4<object> using avx::BVH4BuilderSAH ... [DONE] 1.92308ms, 0.831997 Mprim/s, 0.0481893 GB/s
primitives = 1600, vertices = 0
sah = 7.6265 (6.1196 + 1.5068), depth = 6
used = 0.1 MB, perPrimitive = 57.7 B
alignedNodes = 621 (89.4% filled) (0.1 MB) (86.1% of total)
leaves = 1600 (0.0 MB) (13.9% of total)(100.0% used)
vertices = 0 (0.0 MB) (0.0% of total) (75.0% used)
allocated = 0.12MB, reserved = 0.12MB, used = 0.09MB (78.10%), wasted = 0.00MB (3.40%), free = 0.00MB (3.40%)
used blocks = [12288, 16320, 16320] [98304, 102336, 102336] [END]
free blocks = [END]
created scene intersector
accels[0]
intersector1 = avx2::BVH4VirtualIntersector1
intersector4 = avx2::BVH4VirtualIntersector4Chunk
intersector8 = avx2::BVH4VirtualIntersector8Chunk
selected scene intersector
intersector1 = avx2::BVH4VirtualIntersector1
intersector8 = avx2::BVH4VirtualIntersector8Chunk

RTC_SCENE_COMPACT

building BVH4<object> using avx::BVH4BuilderSAH ... [DONE] 2.78401ms, 0.57471 Mprim/s, 0.0332872 GB/s
primitives = 1600, vertices = 0
sah = 7.6265 (6.1196 + 1.5068), depth = 6
used = 0.1 MB, perPrimitive = 57.7 B
alignedNodes = 621 (89.4% filled) (0.1 MB) (86.1% of total)
leaves = 1600 (0.0 MB) (13.9% of total)(100.0% used)
vertices = 0 (0.0 MB) (0.0% of total) (75.0% used)
allocated = 0.12MB, reserved = 0.12MB, used = 0.09MB (78.10%), wasted = 0.00MB (3.40%), free = 0.00MB (3.40%)
used blocks = [12288, 16320, 16320] [98304, 102336, 102336] [END]
free blocks = [END]
created scene intersector
accels[0]
intersector1 = avx2::BVH4VirtualIntersector1
intersector4 = avx2::BVH4VirtualIntersector4Chunk
intersector8 = avx2::BVH4VirtualIntersector8Chunk
selected scene intersector
intersector1 = avx2::BVH4VirtualIntersector1
intersector8 = avx2::BVH4VirtualIntersector8Chunk

The scene is created using:

RTCSceneFlags sflags = RTC_SCENE_STATIC | RTC_SCENE_HIGH_QUALITY;
RTCAlgorithmFlags aflags = RTC_INTERSECT1 | RTC_INTERSECT8;
scene = rtcDeviceNewScene(device, sflags, aflags);

Other information

Embree Ray Tracing Kernels 2.9.0 (Mar 10 2016)
  Compiler : Intel Compiler 16.0.1
  Build : Release
  Platform : Mac OS X (64bit)
  CPU : Haswell (GenuineIntel)
  Threads : 4
  ISA : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 POPCNT AVX F16C RDRAND AVX2 FMA3 LZCNT BMI1 BMI2
  Targets : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AVXI AVX2
  MXCSR : FTZ=1, DAZ=1
Config
  Threads : default
  ISA : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 POPCNT AVX F16C RDRAND AVX2 FMA3 LZCNT BMI1 BMI2
  Targets : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AVXI AVX2 (supported)
            SSSE3 SSE4.2 AVX AVX2 (compile time enabled)
  Features: intersection_filter bufferstride
  Tasking : TBB4.3 TBB_header_interface_8001 TBB_lib_interface_9000

general:
  build threads = 0
  verbosity = 2
triangles:
  accel = default
  builder = default
  traverser = default
  replications = 2
motion blur triangles:
  accel = default
  builder = default
  traverser = default
quads:
  accel = default
  builder = default
  traverser = default
motion blur quads:
  accel = default
  builder = default
  traverser = default
line segments:
  accel = default
  builder = default
  traverser = default
motion blur line segments:
  accel = default
  builder = default
  traverser = default
hair:
  accel = default
  builder = default
  traverser = default
  replications = 3
motion blur hair:
  accel = default
subdivision surfaces:
  accel = default
object_accel:
  min_leaf_size = 1
  max_leaf_size = 1
object_accel_mb:
  min_leaf_size = 1
  max_leaf_size = 1

Michael_C_9 · 05-30-2016

So it seems the flags only apply to the bounding volume hierarchies built for meshes? Is there a way to influence the scene-level bounding volume hierarchy and its quality? What is the default quality setting in that case?

BenthinC_Intel · 05-31-2016

Correct, the flags only affect BVHs for meshes. For BVHs over user geometries we use the standard binning-based BVH builder, as further BVH quality optimizations are very hard to do (we don't know what's inside the user geometry). I assume the number of bytes per user geometry is rather small; otherwise the BVH over all user geometries wouldn't be the memory bottleneck, right?

Michael_C_9 · 05-31-2016

The large memory consumption turned out to be caused by an unnecessarily large number of calls to rtcNewUserGeometry, which scaled with the number of nodes in the BVH; that is why I thought it was related to the BVH.

Thanks for the help!