Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Ipp rendering domain documentation request

Kaan_Gök
Beginner
448 Views
Hello again;
I could not find enough information about the kd-tree acceleration structures in the product documentation.

For example the meaning of the members of the following structure
typedef struct KDTreeNode
{
    Ipp32s flag_k_ofs;
    union _tree_data {
        Ipp32f split;
        Ipp32s items;
    } tree_data;
} IpprKDTreeNode;

I need this info to decipher the parallel tree construction and merging part of the ray-tracing sample. (It looks like the sample was not built with clarity as its first purpose :) )

Coming to this structure:

typedef struct PSAHBuilderContext {
    IpprKDTreeBuildAlg Alg;
    Ipp32s MaxDepth;
    Ipp32f QoS;
    Ipp32s AvailMemory;
    IppBox3D_32f *Bounds;
} IpprPSAHBldContext;

There is only one line about the member AvailMemory in the documentation:
"AvailMemory maximum available memory in Mb;"

-Should I query my system and pass the actual total available memory to this function?
-Should I pass the maximum contiguous available memory?
-Is it something like a "hint"? In the two samples, one is initialized to 2048, the other to 896...
-What happens when the accelerator hits this memory limit for the given scene? Does it continue to process until more memory is available? Does it crash? Does it return an error? Does it start to swap previous data to disk and continue further using the 896 MB limit?

Thanks in advance...
3 Replies
Chao_Y_Intel
Moderator


Hello,

Some comments from our rendering expert:

There are some known limitations: other limitations due to insufficient memory may apply, depending on the rendered model.

During kd-tree construction, additional memory for internal structures is allocated on demand, in chunks, up to the AvailMemory limit. If demand exceeds this limit, the ippStsMemAllocErr error is returned. Setting AvailMemory incorrectly may result in a crash, because no additional checks for memory allocation errors are performed during the actual construction. All internally used memory is freed at the end of the build, after the kd-tree has been packed into the compact format.

Since it's impossible to accurately estimate the size of the resulting kd-tree, the best practice is to set this parameter to (total available memory)/2 for a single-threaded kd-tree building application. This way you make sure that the resulting tree will fit in the available memory together with all the data needed for construction. For multithreaded setups you might want to adjust the AvailMemory limit to match your model and hardware: the constant 896 MB matches typical 4-core setups, preventing 4 independent threads from expecting more than 4 GB of available memory. If you are using a larger number of threads to process complex models, you may need to reduce this constant.
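A minimal sketch of this sizing rule in C, for illustration only. The helper name `suggest_avail_memory_mb` and the `reserve_mb` headroom parameter are hypothetical and not part of IPP; the even split across threads is an assumption about how to keep the per-thread limits within the available memory, not a documented formula.

```c
/* Hypothetical helper, not part of IPP: suggests a per-thread AvailMemory
 * value in MB from the memory available to the application. */
int suggest_avail_memory_mb(int total_mb, int threads, int reserve_mb)
{
    if (total_mb <= 0 || threads < 1 || reserve_mb < 0 || reserve_mb >= total_mb)
        return 0; /* invalid input: no sensible budget */

    if (threads == 1)
        return total_mb / 2; /* single-threaded rule: half of the available memory */

    /* Assumption: divide what remains after the reserve evenly, so that the
     * per-thread limits together never exceed the available memory. */
    return (total_mb - reserve_mb) / threads;
}
```

With 4096 MB available, one thread gives 2048; four threads with a 512 MB reserve give 896. The reserve value here is only a guess chosen for the illustration.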


Thanks,
Chao

kaanx
Beginner
Logging in from another account (this time registered with Parallel Studio instead of the IPP single package).

Could you please describe what QoS parameter does?
Does the ippKDTBuildSimple type have any purpose?
A customer sent me a project where the ray tracing is extremely slow. Look at the render time in the title: it's around 12 minutes (on a Core i7 940 with 8 threads), while comparable scenes with similar settings finish within 30 seconds to 1 minute at most.
I'm experimenting with various tree depths and QoS parameters. Some lower depths with a lower QoS parameter provide better performance (around 9 minutes), but the scene runs quite slowly in general.
(By profiling I see that 90% of the time is spent in IPP's intersection code.) There are lots of secondary rays running around for global illumination, and lots of sampling for area lights/soft shadows, so it's impossible to construct a small test case. Do you have any recommendations to pinpoint the source of slowness in this scene? (Other scenes run quite fast, btw.) Perhaps it's a worst-case scenario for kd-tree acceleration? Lots of small, long triangles spread around.
Chao_Y_Intel
Moderator


Hello,

Some answers from the function owners:

>Does ippKDTBuildSimple type have any purpose?

The simple kd-tree building algorithm always returns a trivial tree with all triangles associated with a single leaf node at the kd-tree root. It was designed for testing and benchmarking purposes.

>Could you please describe what QoS parameter does?

The advanced kd-tree building algorithm (ippKDTBuildPureSAH) is based on recursive subspace subdivision according to the surface area heuristic (SAH). A leaf node is subdivided if the best SAH cost after the split, plus the cost of the split, is less than the original SAH cost of the node. The cost of the split depends on QoS: the higher the QoS, the lower the "cost of split", so kd-trees built with a higher QoS tend to be deeper but are more efficient in general.
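A minimal sketch of such a split test in C, reading the rule as "split when the best cost after the split plus the cost of the split is below the cost of the unsplit leaf", which is the reading under which a lower cost of split produces deeper trees. The function names, the unit cost per triangle test, and the area-weighted child cost are the textbook SAH form and are assumptions here, not IPP internals. How QoS maps to the cost of split is not documented in this thread, so the sketch takes `split_cost` directly.

```c
#include <stdbool.h>

/* Textbook SAH estimate for the two children of a candidate split:
 * each child's triangle count is weighted by the chance that a ray
 * crossing the parent also crosses the child (ratio of surface areas). */
float sah_children_cost(float area_parent,
                        float area_left, int n_left,
                        float area_right, int n_right)
{
    return (area_left * (float)n_left + area_right * (float)n_right) / area_parent;
}

/* Split only if the children plus the split overhead are cheaper than
 * testing every triangle in the unsplit leaf (unit cost per triangle). */
bool should_split(int n_tris, float best_children_cost, float split_cost)
{
    float leaf_cost = (float)n_tris;
    return best_children_cost + split_cost < leaf_cost;
}
```

For a unit cube (surface area 6) holding 10 triangles and cut in half (each half has area 4 and 5 triangles), the children cost about 6.67, so the node is split when the cost of split is 1 but kept as a leaf when it is 4.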

>Perhaps it's a worst case scenario for kdtree acceleration? Lots of small-long triangles spread around.

Small, long triangles should not be a problem for our kd-tree building algorithm, because the SAH-based optimization takes into account the internal nodes that result from triangle / split-plane intersections. On the other hand, you can try a Delaunay triangulation to see whether it's true or not.

A kd-tree is good at helping you find the closest intersection, but if many rays have no intersection at all, you still need to traverse the whole volume without early termination. In any case, a large number of internal holes in the displayed structure makes it difficult to render using any spatial acceleration structure. It's even worse because rays are traversed in packets: even if only one of the rays does not hit the surface, the whole packet continues to traverse, decreasing the efficiency of the intersectors. It might help to decrease the packet size for such scenes, but in general it can make things worse.
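To make the packet effect concrete, here is a small C sketch of a simplified cost model: a packet keeps traversing until its slowest ray is finished, so the useful fraction of the work is the sum of the per-ray steps divided by (packet size × slowest ray). The function names and the per-ray step counts are hypothetical and only illustrate the idea; this is not how the IPP intersectors are implemented.

```c
/* Steps performed by a packet: it runs as long as its slowest ray. */
int packet_steps(const int *ray_steps, int n)
{
    int max = 0;
    for (int i = 0; i < n; ++i)
        if (ray_steps[i] > max)
            max = ray_steps[i];
    return max;
}

/* Fraction of the packet's work that individual rays actually needed. */
double packet_utilization(const int *ray_steps, int n)
{
    int max = packet_steps(ray_steps, n);
    long useful = 0;

    if (n <= 0 || max == 0)
        return 1.0; /* nothing to traverse, nothing wasted */

    for (int i = 0; i < n; ++i)
        useful += ray_steps[i];
    return (double)useful / ((double)n * (double)max);
}
```

For example, if three rays of a 4-ray packet finish after 2 steps and one ray that misses needs 10, the packet runs 10 steps and only 40% of its work is useful.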

>Do you have any recommendations to pinpoint the source of slowness in this scene?
One additional recommendation is to scale the model to fit the unit cube and see if it helps to increase the performance. If your model is too large, several internal heuristics might not work well.
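A minimal sketch of such a rescaling in C, assuming the vertices are stored as a flat array of x, y, z floats (a hypothetical layout; adapt it to your own mesh structures). It translates the model so its bounding box starts at the origin and scales it uniformly by the largest extent, so the proportions are preserved.

```c
#include <stddef.h>

/* Hypothetical helper: fits `count` vertices (x,y,z triples) into [0,1]^3. */
void fit_to_unit_cube(float *xyz, size_t count)
{
    float lo[3], hi[3], extent = 0.0f;

    if (count == 0)
        return;

    /* Axis-aligned bounding box of the model. */
    for (int a = 0; a < 3; ++a)
        lo[a] = hi[a] = xyz[a];
    for (size_t i = 1; i < count; ++i)
        for (int a = 0; a < 3; ++a) {
            float v = xyz[3 * i + a];
            if (v < lo[a]) lo[a] = v;
            if (v > hi[a]) hi[a] = v;
        }

    /* Uniform scale by the largest extent keeps the shape undistorted. */
    for (int a = 0; a < 3; ++a)
        if (hi[a] - lo[a] > extent)
            extent = hi[a] - lo[a];
    if (extent == 0.0f)
        extent = 1.0f; /* degenerate model: translate only */

    for (size_t i = 0; i < count; ++i)
        for (int a = 0; a < 3; ++a)
            xyz[3 * i + a] = (xyz[3 * i + a] - lo[a]) / extent;
}
```

The same translation and scale would also have to be applied to the camera, lights, and ray origins so that the rendered image stays the same.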

Thanks,
Chao
