
N-Dimensional Gaussians for Fitting of High Dimensional Functions

sdiolatz
Employee

The need for an N-Dimensional explicit method

The flexibility and robustness with which neural networks overfit to complex, high-dimensional inputs has led to big leaps in quality for many applications within neural rendering and beyond. Many of the works that relied on large Multilayer Perceptrons (MLPs), like the original NeRF paper [1], struggled with performance and required tens of hours of training. This quickly became a target for follow-up research, with works like Direct Voxel Grid Optimization [2], Instant NGP [3] and, more recently, 3D Gaussian Splatting [4] improving significantly on both fronts: these methods cut training down from hours to minutes and enable high-FPS inference.

One issue we came across during this research project is that these methods buy their performance at a cost in robustness and flexibility compared to the earlier implicit methods that used deep MLPs. All of these methods bias their representation power towards 3D world space. And that makes sense: it was also a deliberate design choice in the original NeRF, ensuring that the network cannot cheat in novel view synthesis scenarios by creating facades (see NeRF++ [5]). But as the field moved to more explicit methods, this changed from an optional design choice into an integral part of these methods and, to an extent, a limitation.

In scenarios where the inputs to a function include many dimensions beyond just the 3D world position xyz, all of which are integral to the final output, the options offered by current methods are limited. Such scenarios include neural global illumination, where the inputs can be albedo, roughness and viewing direction, or even just real-world scenes where the viewing direction is as important as the world position for effects like complex reflection patterns. In these types of scenarios the best option is a hybrid method that still has access to an MLP, like Instant NGP, but due to its limited size the quality does not reach the levels of implicit methods.
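As a toy illustration (not code from the paper; all names here are hypothetical), the kind of query such a method has to answer concatenates everything the output depends on into one high-dimensional input:

```python
import numpy as np

# Hypothetical 10D query for a neural global illumination cache: the output
# radiance depends on more than just the world-space position.
def query_radiance(field, position, albedo, roughness, view_dir):
    """position (3,), albedo (3,), roughness (1,), view_dir (3,) -> RGB (3,)."""
    x = np.concatenate([position, albedo, roughness, view_dir])  # 10D input
    return field(x)  # `field` is whatever fitted representation we query
```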

N-Dimensional Gaussians for Fitting of High Dimensional Functions

The flexibility of having these extra input dimensions, passing them to a method and letting the optimization decide when and where to utilize them, all while keeping the fast training times we are now used to, is what we want to recover in our recent SIGGRAPH 2024 paper "N-Dimensional Gaussians for Fitting of High Dimensional Functions".

To achieve this we take advantage of two useful properties of Gaussians: they extend naturally to higher dimensions, and they can be projected down to lower dimensions while still retaining their Gaussian form. Another advantage of using a Gaussian mixture in these higher dimensions is that we avoid explicitly modelling empty space, which matters more and more as these spaces become sparser with increasing dimensionality.
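To make that projection property concrete, here is a minimal NumPy sketch (ours, not the paper's code) showing that marginalizing an N-dimensional Gaussian onto a subset of dimensions, or conditioning it on the remaining ones, again yields a Gaussian; `marginalize` and `condition` are illustrative helper names.

```python
import numpy as np

def marginalize(mu, cov, keep):
    """Marginal of N(mu, cov) over the dimensions in `keep`: still Gaussian."""
    keep = np.asarray(keep)
    return mu[keep], cov[np.ix_(keep, keep)]

def condition(mu, cov, keep, cond, x_cond):
    """Distribution of the `keep` dims given the `cond` dims equal x_cond: still Gaussian."""
    keep, cond = np.asarray(keep), np.asarray(cond)
    S_kk = cov[np.ix_(keep, keep)]
    S_kc = cov[np.ix_(keep, cond)]
    S_cc = cov[np.ix_(cond, cond)]
    gain = S_kc @ np.linalg.inv(S_cc)
    mu_c = mu[keep] + gain @ (x_cond - mu[cond])
    cov_c = S_kk - gain @ S_kc.T
    return mu_c, cov_c

# Example: a 6D Gaussian (3 spatial dims + 3 extra dims such as view direction).
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
mu, cov = rng.standard_normal(6), A @ A.T + 6 * np.eye(6)   # valid covariance

mu_xyz, cov_xyz = marginalize(mu, cov, keep=[0, 1, 2])       # 3D Gaussian over xyz
mu_c, cov_c = condition(mu, cov, [0, 1, 2], [3, 4, 5], np.zeros(3))
```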

Despite these advantages there are quite a few challenges in this endeavor. Apart from the more practical issues, like computing the analytical gradients of the optimization for any dimensionality, many design choices must be made to ensure fast training and inference times:
  • How should this mixture be refined during training? Merge and split thresholds quickly become unmanageable in higher dimensions. We instead propose an alternative refinement process controlled by the optimizer.
  • Can we cull Gaussians efficiently now that they live in higher dimensions? That question is closely related to nearest or approximate-nearest neighbor search! Thankfully, a lot of research has already gone into efficient algorithms for such cases, like Locality Sensitive Hashing. We take inspiration from such methods and use random vectors and projections to do our culling, as sketched below.
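For intuition, here is a rough sketch of random-projection culling (our simplification, not the paper's exact algorithm): each Gaussian's mean and extent are projected onto a few random unit directions, and a Gaussian is skipped when the query lands well outside its projected support. The helper names and the 3-sigma cutoff are assumptions for illustration.

```python
import numpy as np

def build_projections(means, covs, n_proj=4, seed=0):
    """Precompute projected means and per-Gaussian extents along random unit directions."""
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((n_proj, means.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)          # unit vectors
    proj_mu = means @ dirs.T                                     # (K, n_proj) projected means
    # Standard deviation of each Gaussian along each direction: sqrt(d^T C d)
    proj_sigma = np.sqrt(np.einsum('pd,kde,pe->kp', dirs, covs, dirs))
    return dirs, proj_mu, proj_sigma

def cull(query, dirs, proj_mu, proj_sigma, n_sigma=3.0):
    """Indices of Gaussians that may still contribute at `query`; the rest are culled."""
    q = dirs @ query                                             # project the query point
    keep = np.all(np.abs(proj_mu - q) <= n_sigma * proj_sigma, axis=1)
    return np.nonzero(keep)[0]

# Usage: 1000 Gaussians living in a 10D input space.
K, D = 1000, 10
rng = np.random.default_rng(1)
means = rng.standard_normal((K, D))
covs = np.broadcast_to(0.01 * np.eye(D), (K, D, D))              # toy isotropic covariances
tables = build_projections(means, covs)
candidates = cull(rng.standard_normal(D), *tables)
```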

All of these challenges are explored in our paper, where we apply our method to two different applications (10D+ and 6D with projection to 3D) to demonstrate its robustness and potential. Join us in Denver in July for our presentation and demos.


References:
[1] Mildenhall, Ben, et al. "NeRF: Representing scenes as neural radiance fields for view synthesis." _Communications of the ACM_ 65.1 (2021): 99-106.
[2] Sun, Cheng, Min Sun, and Hwann-Tzong Chen. "Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction." _Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition_. 2022.
[3] Müller, Thomas, et al. "Instant neural graphics primitives with a multiresolution hash encoding." _ACM Transactions on Graphics (TOG)_ 41.4 (2022): 1-15.
[4] Kerbl, Bernhard, et al. "3D Gaussian splatting for real-time radiance field rendering." _ACM Transactions on Graphics_ 42.4 (2023): 1-14.
[5] Zhang, Kai, et al. "NeRF++: Analyzing and improving neural radiance fields." arXiv preprint arXiv:2010.07492 (2020).