System Prompts

2025.06.15
·Service·by Anonymous
#System Prompts#LLM

Key Points

  • NeRFs encode a 3D scene as an MLP that maps a 3D position and a viewing direction to color and volume density, and synthesize images via differentiable volume rendering trained with a photometric loss.
  • The survey organizes NeRF research into themes: efficiency, representation capacity and generalization, geometric consistency and quality, dynamic scenes, pose estimation and illumination, and applications.
  • Open directions include better generalization, robustness to pose errors, real-time performance, scene composability, and interpretable semantic information in NeRF models.

This paper presents a comprehensive survey of Neural Radiance Fields (NeRFs) for multi-view 3D reconstruction and novel view synthesis. NeRFs represent a 3D scene as a continuous volumetric function, typically modeled by a Multi-Layer Perceptron (MLP) denoted $F_\Theta$. This MLP takes a 3D coordinate $\mathbf{x} = (x, y, z)$ and a 2D viewing direction $\mathbf{d} = (\theta, \phi)$ as input, and outputs a predicted color $\mathbf{c} = (r, g, b)$ and volume density $\sigma$.

The core methodology of the original NeRF, which forms the basis for many variants discussed, involves two main components:

  1. Implicit Scene Representation: The scene is encoded by the MLP $F_\Theta$. Positionally encoded inputs are often used to enable the MLP to represent high-frequency details. For a 3D point $\mathbf{x}$, the positional encoding is $\gamma(\mathbf{x}) = (\sin(2^0\pi\mathbf{x}), \cos(2^0\pi\mathbf{x}), \dots, \sin(2^{L-1}\pi\mathbf{x}), \cos(2^{L-1}\pi\mathbf{x}))$, where $L$ is the number of frequencies; the viewing direction $\mathbf{d}$ is encoded analogously. The MLP takes $\gamma(\mathbf{x})$ as input to predict $\sigma$ and an intermediate feature vector, which is then concatenated with $\gamma(\mathbf{d})$ to predict $\mathbf{c}$.
  2. Volume Rendering: To synthesize the color of a pixel along a ray $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$, cast from the camera origin $\mathbf{o}$ in direction $\mathbf{d}$, volume rendering techniques are employed. The expected color $C(\mathbf{r})$ of a ray is computed by integrating color and density values along the ray:
$$C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\, \sigma(\mathbf{r}(t))\, \mathbf{c}(\mathbf{r}(t), \mathbf{d})\, dt$$
where $T(t) = \exp\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\, ds\right)$ is the accumulated transmittance along the ray from $t_n$ to $t$. In practice, this integral is approximated by numerical quadrature, typically using stratified sampling combined with a hierarchical (coarse-to-fine) sampling strategy. The interval $[t_n, t_f]$ is divided into $N$ evenly spaced bins and one sample is drawn uniformly from each bin, $t_i \sim \mathcal{U}\left[t_n + \frac{i-1}{N}(t_f - t_n),\; t_n + \frac{i}{N}(t_f - t_n)\right]$. The discrete rendering formula is:
$$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - \exp(-\sigma_i \delta_i)\right) \mathbf{c}_i$$
where $T_i = \exp\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right)$ and $\delta_i = t_{i+1} - t_i$ is the distance between adjacent samples.
The training objective is to minimize the squared photometric error between rendered and ground-truth pixel colors over a batch of rays $\mathcal{R}$:
$$\mathcal{L} = \sum_{\mathbf{r} \in \mathcal{R}} \left\| \hat{C}(\mathbf{r}) - C(\mathbf{r}) \right\|_2^2$$
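As a concrete sketch (not code from the survey), the positional encoding $\gamma$ from step 1 can be written in a few lines of NumPy; the function name and argument layout here are illustrative choices:

```python
import numpy as np

def positional_encoding(x, num_freqs):
    """Map coordinates to sin/cos features at frequencies 2^0*pi, ..., 2^(L-1)*pi.

    x: array of shape (..., D). Returns an array of shape (..., 2 * num_freqs * D).
    """
    feats = []
    for l in range(num_freqs):
        freq = (2.0 ** l) * np.pi
        feats.append(np.sin(freq * x))  # sin(2^l * pi * x)
        feats.append(np.cos(freq * x))  # cos(2^l * pi * x)
    return np.concatenate(feats, axis=-1)
```

The original NeRF uses $L = 10$ frequencies for positions and $L = 4$ for viewing directions, so a 3D point expands to a 60-dimensional feature vector.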
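The stratified sampling in step 2 amounts to jittering one sample inside each of $N$ evenly sized bins on $[t_n, t_f]$. A minimal NumPy sketch (function name and RNG handling are my own choices):

```python
import numpy as np

def stratified_samples(t_n, t_f, num_bins, rng=None):
    """Draw one uniform sample from each of num_bins evenly sized bins on [t_n, t_f]."""
    if rng is None:
        rng = np.random.default_rng()
    edges = np.linspace(t_n, t_f, num_bins + 1)          # bin boundaries
    # t_i = left edge + uniform fraction of the bin width
    return edges[:-1] + rng.uniform(size=num_bins) * (edges[1:] - edges[:-1])
```

Because each sample stays inside its own bin, the returned depths are strictly increasing, which the quadrature weights $\delta_i = t_{i+1} - t_i$ rely on.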
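The discrete rendering formula $\hat{C}(\mathbf{r}) = \sum_i T_i (1 - \exp(-\sigma_i \delta_i))\, \mathbf{c}_i$ can likewise be sketched for a single ray; this is a simplified, assumed implementation, not the survey's code:

```python
import numpy as np

def render_ray(sigmas, colors, t_vals):
    """Numerical quadrature of the volume rendering integral for one ray.

    sigmas: (N,) densities; colors: (N, 3) RGB values; t_vals: (N+1,) bin edges.
    Returns the composited RGB color C_hat of shape (3,).
    """
    deltas = t_vals[1:] - t_vals[:-1]            # delta_i = t_{i+1} - t_i
    alphas = 1.0 - np.exp(-sigmas * deltas)      # per-segment opacity
    # T_i = exp(-sum_{j<i} sigma_j * delta_j): transmittance reaching sample i
    trans = np.exp(-np.concatenate([[0.0], np.cumsum(sigmas * deltas)[:-1]]))
    weights = trans * alphas                     # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0)
```

A nearly opaque first sample dominates the sum (the ray "stops" there), while zero density everywhere renders to black, matching the behavior of the transmittance term $T_i$.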

The survey categorizes various advancements and extensions of the original NeRF into several themes:

  • Efficiency Enhancement: Addresses slow training and rendering. Methods include explicit data structures (e.g., hash grids in Instant-NGP, voxel grids), knowledge distillation, and optimized sampling strategies.
  • Representation Capacity and Generalization: Aims to improve the ability to model complex scenes and generalize to unseen views. This includes methods using different scene representations (e.g., Gaussian splats, tri-plane representations), conditioning on input images, or meta-learning approaches.
  • Geometric Consistency and Quality: Focuses on improving the fidelity of the reconstructed 3D geometry and visual quality, often by incorporating geometric priors (e.g., depth maps, 3D meshes) or more robust rendering techniques.
  • Dynamic and Deformable Scenes: Extends NeRF to handle scenes with motion, typically by adding a time dimension to the MLP input or by learning explicit deformation fields.
  • Pose Estimation and Illumination: Addresses challenges in scenes with unknown camera poses or varying lighting conditions, often by jointly optimizing pose parameters or learning illumination models.
  • Applications: Highlights diverse applications beyond novel view synthesis, such as 3D reconstruction, augmented reality, virtual reality, robotic navigation, and content creation.

The survey concludes by outlining future research directions, including improving generalization, robustness to pose errors, real-time performance, composability of scenes, and addressing the lack of interpretable semantic information in current NeRF models.