Neural Radiance Field!

Catherine Chu

Part 1: Fit a Neural Field to a 2D Image

Before working in 3D, we start with a 2D example: a neural field that maps 2D pixel coordinates (u, v) to 3D pixel colors (r, g, b). This involves a few steps:

  1. Network: creating a multilayer perceptron (MLP) with sinusoidal positional encodings
    1. MLP architecture: 3 hidden linear layers of size 256 with ReLU, 1 linear output layer of size 3 with sigmoid
    2. Positional encoding (PE): a series of sinusoidal functions with L=10 frequency levels, mapping 2D coordinates to 42D vectors
  2. Dataloader: implementing a dataloader that randomly samples and processes N pixels per training iteration
  3. Loss Function, Optimizer and Metric: defining metrics and hyperparameters
    1. 25 epochs, batch size 1000, Adam optimizer with learning rate 1e-2
    2. Loss function: mean squared error (MSE); metric: peak signal-to-noise ratio (PSNR), computed from the MSE
  4. Hyperparameter Tuning
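The encoding and network described above can be sketched as follows. This is a minimal PyTorch sketch matching the stated specs (L=10 encoding, 3 hidden layers of 256 with ReLU, sigmoid output); the class and function names are my own, not from the original code.

```python
import torch
import torch.nn as nn

def positional_encoding(x, L=10):
    # Map (N, 2) coordinates in [0, 1] to (N, 2 + 2*2*L) vectors:
    # the raw input plus sin/cos at L frequency levels -> 42D for L=10.
    out = [x]
    for i in range(L):
        out.append(torch.sin(2.0**i * torch.pi * x))
        out.append(torch.cos(2.0**i * torch.pi * x))
    return torch.cat(out, dim=-1)

class Field2D(nn.Module):
    def __init__(self, hidden=256, L=10):
        super().__init__()
        self.L = L
        in_dim = 2 + 2 * 2 * L  # 42 for L=10
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),  # rgb in [0, 1]
        )

    def forward(self, uv):
        return self.net(positional_encoding(uv, self.L))
```

Training then minimizes MSE between predicted and ground-truth colors with Adam, and PSNR is reported as -10 * log10(MSE).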

The following sequence visualizes the training process by plotting the predicted images across iterations.

fox1 fox2 fox3 fox3 fox3
fox_psnr
Fox: PSNR Curve 1

One hyperparameter tuning experiment increased L from 10 to 15 while decreasing the channel size from 256 to 128. This change barely affected the network's performance, as the increase in L seemed to compensate for the decrease in channel size.

foxb1 foxb2 foxb3 foxb3 foxb3
fox_psnr
Fox: PSNR Curve 2

This optimization was also performed for another image. Specifically, the learning rate was increased to 2e-2 because training at 1e-2 did not appear to have converged.

cat1 cat2 cat3 cat4 cat5
cat_psnr
Cat: PSNR Curve 1

This time, I tried the opposite for hyperparameter tuning: decreasing L from 10 to 5 and increasing the channel size from 256 to 400. As seen in the predicted outputs, this change produces smoother images that capture less high-frequency positional detail.

catb1 catb2 catb3 catb4 catb5
catb_psnr
Cat: PSNR Curve 2

Part 2: Fit a Neural Radiance Field from Multi-view Images

Part 2.1: Create Rays From Cameras

To begin, we define a few functions to transform among image, camera, and world coordinates.
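The three transforms can be sketched as below. This is a hedged reconstruction, assuming a pinhole intrinsic matrix K and a 4x4 camera-to-world extrinsic c2w; the function names are mine, not from the original code.

```python
import torch

def pixel_to_camera(K, uv, s):
    # Invert the pinhole projection s * [u, v, 1]^T = K @ x_c.
    # uv: (N, 2) pixel coordinates, s: (N,) depths -> (N, 3) camera-frame points.
    uv1 = torch.cat([uv, torch.ones(uv.shape[0], 1)], dim=-1)     # (N, 3)
    return (torch.linalg.inv(K) @ uv1.T).T * s[:, None]

def camera_to_world(c2w, x_c):
    # Apply a 4x4 camera-to-world extrinsic to camera-frame points.
    x_c1 = torch.cat([x_c, torch.ones(x_c.shape[0], 1)], dim=-1)  # (N, 4)
    return (c2w @ x_c1.T).T[:, :3]

def pixel_to_ray(K, c2w, uv):
    # A ray has origin r_o (the camera center) and unit direction r_d.
    r_o = c2w[:3, 3].expand(uv.shape[0], 3)
    pts = camera_to_world(c2w, pixel_to_camera(K, uv, torch.ones(uv.shape[0])))
    r_d = pts - r_o
    return r_o, r_d / r_d.norm(dim=-1, keepdim=True)
```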

Part 2.2: Sampling
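Points along each ray are sampled at stratified depths between a near and far bound. A minimal sketch follows; the bounds (2.0 and 6.0) and sample count (64) are assumed typical values, not figures from the original code.

```python
import torch

def sample_along_rays(r_o, r_d, near=2.0, far=6.0, n_samples=64, perturb=True):
    # Sample depths t in [near, far] along each ray: x = r_o + t * r_d.
    # During training, perturbing each bin by a random offset ensures the
    # network eventually sees a dense set of depths rather than a fixed grid.
    t = torch.linspace(near, far, n_samples)            # (n_samples,)
    t = t.expand(r_o.shape[0], n_samples).clone()       # (N, n_samples)
    if perturb:
        t = t + torch.rand_like(t) * (far - near) / n_samples
    pts = r_o[:, None, :] + t[..., None] * r_d[:, None, :]  # (N, n_samples, 3)
    return pts, t
```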

Part 2.3: Putting the Dataloading All Together

The above steps are then integrated into the dataloading process, which randomly samples pixels from a dataset of images, and is visualized below:

3d
Plot Cameras, 100 Rays and Samples in 3D
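The dataloading step can be sketched as a single batched function, assuming shared intrinsics K across cameras and per-image camera-to-world extrinsics; the function name and signature are illustrative, not the original implementation.

```python
import torch

def sample_rays(images, K, c2ws, n_rays):
    # images: (M, H, W, 3) multi-view dataset; c2ws: (M, 4, 4) extrinsics.
    # Randomly pick n_rays pixels across all images and return each pixel's
    # ray origin, unit ray direction, and ground-truth color.
    M, H, W, _ = images.shape
    idx = torch.randint(M, (n_rays,))
    u = torch.randint(W, (n_rays,)).float()
    v = torch.randint(H, (n_rays,)).float()
    pixels = images[idx, v.long(), u.long()]                       # (n_rays, 3)
    # offset by 0.5 so rays pass through pixel centers
    uv1 = torch.stack([u + 0.5, v + 0.5, torch.ones(n_rays)], -1)  # (n_rays, 3)
    dirs_cam = (torch.linalg.inv(K) @ uv1.T).T                     # camera frame
    R = c2ws[idx, :3, :3]                                          # (n_rays, 3, 3)
    r_d = torch.einsum('nij,nj->ni', R, dirs_cam)                  # world frame
    r_d = r_d / r_d.norm(dim=-1, keepdim=True)
    r_o = c2ws[idx, :3, 3]
    return r_o, r_d, pixels
```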

Part 2.4: Neural Radiance Field

Now, the Neural Radiance Field can be learned with a network that takes in a 3D world coordinate x and a 3D ray direction vector r_d, then outputs a predicted RGB color and a scalar density. This network is a deeper, more powerful MLP.

nerf_arch
NeRF3D Model Architecture
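A sketch of such a network is below, following the standard NeRF design: the exact depth, widths, skip-connection placement, and encoding levels (L=10 for position, L=4 for direction) are assumptions, not read off the architecture figure above.

```python
import torch
import torch.nn as nn

def pe(x, L):
    # Sinusoidal positional encoding with the input included: dim -> dim*(1+2L).
    out = [x]
    for i in range(L):
        out += [torch.sin(2.0**i * torch.pi * x), torch.cos(2.0**i * torch.pi * x)]
    return torch.cat(out, dim=-1)

class NeRF(nn.Module):
    def __init__(self, hidden=256, Lx=10, Ld=4):
        super().__init__()
        self.Lx, self.Ld = Lx, Ld
        in_x, in_d = 3 * (1 + 2 * Lx), 3 * (1 + 2 * Ld)
        self.trunk1 = nn.Sequential(
            nn.Linear(in_x, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        # second trunk re-injects the encoded position (skip connection)
        self.trunk2 = nn.Sequential(
            nn.Linear(hidden + in_x, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.density = nn.Sequential(nn.Linear(hidden, 1), nn.ReLU())  # sigma >= 0
        self.feat = nn.Linear(hidden, hidden)
        # color depends on viewing direction; sigmoid keeps rgb in [0, 1]
        self.color = nn.Sequential(
            nn.Linear(hidden + in_d, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid())

    def forward(self, x, d):
        ex, ed = pe(x, self.Lx), pe(d, self.Ld)
        h = self.trunk2(torch.cat([self.trunk1(ex), ex], dim=-1))
        sigma = self.density(h)
        rgb = self.color(torch.cat([self.feat(h), ed], dim=-1))
        return rgb, sigma
```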

Part 2.5: Volume Rendering

To generate rendered colors, the volume rendering equation aggregates the batch of samples along each ray.

volrend
Volume Rendering Equation
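The discrete form of the equation can be sketched as follows, assuming a uniform step size between samples:
C = Σᵢ Tᵢ (1 − exp(−σᵢ δᵢ)) cᵢ, with transmittance Tᵢ = exp(−Σⱼ₍ⱼ₌₁..ᵢ₋₁₎ σⱼ δⱼ).

```python
import torch

def volrend(sigmas, rgbs, step_size):
    # sigmas: (N, S, 1) densities, rgbs: (N, S, 3) colors for S samples per ray.
    # alpha_i = 1 - exp(-sigma_i * delta_i): probability of terminating in bin i.
    alpha = 1.0 - torch.exp(-sigmas * step_size)                   # (N, S, 1)
    # exclusive cumulative product of (1 - alpha) gives the transmittance T_i
    trans = torch.cumprod(torch.cat(
        [torch.ones_like(alpha[:, :1]), 1.0 - alpha[:, :-1]], dim=1), dim=1)
    weights = trans * alpha                                        # (N, S, 1)
    return (weights * rgbs).sum(dim=1)                             # (N, 3)
```

Since the weights are differentiable in the densities and colors, the rendered color can be compared to the ground-truth pixel with MSE and backpropagated through the network.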

The training process is visualized below, along with the validation-set PSNR curve, computed every 10 iterations.

lego1 lego2 lego3 lego4 lego5
lego_psnr
Lego Scene: Validation PSNR Curve

Finally, the network can be used to render a novel view of the scene from an arbitrary camera extrinsic:

test_vid
Spherical Rendering of Lego Scene

Bells & Whistles: Background Color

A background color can be injected at the far end of each ray in the volume rendering equation: any transmittance remaining after the last sample is assigned the background color instead of black.

testc_vid
Lego Scene with Background Color