Introduction

Here is a sample rendered video of a NeRF we created from a video we took ourselves:

What Is NeRF?

An overview of what NeRF is, from the NVIDIA blog:

NeRFs use neural networks to represent and render realistic 3D scenes based on an input collection of 2D images.

Collecting data to feed a NeRF is a bit like being a red carpet photographer trying to capture a celebrity’s outfit from every angle — the neural network requires a few dozen images taken from multiple positions around the scene, as well as the camera position of each of those shots.

In a scene that includes people or other moving elements, the quicker these shots are captured, the better. If there’s too much motion during the 2D image capture process, the AI-generated 3D scene will be blurry.

From there, a NeRF essentially fills in the blanks, training a small neural network to reconstruct the scene by predicting the color of light radiating in any direction, from any point in 3D space. The technique can even work around occlusions — when objects seen in some images are blocked by obstructions such as pillars in other images.
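
Concretely, the original NeRF paper (Mildenhall et al., 2020) models this as a function $F_\Theta$ mapping a 3D position $\mathbf{x}$ and viewing direction $\mathbf{d}$ to a color $\mathbf{c}$ and a volume density $\sigma$; a pixel is rendered by integrating these predictions along its camera ray $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$:

\[ C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt, \qquad T(t) = \exp\!\Big(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\Big) \]

Here $T(t)$ is the probability that the ray reaches depth $t$ without being absorbed; training minimizes the difference between these rendered colors and the corresponding pixels of the input images.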

Accelerating 1,000x With Instant NeRF

While estimating the depth and appearance of an object based on a partial view is a natural skill for humans, it’s a demanding task for AI.

Creating a 3D scene with traditional methods takes hours or longer, depending on the complexity and resolution of the visualization. Bringing AI into the picture speeds things up. Early NeRF models rendered crisp scenes without artifacts in a few minutes, but still took hours to train.

Instant NeRF, however, cuts rendering time by several orders of magnitude. It relies on a technique developed by NVIDIA called multi-resolution hash grid encoding, which is optimized to run efficiently on NVIDIA GPUs. Using a new input encoding method, researchers can achieve high-quality results using a tiny neural network that runs rapidly.
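
As a sketch of the idea from the Instant NGP paper (Müller et al., 2022): each 3D point is looked up in $L$ grids whose resolutions $N_l$ grow geometrically from $N_{\min}$ to $N_{\max}$, and at fine levels the grid vertices are mapped into a hash table of size $T$ holding $F$ trainable feature dimensions per entry:

\[ N_l = \lfloor N_{\min} \cdot b^{\,l} \rfloor, \qquad b = \exp\!\Big(\frac{\ln N_{\max} - \ln N_{\min}}{L-1}\Big) \]
\[ h(\mathbf{x}) = \Big( \bigoplus_{i=1}^{3} x_i \pi_i \Big) \bmod T, \qquad \pi_1 = 1,\ \pi_2 = 2654435761,\ \pi_3 = 805459861 \]

These are exactly the quantities you will see in the training log in step 3 below (GridEncoding: Nmin=16 b=1.66248 F=2 T=2^19 L=16); the interpolated features of all $L$ levels are concatenated and fed to the tiny MLP.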

How to run your own NeRF and create a movie

1. Get access to our server with NeRF pre-installed

First start the AWS instance with our NeRF server by subscribing to our AWS Marketplace offering XXXXXX. Then access your Linux NeRF server as follows:

  1. The NeRF server is based on Ubuntu Linux 22.04. You can access your server according to this guide:
    Connect to your Linux instance. E.g. ssh -i <your-pem-key> ubuntu@<public-dns>.
  2. Log in to your server and set a password for the user “ubuntu”, which we will use to log in via DCV Remote Desktop, with the following command: sudo passwd ubuntu (see the example session after this list).
  3. Download the NICE DCV client for your OS from https://download.nice-dcv.com/. E.g. for Windows: DCV Windows Client
  4. Open the DCV client and enter the public DNS name or the IP address of your server.
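
For reference, a first connect-and-set-password session might look like this (the key file and host name are placeholders for your own values):

ssh -i my-nerf-key.pem ubuntu@ec2-12-34-56-78.compute-1.amazonaws.com
sudo passwd ubuntu        # sets the password used for the DCV Remote Desktop login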

Accept the security warning related to the dynamically created certificate (the connection is secure) by clicking “Trust and connect”, and log in with user ubuntu and the password specified above:

This will open the desktop:

On the desktop you will already find a couple of sample rendered NeRF videos that give an impression of the potential of NeRF. Just double-click a video to watch it.

2. Upload the input movie or images

First we need a movie, ideally in MPG or MP4 format, which we upload to the server:
Click the upload button in the upper left to open the storage manager and click “Upload File”:

The uploaded file will be stored on the Desktop – in this case called “car.mp4”:
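
Alternatively, you can copy the movie to the server from the command line with scp; a minimal example, assuming the same key pair used for SSH above and the Desktop as target folder:

scp -i <your-pem-key> car.mp4 ubuntu@<public-dns>:~/Desktop/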

3. Run feature extraction and start the GUI

First the movie is split into individual images, with the frequency setting controlling how many images are taken from the movie (default: 2 images per second). Then COLMAP will “put the images into space”, computing from features extracted from the images where each image was taken, and finally the GUI is started, which trains the NeRF neural network. In the listing below, output is shown after “>”.

cd ~/instant-ngp
# ./split-movie-start-gui.sh car                            # in our case
./split-movie-start-gui.sh YOUR_FILE_NAME  # replace YOUR_FILE_NAME with the movie file name, without the extension
> running ffmpeg with input video file="car.mp4", output image folder="images", fps=2.0.
> ==== running: mkdir "images"
> ==== running: ffmpeg -i "car.mp4" -qscale:v 1 -qmin 1 -vf "fps=2.0" "images"/%04d.jpg
> ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
> .....................
> ./images/0033.jpg sharpness= 121.49772272747072
> ./images/0032.jpg sharpness= 161.2213300179093
> ./images/0031.jpg sharpness= 91.81770386535686
> up vector was [-0.90249607 -0.06381886  0.42594366]
> computing center of attention...
> [-0.20876068 -0.21951382 -1.88465108]
> avg camera distance from origin 4.119773885921966
> 59 frames
> writing transforms.json
> real	4m31.060s
> user	10m52.576s
> sys	0m33.862s
# Starting GUI
> 16:09:13 INFO     Loading NeRF dataset from
> 16:09:13 INFO       data/car/transforms.json
> 16:09:14 SUCCESS  Loaded 59 images after 0s
> 16:09:14 INFO       cam_aabb=[min=[-0.543353,-0.744771,1.02994], max=[1.59767,1.98502,1.18296]]
> 16:09:14 INFO     Loading network config from: configs/nerf/base.json
> 16:09:14 INFO     GridEncoding:  Nmin=16 b=1.66248 F=2 T=2^19 L=16
> 16:09:14 INFO     Density model: 3--[HashGrid]-->32--[FullyFusedMLP(neurons=64,layers=3)]-->1
> 16:09:14 INFO     Color model:   3--[Composite]-->16+16--[FullyFusedMLP(neurons=64,layers=4)]-->3
> 16:09:14 INFO       total_encoding_params=13623184 total_network_params=10240
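
If you want to check the result of the pose estimation from a terminal, you can peek into the generated transforms.json; a small sketch using jq, run from the ~/instant-ngp directory (assuming jq is installed; the layout with a "frames" array is what the processing produced in our runs):

jq '.frames | length' data/car/transforms.json      # number of registered images (59 in our case)
jq '.frames[0].file_path' data/car/transforms.json  # image file of the first frame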

After about 4 minutes in our case, the GUI starts up automatically and begins training the NeRF neural network on the images extracted from the movie. The screenshot below shows the – in this case still quite blurry – image rendered by the neural network, and the training information in the panel on the right.

With the “t” key you can stop the training. In the lower area of the training panel you will find the camera controls and snapshot saving:

Sometimes it is helpful to switch the camera control to “First person” mode.
To save a snapshot of the neural network, click the “Save” button.

4. Creating the camera path and movie

To create a new movie that moves through the 3D scene stored in our neural network, we can use the “Camera path” interface in the upper left. Click on the Camera bar to open it (you can also click on the “instant-ngp” bar to collapse it and gain more space on the screen):

In the Camera interface, click “Add from cam” to store a new camera position. Move the scene with the mouse to the location of your next camera position and click “Add from cam” again. Repeat until you have enough camera positions.

You can also zoom out to see your path and the different camera positions. In addition, the camera interface offers options to move between camera positions, delete them, and otherwise control the camera.

Finally, click “Save” to save a JSON file with the different camera positions. Close the GUI by clicking the “x” in the upper right to free up GPU memory.
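
The camera positions are stored as plain JSON, so you can inspect the saved file from a terminal; a small sketch with jq (in our runs the GUI saved the path as base_cam.json in the instant-ngp directory – the file name may differ in your version):

jq 'keys' ~/instant-ngp/base_cam.json   # list the top-level keys of the saved camera path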

5. Render the movie of the 3D scene stored in the neural network

cd ~/instant-ngp
# ./create-movie.sh data/car            # in our case
./create-movie.sh data/YOUR_FILE_NAME   # replace YOUR_FILE_NAME with the directory created
                                        # in the previous step based on your movie name
# the script above takes an optional second parameter "number of seconds" controlling the length of the generated movie. Default is 15 seconds
# ./create-movie.sh data/car 5          # creates a movie of 5 seconds length

Example output of the movie generation process is shown below. Depending on the performance of the GPU, the length of the movie and the complexity of the scene, movie generation can take anywhere from a few minutes to several hours. The expected rendering time will be displayed:

After rendering has finished, the movie – with “-vid” appended to its name – is copied to the Desktop. Just double-click the movie icon on the desktop, in our case “car-vid.mp4”.
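
To verify the rendered movie from a terminal, you can query its duration with ffprobe, which comes with the ffmpeg package already used above:

ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1 ~/Desktop/car-vid.mp4
# prints e.g. duration=15.000000 for a movie of the default 15-second length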

We have put a couple of sample videos rendered by NeRF on the desktop like this:

More Background on NeRF