Adding new datasets

In this guide, we assume you have already created a dataset and trained a model on it. If this is not the case, please follow the Using custom data and Custom dataset loader guides first. Now, we will show you how to publish your dataset so that others can use it as well. There are three steps: upload the dataset files to a publicly accessible location, create a download_dataset function that downloads them from that location, and finally add the dataset to the list of available datasets by adding a spec file.

Publishing the dataset

NerfBaselines does not assume any specific structure or format of the dataset, nor does it require the dataset to be uploaded to any specific location. For example, you can host the dataset on your own server, on Google Drive, or any other location. In this tutorial, we assume you have a COLMAP dataset (see Using custom data). Therefore, you have a directory with the following structure:

images/
    0.jpg
    1.jpg
    ...
sparse/
    points3D.bin
    cameras.bin
    images.bin
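Before zipping, it can be useful to verify that your directory matches this layout. Here is a minimal check (the `has_colmap_layout` helper is our own sketch, not part of NerfBaselines):

```python
import os

def has_colmap_layout(root):
    # True if `root` contains the images/ directory and the three
    # COLMAP sparse reconstruction files shown above.
    required = [
        os.path.join(root, "images"),
        os.path.join(root, "sparse", "points3D.bin"),
        os.path.join(root, "sparse", "cameras.bin"),
        os.path.join(root, "sparse", "images.bin"),
    ]
    return all(os.path.exists(p) for p in required)
```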

Now, you can zip the dataset directory and upload it to your own server. We will assume you uploaded the dataset to https://example.com/datasets/my_dataset/my_scene.zip.
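One way to create the archive is with Python's standard library. `shutil.make_archive` places images/ and sparse/ at the archive root, which matches the layout NerfBaselines' COLMAP loader expects after extraction (the directory and archive names here are just the examples from this guide):

```python
import shutil

def zip_dataset(dataset_dir, archive_base):
    # Creates "<archive_base>.zip" with the contents of `dataset_dir`
    # (images/ and sparse/) at the archive root.
    return shutil.make_archive(archive_base, "zip", root_dir=dataset_dir)
```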

Creating the download function

Next, create a Python function that downloads the dataset. Here we show one possible implementation. The function downloads the zip archive from the specified URL and extracts it to the requested output directory. It also adds an nb-info.json file to the root of the dataset directory; this file contains metadata about the dataset (the loader to use, the dataset ID, and the scene name) and is used by NerfBaselines to detect the format of the dataset. Add the following code to a new Python file, e.g., my_dataset.py:

import os
import zipfile
import json
import tempfile
import shutil
import requests

def download_my_dataset(path, output):
    url = "https://example.com/datasets/my_dataset/my_scene.zip"
    # Users may call this function with the following arguments:
    #    path="my-dataset" - in case the user wants to download the whole dataset
    #    path="my-dataset/my-scene" - in case the user wants to download only a specific scene
    # Anything else is invalid.
    assert path in ("my-dataset", "my-dataset/my-scene"), f"Invalid path: {path}"

    if path == "my-dataset":
        # This is the case of full-dataset download. This example dataset has
        # a single scene, so we place it under the dataset output directory.
        output = os.path.join(output, "my-scene")

    with tempfile.TemporaryDirectory() as tmp:
        output_zip = os.path.join(tmp, "my_scene.zip")
        output_dir = os.path.join(tmp, "my_scene")
        os.makedirs(output_dir, exist_ok=True)

        # Download the zip file (streamed, so large archives do not have
        # to fit into memory)
        with requests.get(url, stream=True) as response:
            response.raise_for_status()
            with open(output_zip, "wb") as f:
                for chunk in response.iter_content(chunk_size=1 << 20):
                    f.write(chunk)

        # Extract it to the temporary directory
        with zipfile.ZipFile(output_zip, "r") as zip_ref:
            zip_ref.extractall(output_dir)

        # Now, we will add a nb-info.json file to the root of the dataset
        # directory. NerfBaselines uses it to detect the dataset format.
        with open(os.path.join(output_dir, "nb-info.json"), "w") as f:
            json.dump({"loader": "colmap", "id": "my-dataset", "scene": "my-scene"}, f)

        # Move the files to the output directory
        os.makedirs(os.path.dirname(output) or ".", exist_ok=True)
        if os.path.exists(output):
            shutil.rmtree(output)
        shutil.move(output_dir, output)
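After running the function, you can sanity-check the extracted directory by reading the nb-info.json file back, the same way NerfBaselines does when it opens the dataset. This is a sketch; the `read_nb_info` helper name is ours:

```python
import json
import os

def read_nb_info(dataset_dir):
    # Read the metadata file written by the download function above.
    with open(os.path.join(dataset_dir, "nb-info.json")) as f:
        info = json.load(f)
    assert {"loader", "id", "scene"} <= set(info), "nb-info.json is incomplete"
    return info
```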

Registering the dataset

To make the dataset available to NerfBaselines, you need to register it by adding a spec file. The spec file is a Python file that contains a nerfbaselines.register() call. Let’s create a new file, e.g., my_dataset_spec.py, with the following content:

from nerfbaselines import register

register({
    "id": "my-dataset",
    "download_dataset_function": "my_dataset:download_my_dataset",
    "evaluation_protocol": "default",
    "metadata": {
        # ...
    }
})

For testing, you can temporarily point NerfBaselines to the spec file by setting the NERFBASELINES_REGISTER environment variable:

export NERFBASELINES_REGISTER="$PWD/my_dataset_spec.py"

Now, you can run the following command to download the dataset:

nerfbaselines download --data external://my-dataset/my-scene

Or use the dataset directly in other nerfbaselines commands (e.g., nerfbaselines train) by passing --data external://my-dataset/my-scene.

Dataset metadata

The metadata field in the spec file can contain important information about the dataset. This is required for the dataset to appear in the online benchmark. The metadata should contain the following fields:

  • id (str): The unique identifier of the dataset.

  • name (str): The (human readable) name of the dataset.

  • description (str): A short description of the dataset.

  • paper_title (optional str): The title of the paper where the dataset was introduced.

  • paper_authors (optional List[str]): The authors of the paper.

  • paper_link (optional str): The link to the paper.

  • link (optional str): The link to the dataset webpage.

  • metrics (List[str]): The list of metrics that are used to evaluate the dataset, e.g., ["psnr", "ssim", "lpips_vgg"].

  • default_metric (str): The default metric to use when sorting the datasets in the public benchmark.

  • scenes (List[Dict[str, str]]): The list of scenes in the dataset. Each scene should contain the following fields:

    • id (str): The unique identifier of the scene.

    • name (str): The (human readable) name of the scene.

Here is an example of the metadata for the Mip-NeRF 360 dataset (with some scenes omitted):

{
    "id": "mipnerf360",
    "name": "Mip-NeRF 360",
    "description": "Mip-NeRF 360 is a collection of four indoor and five outdoor object-centric scenes. The camera trajectory is an orbit around the object with fixed elevation and radius. The test set takes each n-th frame of the trajectory as test views.",
    "paper_title": "Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields",
    "paper_authors": ["Jonathan T. Barron", "Ben Mildenhall", "Dor Verbin", "Pratul P. Srinivasan", "Peter Hedman"],
    "paper_link": "https://arxiv.org/pdf/2111.12077.pdf",
    "link": "https://jonbarron.info/mipnerf360/",
    "metrics": ["psnr", "ssim", "lpips_vgg"],
    "default_metric": "psnr",
    "scenes": [
        { "id": "garden", "name": "garden" },
        { "id": "bicycle", "name": "bicycle" }
    ]
}
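A quick way to catch mistakes is to check a metadata dictionary against the required fields listed above before registering it. The `validate_metadata` helper below is our own sketch, not part of the NerfBaselines API:

```python
def validate_metadata(metadata):
    # Check the required fields described above; the optional paper_*
    # fields and `link` are not enforced.
    required = ("id", "name", "description", "metrics", "default_metric", "scenes")
    missing = [k for k in required if k not in metadata]
    if missing:
        raise ValueError(f"Missing metadata fields: {missing}")
    if metadata["default_metric"] not in metadata["metrics"]:
        raise ValueError("default_metric must be one of metrics")
    for scene in metadata["scenes"]:
        if "id" not in scene or "name" not in scene:
            raise ValueError("Each scene needs 'id' and 'name'")
    return True
```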