MINER: Multiscale Implicit Neural Representation

ECCV 2022

Vishwanath Saragadam, Jasper Tan, Guha Balakrishnan,
Richard G. Baraniuk, Ashok Veeraraghavan

We present a novel implicit representation framework called MINER that is well suited for tasks such as fitting very high resolution point clouds over multiple levels of detail (LoD). This figure demonstrates fitting of the Lucy 3D mesh over five spatial scales. MINER takes less than half an hour to achieve an intersection over union (IoU) of 0.999 when trained on more than a billion 3D points.

Abstract

We introduce a new neural signal representation designed for the efficient high-resolution representation of large-scale signals. The key innovation in our multiscale implicit neural representation (MINER) is an internal representation via a Laplacian pyramid, which provides a sparse multiscale decomposition that captures orthogonal parts of the signal across scales. We leverage the advantages of the Laplacian pyramid by representing small disjoint patches of the pyramid at each scale with a tiny MLP. This allows the capacity of the network to increase adaptively from coarse to fine scales, and to represent only the parts of the signal with strong energy. The parameters of each MLP are optimized from coarse to fine scale, which yields fast approximations at coarser scales and ultimately an extremely fast training process. We apply MINER to a range of large-scale signal representation tasks, including gigapixel images and very large point clouds, and demonstrate that it requires fewer than 25% of the parameters, 33% of the memory footprint, and 10% of the computation time of competing techniques such as ACORN to reach the same representation error.
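The two ingredients described above, a Laplacian-pyramid decomposition and pruning of low-energy patches, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the average-pool/nearest-neighbor filters, the patch size, and the energy threshold are all illustrative choices, and in MINER each surviving patch would be fit by a tiny MLP rather than stored directly.

```python
import numpy as np

def downsample(x):
    # 2x average-pool downsampling (a stand-in for a proper low-pass filter)
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def upsample(x):
    # 2x nearest-neighbor upsampling
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def laplacian_pyramid(img, n_scales):
    """Coarsest-first list: [coarse approximation, residual, ..., finest residual].
    Each residual holds only what the coarser scales cannot explain."""
    levels, cur = [], img
    for _ in range(n_scales - 1):
        low = downsample(cur)
        levels.append(cur - upsample(low))  # band-pass residual at this scale
        cur = low
    levels.append(cur)                      # coarsest approximation
    return levels[::-1]

def active_patches(band, patch=8, thresh=1e-2):
    """Indices of disjoint patches whose mean squared energy exceeds thresh.
    Only these patches would be assigned a tiny MLP; the rest are skipped."""
    h, w = band.shape
    return [(i, j)
            for i in range(0, h, patch)
            for j in range(0, w, patch)
            if np.mean(band[i:i + patch, j:j + patch] ** 2) > thresh]

# A smooth test signal: most fine-scale residual patches carry little energy.
y, x = np.mgrid[0:64, 0:64] / 64.0
img = np.sin(2 * np.pi * x) * np.cos(2 * np.pi * y)
pyr = laplacian_pyramid(img, 3)

# Sanity check: coarse approximation + upsampled residuals recovers the image.
recon = pyr[0]
for res in pyr[1:]:
    recon = upsample(recon) + res
assert np.allclose(recon, img)

for lvl, band in enumerate(pyr):
    print(f"scale {lvl}: shape {band.shape}, active patches {len(active_patches(band))}")
```

Because each residual band is orthogonal to what coarser scales already explain, the coarse levels can be fit first and frozen, which is what makes the coarse-to-fine optimization in MINER fast.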

Image fitting

MINER reaches 40 dB on a large (16 MP) image at 16x lower resolution in 2 s, 8x in 3 s, 4x in 9 s, 2x in 24 s, and at full resolution in 50 s. In comparison, ACORN achieves 30.8 dB and KiloNeRF 32.1 dB in the same 50 s.

Gigapixel image fitting

MINER fits to gigapixel images in 3 hours or less, making it well-suited for extremely large images.

Fitting large 3D point clouds

MINER trains on very large point clouds (1 billion points) in a progressive manner, producing visually pleasing results in as few as 6 minutes with 20x fewer parameters than approaches such as ACORN.

Cite


@inproceedings{saragadam2022miner,
  title={MINER: Multiscale Implicit Neural Representations},
  author={Vishwanath Saragadam and Jasper Tan and Guha Balakrishnan and Richard Baraniuk and Ashok Veeraraghavan},
  booktitle={European Conf. Computer Vision},
  year={2022},
  url_Paper={https://arxiv.org/pdf/2202.03532.pdf}
}