LeanGaussian: Breaking Pixel or Point Cloud Correspondence in Modeling 3D Gaussians

1Hong Kong University of Science and Technology,
2International Digital Economy Academy (IDEA),
3The Chinese University of Hong Kong (Shenzhen)
arXiv:2404.16323


Open-category 3D object reconstruction from a single-view image


Abstract

Recently, Gaussian splatting has demonstrated significant success in novel view synthesis. Current methods often regress Gaussians with pixel or point cloud correspondence, linking each Gaussian with a pixel or a 3D point. This forces redundant Gaussians to overfit the correspondence rather than the objects the 3D Gaussians represent, wasting resources and yielding inaccurate geometry or textures. In this paper, we introduce LeanGaussian, a novel approach that treats each query in a deformable Transformer as one 3D Gaussian ellipsoid, breaking the pixel or point cloud correspondence constraints. We leverage a deformable decoder to iteratively refine the Gaussians layer by layer, with image features as keys and values. Notably, the center of each 3D Gaussian serves as its 3D reference point, which is projected onto the image for deformable attention in 2D space. On both the ShapeNet SRN dataset (category level) and the Google Scanned Objects dataset (open-category level, trained on the Objaverse dataset), our approach outperforms prior methods by approximately 6.1%, achieving PSNRs of 25.44 and 22.36, respectively. Additionally, our method achieves a 3D reconstruction speed of 7.2 FPS and a rendering speed of 500 FPS. The code will be released at https://github.com/jwubz123/DIG3D.
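The reference-point mechanism can be summarized in a few lines. Below is a minimal sketch (not the released code) of how each Gaussian's center can be projected into the image to obtain the 2D reference points used by deformable attention; the function and argument names (project_points, world2cam, K) are our own assumptions.

import torch

def project_points(centers, K, world2cam):
    """Project 3D Gaussian centers (N, 3) to 2D reference points in pixels."""
    N = centers.shape[0]
    ones = torch.ones(N, 1, device=centers.device, dtype=centers.dtype)
    homo = torch.cat([centers, ones], dim=-1)          # (N, 4) homogeneous coords
    cam = (world2cam @ homo.T).T[:, :3]                # (N, 3) in camera frame
    uv = (K @ cam.T).T                                 # (N, 3) in pixel frame
    return uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)      # perspective divide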


Method


(a) Overview of DIG3D. -->: steps not used at inference. (b) Detailed structure of the feature fusion in the encoder. (c) Detailed structure of one decoder layer. Queries are updated at each layer and serve as input to the next layer, while the reference points are updated from the new Gaussian centers and projected onto the image feature plane. DFA: deformable cross-attention layer; FFN: feed-forward network; ⊕: update of the 3D Gaussians.
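A schematic of one decoder layer, as we read the caption, could look like the sketch below. This is not the paper's implementation: a standard multi-head attention stands in for the deformable cross-attention (DFA), and the 14-dimensional Gaussian parameterization (center, opacity, scale, rotation, color) is an assumption.

import torch.nn as nn

class GaussianDecoderLayer(nn.Module):
    def __init__(self, dim=256, gauss_dim=14):  # xyz(3) + opacity(1) + scale(3) + rotation(4) + color(3)
        super().__init__()
        # Stand-in for the deformable cross-attention (DFA) block in the figure.
        self.dfa = nn.MultiheadAttention(dim, 8, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.delta_head = nn.Linear(dim, gauss_dim)  # predicts the Gaussian update (the ⊕ in the figure)

    def forward(self, queries, img_feats, gaussians):
        # Queries attend to image features (keys and values) around the projected reference points.
        queries = queries + self.dfa(queries, img_feats, img_feats)[0]
        queries = queries + self.ffn(queries)
        gaussians = gaussians + self.delta_head(queries)  # layer-by-layer refinement
        # The new centers gaussians[..., :3] would then be re-projected to update the 2D reference points.
        return queries, gaussians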


Results

Category-specific dataset: ShapeNet SRN Chairs


Category-specific dataset: ShapeNet SRN Cars


Open-category dataset: GSO


Comparisons


Speed


Inference time comparison on ShapeNet SRN. 3D: 3D reconstruction; R: rendering. Inference: from a single image to 250 novel views. Units in seconds.
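As a back-of-the-envelope check, the reported throughputs compose into the per-image inference time (assuming reconstruction and rendering run sequentially):

recon_time = 1 / 7.2        # one 3D reconstruction pass at 7.2 FPS, ~0.139 s
render_time = 250 / 500     # 250 novel views at 500 FPS, 0.5 s
total = recon_time + render_time
print(f"{total:.3f} s from one image to 250 views")  # ~0.639 s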


Analysis



Centers of the 3D Gaussians (point cloud)
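For reference, a point cloud like the one above can be produced by dumping the Gaussian centers to an ASCII PLY file. This helper is hypothetical (the function name and the (N, 3) centers array are our assumptions), shown only to illustrate the visualization step.

import numpy as np

def save_centers_as_ply(centers: np.ndarray, path: str) -> None:
    """Write an (N, 3) array of Gaussian centers as an ASCII PLY point cloud."""
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write(f"element vertex {len(centers)}\n")
        f.write("property float x\nproperty float y\nproperty float z\nend_header\n")
        for x, y, z in centers:
            f.write(f"{x} {y} {z}\n")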



BibTeX

@article{wu2024dig3d,
  title={LeanGaussian: Breaking Pixel or Point Cloud Correspondence in Modeling 3D Gaussians},
  author={Wu, Jiamin and Liu, Kenkun and Gao, Han and Jiang, Xiaoke and Zhang, Lei},
  journal={arXiv preprint arXiv:2404.16323},
  year={2024}
}