GaussianOcc3D: A Gaussian-Based Adaptive Multi-modal 3D Occupancy Prediction

A. Enes Doruk and Hasan F. Ates

Official implementation of GaussianOcc3D, a novel multi-modal framework that bridges camera semantics and LiDAR geometry through a memory-efficient, continuous 3D Gaussian representation.

For more details, please refer to our paper: GaussianOcc3D: A Gaussian-Based Adaptive Multi-modal 3D Occupancy Prediction.

Abstract

3D semantic occupancy prediction is a pivotal task in autonomous driving, providing a dense and fine-grained understanding of the surrounding environment. While recent single-modality approaches have shown significant progress, they are inherently limited by the physical trade-offs between the semantic richness of cameras and the precise geometric manifolds of LiDAR. Multi-modal frameworks frequently encounter difficulties in managing modality heterogeneity and spatial misalignment, often resulting in the loss of critical semantic or geometric cues. These challenges are further compounded by a representation crisis: complex voxel-based models suffer from excessive computational demands due to processing empty space, while lightweight Bird's-Eye View (BEV) alternatives often rely on lossy compression that fails to accurately represent vertical structures.

To address these limitations, we present GaussianOcc3D, a novel multi-modal framework that bridges camera semantics and LiDAR geometry through a memory-efficient, continuous 3D Gaussian representation where both modalities are projected into a shared Gaussian space. We propose several key innovations:

LiDAR Depth Feature Aggregation (LDFA) module to resolve geometric sparsity.
Entropy-Based Feature Smoothing (EBFS) to mitigate domain-specific noise.
Adaptive Camera-LiDAR Fusion (ACLF) module featuring uncertainty-aware reweighting.
Gauss-Mamba Head leveraging Selective State Space Models to capture global context with linear complexity.

Extensive evaluations on the Occ3D, SurroundOcc, and SemanticKITTI benchmarks demonstrate the efficacy of GaussianOcc3D, achieving state-of-the-art mIoU scores of 49.4%, 28.9%, and 25.2% respectively.

🏗️ Framework Overview

Figure 1: The overall architecture of the proposed GaussianOcc3D framework. The pipeline extracts 2D features via a camera backbone and 3D geometric features via a sparse LiDAR backbone, lifting them into a shared Gaussian space for adaptive fusion and global refinement.

🚀 CODE COMING SOON

The full implementation and pre-trained models are currently being prepared for public release.

Please stay tuned!

📝 Citation

If you find this work useful for your research, please cite our paper:

@article{doruk2026gaussianocc3d,
  title={GaussianOcc3D: A Gaussian-Based Adaptive Multi-modal 3D Occupancy Prediction},
  author={Doruk, A and Ates, Hasan F},
  journal={arXiv preprint arXiv:2601.22729},
  year={2026}
}

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
README.md		README.md
overview.png		overview.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GaussianOcc3D: A Gaussian-Based Adaptive Multi-modal 3D Occupancy Prediction

A. Enes Doruk and Hasan F. Ates

Abstract

🏗️ Framework Overview

🚀 CODE COMING SOON

The full implementation and pre-trained models are currently being prepared for public release.

Please stay tuned!

📝 Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

GaussianOcc3D: A Gaussian-Based Adaptive Multi-modal 3D Occupancy Prediction

A. Enes Doruk and Hasan F. Ates

Abstract

🏗️ Framework Overview

🚀 CODE COMING SOON

The full implementation and pre-trained models are currently being prepared for public release.

Please stay tuned!

📝 Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages