VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion

2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds

End-to-End Object Detection with Transformers (DETR)

Methods Sections:

VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion

BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

From One to Many: Dynamic Cross Attention Networks for LiDAR and Camera Fusion

2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds

GFNet: Geometric Flow Network for 3D Point Cloud Semantic Segmentation

DiffBEV: Conditional Diffusion Model for Bird’s Eye View Perception

BEVFormer v2: Adapting Modern Image Backbones to Bird’s-Eye-View Recognition via Perspective Supervision

CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

LENet: Lightweight And Efficient LiDAR Semantic Segmentation Using Multi-Scale Convolution Attention

LidarMultiNet: Towards a Unified Multi-task Network for LiDAR Perception

DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection

SuperFusion: Multilevel LiDAR-Camera Fusion for Long-Range HD Map Generation and Prediction

TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers