VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion
2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds
End-to-End Object Detection with Transformers (DETR)
Methods Sections:
VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion
BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
From One to Many: Dynamic Cross Attention Networks for LiDAR and Camera Fusion
2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds
GFNet: Geometric Flow Network for 3D Point Cloud Semantic Segmentation
DiffBEV: Conditional Diffusion Model for Bird’s Eye View Perception
BEVFormer v2: Adapting Modern Image Backbones to Bird’s-Eye-View Recognition via Perspective Supervision
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
LENet: Lightweight And Efficient LiDAR Semantic Segmentation Using Multi-Scale Convolution Attention
LidarMultiNet: Towards a Unified Multi-task Network for LiDAR Perception
DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection
SuperFusion: Multilevel LiDAR-Camera Fusion for Long-Range HD Map Generation and Prediction
TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers