
TransFusion repository

This documentation walks through the detailed steps to train and test the TransFusion model built on the MMDetection3D framework.

This is a PyTorch implementation of TransFusion for the CVPR 2022 paper "TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers" by Xuyang Bai, Zeyu Hu, Xinge Zhu, Qingqiu Huang, Yilun Chen, Hongbo Fu, and Chiew-Lan Tai.

This paper focuses on LiDAR-camera fusion for 3D object detection. If you find this project useful, please cite:

@inproceedings{bai2022transfusion,
  title={{TransFusion}: {R}obust {L}iDAR-{C}amera {F}usion for {3}D {O}bject {D}etection with {T}ransformers},
  author={Bai, Xuyang and Hu, Zeyu and Zhu, Xinge and Huang, Qingqiu and Chen, Yilun and Fu, Hongbo and Tai, Chiew-Lan},
  booktitle={CVPR},
  year={2022}
}

Introduction

LiDAR and camera are two important sensors for 3D object detection in autonomous driving. Despite the increasing popularity of sensor fusion in this field, the robustness against inferior image conditions, e.g., bad illumination and sensor misalignment, is under-explored. Existing fusion methods are easily affected by such conditions, mainly due to a hard association of LiDAR points and image pixels, established by calibration matrices.

We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions. Specifically, our TransFusion consists of convolutional backbones and a detection head based on a transformer decoder. The first layer of the decoder predicts initial bounding boxes from a LiDAR point cloud using a sparse set of object queries, and its second decoder layer adaptively fuses the object queries with useful image features, leveraging both spatial and contextual relationships. The attention mechanism of the transformer enables our model to adaptively determine where and what information should be taken from the image, leading to a robust and effective fusion strategy. We additionally design an image-guided query initialization strategy to deal with objects that are difficult to detect in point clouds. TransFusion achieves state-of-the-art performance on large-scale datasets. We provide extensive experiments to demonstrate its robustness against degraded image quality and calibration errors. We also extend the proposed method to the 3D tracking task, achieving 1st place on the nuScenes tracking leaderboard and showing its effectiveness and generalization capability.

(Figure: TransFusion pipeline)

TransFusion Usage

Installation

Please use the Dockerfile under the docker folder to install the prerequisites and the TransFusion repository. We use mmdet 2.11.0 and mmcv 1.3.0. We do not suggest modifying the versions inside the Dockerfile, since doing so may cause incompatibility issues. To build the docker image, simply run:

docker build -t transfusion docker/

This creates an image named "transfusion".

Mount Volume

Use the following command to create a volume that mounts the data directory, which stores the dataset and the saved checkpoints:

docker volume create --driver local \
--opt type=nfs \
--opt o=addr=seal.engin.umich.edu \
--opt device=:/mnt/pool1/workspace \
transfusion_workspace

This is necessary for TransFusion running inside docker to access the outside file system and preserve the generated data.
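
To confirm that the volume was created and points at the NFS share, you can inspect it:

docker volume inspect transfusion_workspace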

Run Docker

To run the created image, please use:

docker run --gpus all --shm-size=8g -it -v /mnt/workspace/users/heming/TransFusion/data:/TransFusion/data -v transfusion_workspace:/mnt/workspace transfusion

in the command line.

Install spconv

Use the following commands to clone the spconv repo at version 1.2.1 and build the wheel:

git clone https://github.com/traveller59/spconv.git --recursive -b v1.2.1
cd spconv
python setup.py bdist_wheel

Change directory to spconv/dist and use pip to install the wheel file generated there, which finishes the installation of spconv; see the example below.
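
For example (the exact wheel file name depends on your Python version and build environment, so the pattern below is only illustrative):

cd dist
pip install spconv-*.whl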

Training & Testing of First Stage

Hyperparameters can be found and modified in this config. The most important ones are samples_per_gpu and lr. Remember to check the data directory data_root (where the dataset is stored) and work_dir (where the checkpoints are stored) before running.
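
For orientation, the relevant entries in an MMDetection3D-style config look roughly like the following; the paths and values here are illustrative placeholders, not the defaults shipped with this repo:

data_root = 'data/nuscenes/'               # where the dataset is stored
work_dir = './work_dirs/transfusion_L/'    # where checkpoints and logs are written

data = dict(
    samples_per_gpu=2,    # batch size per GPU; lower this if you run out of GPU memory
    workers_per_gpu=4,    # dataloader workers per GPU
)

optimizer = dict(type='AdamW', lr=1e-4, weight_decay=0.01)   # scale lr with the total batch size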

For single GPU training, run:

python tools/train.py configs/transfusion_nusc_voxel_L.py

For multiple GPUs training, run:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29502 ./tools/train.py configs/transfusion_nusc_voxel_L.py --launcher pytorch

where 4 can be changed to the number of GPUs that you want to use in parallel. For testing, just change tools/train.py to tools/test.py and specify the checkpoint to load from; an example is shown below.
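
For example, a single-GPU test run might look like this (the checkpoint path is illustrative, and the optional flags follow the same pattern as the dist_test.sh usage shown later):

python tools/test.py configs/transfusion_nusc_voxel_L.py work_dirs/transfusion_L/latest.pth --eval mAP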

Training & Testing of Second Stage

Hyperparameters can be found and modified in this config. The most important ones are samples_per_gpu and lr. Remember to check the data directory data_root (where the dataset is stored) and work_dir (where the checkpoints are stored) before running.

In the second stage, we have to combine the checkpoint of the pre-trained ResNet50 that serves as our 2D backbone, which can be downloaded here, with the model we trained in the first stage. Run the script to get the combined model and load it via load_from in the config. Remember to change the file names accordingly in the model combining script; a sketch of what the combination amounts to is shown below.
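
The combining step amounts to merging the two state dicts into one checkpoint file. A minimal sketch, assuming illustrative file names and key prefixes (the actual script in the repo may differ in naming and prefix handling):

# Minimal sketch of fusing a pre-trained 2D backbone with the first-stage LiDAR model.
# File names and key prefixes are illustrative; adjust them to match the real script and config.
import torch

img_ckpt = torch.load('resnet50.pth', map_location='cpu')                        # pre-trained 2D backbone
pts_ckpt = torch.load('work_dirs/transfusion_L/latest.pth', map_location='cpu')  # first-stage model

img_state = img_ckpt.get('state_dict', img_ckpt)
pts_state = pts_ckpt.get('state_dict', pts_ckpt)

merged = dict(pts_state)
for k, v in img_state.items():
    # Prefix the image-branch weights so they match the fusion model's module names.
    merged['img_backbone.' + k] = v

torch.save({'state_dict': merged}, 'fusion_model.pth')  # then set load_from = 'fusion_model.pth'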

For single GPU training, run:

python tools/train.py configs/transfusion_nusc_voxel_LC.py

For multiple GPUs training, run:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29502 ./tools/train.py configs/transfusion_nusc_voxel_LC.py --launcher pytorch

where 4 can be changed to the number of GPUs that you want to use in parallel. For testing, change tools/train.py to tools/test.py and specify the checkpoint to load from, just as in the first stage.

Inference & Visualization

For testing, run:

./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval mAP]

The result file argument is optional, but to visualize the inference results you have to save RESULT_FILE in pickle format. Then, to visualize, run the following command:

python tools/misc/visualize_results.py ${CONFIG_FILE} --result ${RESULTS_PATH} --show-dir ${SHOW_DIR}

Point cloud and bounding box information will then be saved under SHOW_DIR. Each folder under SHOW_DIR corresponds to one timestamp and contains three files: one obj file that stores the point cloud of that frame and two ply files that store the ground-truth and predicted bounding boxes separately. These files can be imported and visualized in either Open3D or MeshLab.
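
As a quick sanity check, a single frame can be loaded in Open3D roughly like this; the folder and file names below are illustrative, so substitute the actual names found in your SHOW_DIR:

# Load one saved frame and display points plus ground-truth/predicted boxes (names are illustrative).
import open3d as o3d

frame_dir = 'show_dir/frame_0001/'   # one timestamp folder under SHOW_DIR

# The obj file stores the point cloud as mesh vertices; wrap them in a point cloud for display.
mesh = o3d.io.read_triangle_mesh(frame_dir + 'points.obj')
pcd = o3d.geometry.PointCloud()
pcd.points = mesh.vertices

# The two ply files store the ground-truth and predicted boxes.
gt_boxes = o3d.io.read_triangle_mesh(frame_dir + 'gt.ply')
pred_boxes = o3d.io.read_triangle_mesh(frame_dir + 'pred.ply')

o3d.visualization.draw_geometries([pcd, gt_boxes, pred_boxes])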

CUDA Out of Memory Issue

If you run into a CUDA out-of-memory problem, please first use nvidia-smi to check whether all the GPUs you are using have enough free memory. If not, you can restrict training to the GPUs that have enough free memory by specifying their IDs, for example:

export CUDA_VISIBLE_DEVICES=0,1,2,3

If the model still cannot fit in GPU memory (some GPUs only have 8-16 GB, which cannot support a large batch size), you can decrease the batch size per GPU by modifying the samples_per_gpu parameter in the config file. Remember, when you adjust the batch size, you should change the learning rate accordingly; see the sketch below.
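
A common convention (an assumption here, not something this repo prescribes) is to scale the learning rate linearly with the total batch size. A minimal sketch of such a config change, assuming an illustrative baseline of samples_per_gpu=2 on 4 GPUs with lr=1e-4:

# Halve the per-GPU batch size and scale the learning rate linearly (values are examples only).
data = dict(samples_per_gpu=1)                               # total batch size drops from 8 to 4
optimizer = dict(type='AdamW', lr=5e-5, weight_decay=0.01)   # learning rate halved to match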

Acknowledgement

We sincerely thank the authors of mmdetection3d, CenterPoint, and GroupFree3D for open-sourcing their methods.