Commit 3c03f66d authored by Aditya Prakash

Update README.md

parent b50322b0
# Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
## [Project Page](https://ap229997.github.io/projects/transfuser/) | [Paper](https://arxiv.org/pdf/2104.09224.pdf) | Supplementary | [Video](https://youtu.be/cc05F56vjVI) | [Poster](https://ap229997.github.io/projects/transfuser/assets/poster.pdf) | Blog
## [Project Page](https://ap229997.github.io/projects/transfuser/) | [Paper](https://arxiv.org/pdf/2104.09224.pdf) | Supplementary | [Video](https://youtu.be/WxadQyQ2gMs) | [Poster](https://ap229997.github.io/projects/transfuser/assets/poster.pdf) | Blog
<img src="transfuser/assets/teaser.png" height="192" hspace=30> <img src="transfuser/assets/full_arch.png" width="400">
<img src="transfuser/assets/teaser.svg" height="192" hspace=30> <img src="transfuser/assets/full_arch.svg" width="400">
This repository contains the code for the CVPR 2021 paper [Multi-Modal Fusion Transformer for End-to-End Autonomous Driving](http://www.cvlibs.net/publications/Prakash2021CVPR.pdf). If you find our code or paper useful, please cite
```bibtex
......
# AIM
<p align="center"> <img src="assets/model.png" width="512"> </p>
<p align="center"> <img src="assets/model.svg" width="512"> </p>
AIM consists of a ResNet34 image encoder with an autoregressive GRU-based waypoint prediction network. This is equivalent to adapting CILRS to predict waypoints conditioned on goal locations rather than predicting vehicle controls conditioned on navigational commands.
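
The autoregressive decoding described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the repository's implementation: the `GRUCell` class, the `predict_waypoints` helper, the hidden size of 64, the choice of 4 waypoints, and the convention of predicting an offset that is added to the previous waypoint are all assumptions made for the example. The weights are random and the image feature is a stand-in for the ResNet34 output.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell with random weights (illustrative only)."""

    def __init__(self, input_size, hidden_size, rng):
        scale = 1.0 / np.sqrt(hidden_size)
        # Index 0: reset gate, 1: update gate, 2: candidate state.
        self.W = rng.uniform(-scale, scale, (3, hidden_size, input_size))
        self.U = rng.uniform(-scale, scale, (3, hidden_size, hidden_size))
        self.b = np.zeros((3, hidden_size))

    def __call__(self, x, h):
        r = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
        z = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
        n = np.tanh(self.W[2] @ x + r * (self.U[2] @ h) + self.b[2])
        return (1.0 - z) * n + z * h

def predict_waypoints(feature, goal, cell, W_out, num_waypoints=4):
    """Unroll the GRU, feeding back the previously predicted waypoint."""
    h = feature          # hidden state initialised from the image feature
    wp = np.zeros(2)     # first input: the ego position at the origin
    waypoints = []
    for _ in range(num_waypoints):
        h = cell(np.concatenate([wp, goal]), h)  # condition on the goal location
        wp = wp + W_out @ h                      # assumed: predict an offset
        waypoints.append(wp)
    return np.stack(waypoints)

rng = np.random.default_rng(0)
hidden = 64                                # assumed feature size
cell = GRUCell(input_size=4, hidden_size=hidden, rng=rng)
W_out = rng.normal(0.0, 0.1, (2, hidden))  # hypothetical output projection
feature = rng.normal(size=hidden)          # stand-in for the encoded image
goal = np.array([0.0, 10.0])               # hypothetical goal location
waypoints = predict_waypoints(feature, goal, cell, W_out)
print(waypoints.shape)  # (4, 2)
```

Each step consumes the waypoint produced by the previous step, which is what makes the decoder autoregressive.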
......
# CILRS
<p align="center"> <img src="assets/model.png" width="512"> </p>
<p align="center"> <img src="assets/model.svg" width="512"> </p>
[CILRS](https://arxiv.org/pdf/1904.08980.pdf) is a conditional imitation learning method in which the agent learns to predict vehicle controls from an RGB image and the measured speed while being conditioned on the navigational command. In addition, the output of the image encoder is also used to predict the vehicle speed.
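
As a rough illustration of command conditioning, the sketch below selects one control head per navigational command and adds a separate speed head that reads only the image features. It is a NumPy toy under stated assumptions: the command labels, layer sizes, three-value control output, and the names `init_params` and `cilrs_forward` are hypothetical, and the random weights stand in for a trained network.

```python
import numpy as np

# Hypothetical command labels; the actual command set may differ.
COMMANDS = ("follow_lane", "turn_left", "turn_right", "go_straight")

def init_params(rng, img_dim=512, speed_dim=128, joint_dim=512):
    """Random linear layers standing in for trained weights (sizes assumed)."""
    def linear(out_dim, in_dim):
        return rng.normal(0.0, 0.01, (out_dim, in_dim))
    return {
        "speed_in": linear(speed_dim, 1),
        "join": linear(joint_dim, img_dim + speed_dim),
        "branches": {c: linear(3, joint_dim) for c in COMMANDS},
        "speed_out": linear(1, img_dim),
    }

def cilrs_forward(image_feat, speed, command, params):
    relu = lambda x: np.maximum(x, 0.0)
    # Embed the measured speed and join it with the image features.
    speed_feat = relu(params["speed_in"] @ np.array([speed]))
    joint = relu(params["join"] @ np.concatenate([image_feat, speed_feat]))
    # The navigational command picks which control head is evaluated.
    controls = params["branches"][command] @ joint
    # Auxiliary head: predict speed from the image features alone.
    pred_speed = (params["speed_out"] @ image_feat)[0]
    return controls, pred_speed

rng = np.random.default_rng(0)
params = init_params(rng)
image_feat = rng.normal(size=512)  # stand-in for the image encoder output
controls, pred_speed = cilrs_forward(image_feat, 5.0, "turn_left", params)
print(controls.shape)  # (3,)
```

Because the speed head never sees the measured speed or the command, its output depends only on the image features.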
......
# Geometric Fusion
<p align="center"> <img src="assets/model.png"> </p>
<p align="center"> <img src="assets/model.svg"> </p>
Geometric Fusion consists of multi-scale image-to-LiDAR and LiDAR-to-image feature projections (inspired by [ContFuse](https://openaccess.thecvf.com/content_ECCV_2018/papers/Ming_Liang_Deep_Continuous_Fusion_ECCV_2018_paper.pdf)). This is equivalent to replacing the transformers in TransFuser with geometry-based feature projections.
......
# Late Fusion
<p align="center"> <img src="assets/model.png" width="600"> </p>
<p align="center"> <img src="assets/model.svg" width="600"> </p>
Late Fusion consists of a two-stream encoder in which the RGB image and the LiDAR BEV inputs are processed independently of each other. The resulting features are then combined via element-wise summation and passed to the waypoint prediction network. This is equivalent to removing the transformer modules from TransFuser.
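
The element-wise summation step can be sketched as below. The `encode` function is a deliberately crude stand-in for each CNN stream (global average pooling followed by a linear projection), and the input sizes, channel counts, and feature dimension are assumptions chosen for the example, not values taken from the repository.

```python
import numpy as np

def encode(x, W):
    """Stand-in encoder: global average pool over H and W, then project."""
    pooled = x.mean(axis=(1, 2))  # (channels,)
    return W @ pooled             # (feature_dim,)

def late_fusion(image_feat, lidar_feat):
    """Fuse the two streams by element-wise summation."""
    if image_feat.shape != lidar_feat.shape:
        raise ValueError("both streams must produce features of the same shape")
    return image_feat + lidar_feat

rng = np.random.default_rng(0)
feature_dim = 512                        # assumed feature size
image = rng.random((3, 64, 64))          # toy RGB image, channels first
lidar_bev = rng.random((2, 64, 64))      # toy LiDAR BEV grid, channels first
W_img = rng.normal(size=(feature_dim, 3))
W_lidar = rng.normal(size=(feature_dim, 2))

# The two inputs are encoded independently; they only meet at the sum.
image_feat = encode(image, W_img)
lidar_feat = encode(lidar_bev, W_lidar)
fused = late_fusion(image_feat, lidar_feat)
print(fused.shape)  # (512,)
```

The fused vector would then be handed to the waypoint prediction network; no information is exchanged between the streams before this point.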
......
# TransFuser
<p align="center"> <img src="assets/model.png"> </p>
<p align="center"> <img src="assets/model.svg"> </p>
TransFuser uses the self-attention mechanism of transformers to fuse image and LiDAR features at multiple resolutions.
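
A minimal sketch of attention-based fusion at a single resolution is shown below. It flattens both feature maps into tokens, runs one single-head scaled dot-product self-attention over the joint token set, and adds the result back to each branch. The residual addition, the 8x8 map size, the channel count, and the names `self_attention` and `fuse` are assumptions for the example; a full transformer block (multiple heads, positional embeddings, layer normalisation, MLP) is intentionally omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over (N, C) tokens."""
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v, weights

def fuse(image_map, lidar_map, Wq, Wk, Wv):
    """Let every image and LiDAR token attend to all tokens of both maps."""
    c, h, w = image_map.shape
    img_tokens = image_map.reshape(c, -1).T  # (h*w, c)
    lid_tokens = lidar_map.reshape(c, -1).T
    tokens = np.concatenate([img_tokens, lid_tokens], axis=0)
    attended, weights = self_attention(tokens, Wq, Wk, Wv)
    n_img = img_tokens.shape[0]
    # Split the attended tokens back per modality and add them residually.
    img_out = image_map + attended[:n_img].T.reshape(image_map.shape)
    lid_out = lidar_map + attended[n_img:].T.reshape(lidar_map.shape)
    return img_out, lid_out, weights

rng = np.random.default_rng(0)
c, h, w = 16, 8, 8                          # assumed channels and map size
image_map = rng.normal(size=(c, h, w))      # stand-in image feature map
lidar_map = rng.normal(size=(c, h, w))      # stand-in LiDAR BEV feature map
Wq, Wk, Wv = (rng.normal(0.0, 0.1, (c, c)) for _ in range(3))
img_out, lid_out, weights = fuse(image_map, lidar_map, Wq, Wk, Wv)
print(img_out.shape, lid_out.shape, weights.shape)  # (16, 8, 8) (16, 8, 8) (128, 128)
```

Repeating this step on feature maps taken from several encoder stages would give fusion at multiple resolutions.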
......