Depré, Nicolas
[UCL]
Franck, Arthur
[UCL]
Macq, Benoît
[UCL]
This master’s thesis presents an original approach to trajectory prediction using a transformer-based model that takes images as input. Trajectory forecasting has many applications in fields such as autonomous driving, robotics, and surveillance systems. Until now, architectures have relied mainly on Recurrent Neural Networks (RNNs), and more specifically Long Short-Term Memory models (LSTMs), along with Convolutional Neural Networks (CNNs). This study introduces TrajViViT, a Trajectory Video Vision Transformer. Although transformers have previously been applied to trajectory prediction [24], our methodology stands apart by supplying only images as the model’s input. This approach allows us to study the vision capabilities of the transformer on a trajectory prediction task. To guide the model towards the target it needs to track, a black box is superimposed on the target’s position. The model’s task is to detect the box and make a prediction based on its movement. We demonstrate that vision-transformer-based models have potential for such a task and can beat a Kalman filter on longer-term predictions. Our implementation does not perform as well as state-of-the-art models, but it still shows interesting results given that no coordinates are provided as input. A PyTorch implementation of the model can be found at https://github.com/arfranck/TrajViViT.
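As a minimal sketch of the input-preparation step described above (the function name, tensor shapes, and box size are illustrative assumptions, not taken from the thesis code), a black box can be drawn over the target’s pixel coordinates in each frame before the image sequence is passed to the transformer:

```python
import torch

def mark_target(frames: torch.Tensor, positions: torch.Tensor, box_size: int = 8) -> torch.Tensor:
    """Superimpose a black box on the target's position in each frame.

    frames:    (T, C, H, W) top-view image sequence
    positions: (T, 2) pixel coordinates (x, y) of the target in each frame
    box_size:  side length of the black box in pixels (illustrative default)
    """
    marked = frames.clone()
    T, _, H, W = frames.shape
    half = box_size // 2
    for t in range(T):
        x, y = positions[t].long().tolist()
        # Clamp the box to the image boundaries.
        x0, x1 = max(x - half, 0), min(x + half, W)
        y0, y1 = max(y - half, 0), min(y + half, H)
        # Zero out the pixels so the target is covered by a black box.
        marked[t, :, y0:y1, x0:x1] = 0.0
    return marked

# Example usage with dummy data: 8 grayscale frames of size 64x64.
frames = torch.rand(8, 1, 64, 64)
positions = torch.randint(0, 64, (8, 2))
marked_frames = mark_target(frames, positions)
```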


Bibliographic reference: Depré, Nicolas ; Franck, Arthur. TrajViViT: trajectory forecasting with video vision transformers on top-view image sequences. Ecole polytechnique de Louvain, Université catholique de Louvain, 2023. Prom. : Macq, Benoît.
Permanent URL: http://hdl.handle.net/2078.1/thesis:42047