Optimizing Transformer Models for Edge Deployment in Autonomous Vehicles: Lightweight Architectures and Quantization Strategies for Embedded Vision

Autonomous Vehicles, Edge Computing, Transformer Networks, Lightweight Architectures, Vision Transformers, Quantization, Embedded Systems, Real-Time Inference, Computer Vision, Model Compression

Authors

  • Xin NIE, School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan, Hubei, China
Volume 2025
Research Articles
June 24, 2025


The deployment of autonomous vehicles (AVs) in real-world environments demands fast, accurate, and energy-efficient perception systems capable of operating under stringent computational and power constraints. Recent advances in transformer-based architectures have led to significant breakthroughs in computer vision, achieving state-of-the-art performance in object detection, semantic segmentation, and scene understanding. However, their large model size, high memory consumption, and high inference latency present considerable obstacles to real-time deployment in the edge computing environments typically found on AV platforms.

This research investigates the optimization of vision transformer (ViT) models for edge deployment in autonomous driving, with a specific focus on lightweight architectures and quantization strategies. We examine the architectural and computational trade-offs of deploying standard ViT variants on embedded devices and propose a set of model compression techniques to mitigate performance bottlenecks: quantization-aware training (QAT), post-training quantization (PTQ), structured pruning, and knowledge distillation. To benchmark performance, we selected representative lightweight transformer models (MobileViT, TinyViT, and EfficientFormer) and conducted extensive evaluations on autonomous driving datasets such as KITTI, Cityscapes, and BDD100K. We deployed the models on industry-relevant edge platforms, including the NVIDIA Jetson Xavier NX, Raspberry Pi 4, and Google Coral Edge TPU, and evaluated them on multiple criteria: inference latency, throughput (frames per second), model size (MB), memory usage, accuracy (mean Average Precision and Intersection-over-Union), and power consumption (W). The experimental results indicate that quantized transformer models achieve substantial gains in computational efficiency without significant loss in accuracy. For instance, MobileViT with QAT reduced model size from 52 MB to 29 MB while maintaining over 90% of the original detection accuracy, and inference speed improved by up to 37% on the Jetson Xavier NX.
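As a concrete starting point, the snippet below is a minimal sketch of post-training quantization together with the model-size and latency measurements used in the evaluation, written in PyTorch against a small stand-in transformer encoder. The encoder definition, input resolution, and iteration counts are illustrative assumptions rather than the paper's exact benchmarking pipeline.

```python
# Minimal PTQ sketch (assumed PyTorch pipeline, not the paper's exact code):
# dynamically quantize the Linear layers of a small stand-in ViT encoder to
# INT8, then compare serialized model size and CPU latency against FP32.
import io
import time

import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic


def toy_vit_encoder(dim: int = 192, depth: int = 4, heads: int = 3) -> nn.Module:
    """Stand-in for a lightweight ViT backbone (MobileViT/TinyViT scale)."""
    layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=depth)


def model_size_mb(model: nn.Module) -> float:
    """Serialized state_dict size in MB, comparable across FP32/INT8 variants."""
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6


def mean_latency_ms(model: nn.Module, tokens: torch.Tensor, iters: int = 50) -> float:
    """Average per-forward CPU latency in milliseconds (warm-up excluded)."""
    model.eval()
    with torch.no_grad():
        model(tokens)  # warm-up pass
        start = time.perf_counter()
        for _ in range(iters):
            model(tokens)
        elapsed = time.perf_counter() - start
    return elapsed / iters * 1e3


if __name__ == "__main__":
    fp32 = toy_vit_encoder()
    # Post-training dynamic quantization: Linear weights in the feed-forward
    # blocks become INT8; activations are quantized on the fly at inference.
    int8 = quantize_dynamic(fp32, {nn.Linear}, dtype=torch.qint8)

    x = torch.randn(1, 196, 192)  # 14x14 patch tokens from a 224x224 frame
    print(f"FP32: {model_size_mb(fp32):.1f} MB, {mean_latency_ms(fp32, x):.1f} ms")
    print(f"INT8: {model_size_mb(int8):.1f} MB, {mean_latency_ms(int8, x):.1f} ms")
```

The CPU numbers from such a script are only indicative; reproducing the reported on-device results on Jetson Xavier NX or Coral Edge TPU requires exporting the quantized model to the corresponding runtime (e.g., TensorRT or TFLite).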

Moreover, this study reveals that hybrid optimization—combining quantization with pruning and distillation—offers superior performance-to-efficiency trade-offs, outperforming traditional CNN-based lightweight models (e.g., MobileNet, YOLO-Nano) in AV perception tasks. The proposed models demonstrate practical feasibility for real-time autonomous navigation and lay the groundwork for future transformer deployment in safety-critical, resource-constrained embedded systems.
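To illustrate how such a hybrid recipe can be wired together, the sketch below combines structured pruning of linear layers with a standard Hinton-style distillation loss in PyTorch. The pruning ratio, temperature, and loss weighting are assumptions for exposition, not the configuration used in the experiments; quantization (QAT or PTQ) would be applied on top of this step.

```python
# Hybrid-optimization sketch (assumed PyTorch recipe): structured L2 pruning
# of Linear layers plus a soft-target distillation loss. Hyperparameters are
# placeholders; QAT/PTQ would follow as a final compression stage.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune


def apply_structured_pruning(model: nn.Module, amount: float = 0.3) -> nn.Module:
    """Zero out whole output rows of every Linear layer, ranked by L2 norm.

    PyTorch pruning masks weights rather than shrinking tensors; the zeroed
    rows can be physically removed later when exporting for the edge device.
    """
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.ln_structured(module, name="weight", amount=amount, n=2, dim=0)
            prune.remove(module, "weight")  # fold the pruning mask into the weight
    return model


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      targets: torch.Tensor,
                      temperature: float = 4.0,
                      alpha: float = 0.7) -> torch.Tensor:
    """Blend softened teacher targets with the hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    # Toy check with placeholder heads: a pruned student mimicking a teacher.
    teacher = nn.Linear(64, 10)
    student = apply_structured_pruning(nn.Linear(64, 10), amount=0.3)
    x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
    loss = distillation_loss(student(x), teacher(x).detach(), y)
    loss.backward()
    print(f"hybrid training loss: {loss.item():.3f}")
```

In a full pipeline, the distillation and pruning stages would be applied to the detection or segmentation head and backbone jointly, with quantization-aware fine-tuning run last so the quantizers adapt to the compressed weights.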

This paper provides a comprehensive benchmarking framework and a set of best practices for deploying transformer-based vision models in autonomous vehicles, addressing the pressing need for edge-optimized artificial intelligence in next-generation transportation systems. By bridging the gap between high-performance vision algorithms and hardware-efficient deployment, this research contributes to the realization of more intelligent, responsive, and scalable AV systems.