IMPROVING SMALL FIRE TARGET DETECTION IN UAV IMAGERY: AN ENHANCED RT-DETR WITH MULTI-SCALE FUSION AND EXPERT ROUTING-Upubscience Publisher

Volume 3, Issue 2, Pp 63-74, 2025

DOI: https://doi.org/10.61784/wjer3031

Author(s)

ZhiCheng Zhang

Affiliation(s)

Queen Mary School Hainan, Beijing University of Posts and Telecommunications, Beijing 100876, China.

Corresponding Author

ZhiCheng Zhang

ABSTRACT

Early fire detection is critical for forest fire prevention, yet traditional monitoring methods (e.g., satellites and ground-based stations) suffer from poor real-time performance or limited coverage. Unmanned aerial vehicles (UAVs) equipped with computer vision offer an alternative for fire detection, but complex backgrounds, small flame and smoke targets, and varying illumination and weather conditions make accurate recognition challenging. In this work, we enhance the real-time detection Transformer model RT-DETR by designing a hybrid encoder architecture tailored for UAV fire imagery. Key improvements include the integration of an Adaptive Spatial Feature Fusion (ASFF) module to reconcile multi-scale feature inconsistencies; incorporation of Efficient Channel Attention (ECA) to strengthen channel-wise representations; replacement of the Transformer's fully connected feed-forward network with a Gated Mixture-of-Experts (MoE) structure to boost model capacity; and a multi-layer Transformer feature aggregation strategy. We evaluate the improved model on a UAV smoke fire dataset. Results show a significant improvement in both detection accuracy and recall: at an IoU threshold of 0.5, the enhanced RT-DETR achieves over 88.8% mAP, an approximate 2% gain over the original RT-DETR, and outperforms YOLO-series baselines. Ablation studies confirm that ASFF fusion, multi-attention mechanisms, and the MoE architecture each contribute meaningfully to small-target fire detection. These changes add negligible inference latency, so the model remains suitable for real-time intelligent monitoring in wildland fire scenarios.
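To make the idea of replacing a feed-forward network with a gated Mixture-of-Experts layer more concrete, the following is a minimal, framework-free sketch for a single token vector. It is not the paper's implementation: the abstract does not specify the gating rule, so top-k softmax gating in the style of the sparsely-gated MoE literature is assumed here, and the class name `GatedMoEFFN`, the helpers `linear` and `init_linear`, and all sizes are hypothetical.

```python
import math
import random


def linear(x, weight, bias):
    """Return weight @ x + bias for a vector x (weight is a list of rows)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def init_linear(rng, n_in, n_out):
    """Hypothetical helper: small random weights and zero biases."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


class GatedMoEFFN:
    """Token-wise feed-forward layer built from several small expert FFNs.

    Assumption: top-k softmax gating; the paper's actual gate may differ.
    """

    def __init__(self, d_model, d_hidden, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.gate = init_linear(rng, d_model, num_experts)
        self.experts = [
            (init_linear(rng, d_model, d_hidden), init_linear(rng, d_hidden, d_model))
            for _ in range(num_experts)
        ]

    def gate_weights(self, x):
        """Softmax over the top-k gate logits; unselected experts are skipped."""
        logits = linear(x, *self.gate)
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = order[: self.top_k]
        peak = max(logits[i] for i in chosen)  # subtract for numerical stability
        exps = {i: math.exp(logits[i] - peak) for i in chosen}
        total = sum(exps.values())
        return {i: e / total for i, e in exps.items()}

    def forward(self, x):
        """Gate-weighted sum of the selected experts' outputs for one token."""
        out = [0.0] * len(x)
        for idx, g in self.gate_weights(x).items():
            (w1, b1), (w2, b2) = self.experts[idx]
            hidden = [max(0.0, h) for h in linear(x, w1, b1)]  # ReLU
            for j, y in enumerate(linear(hidden, w2, b2)):
                out[j] += g * y
        return out


layer = GatedMoEFFN(d_model=8, d_hidden=16, num_experts=4, top_k=2)
token = [0.5, -1.0, 0.25, 0.0, 1.5, -0.75, 0.1, 0.9]
weights = layer.gate_weights(token)
output = layer.forward(token)
print(len(weights), round(sum(weights.values()), 6), len(output))
```

In this sketch only `top_k` experts run for each token, so the parameter count grows with `num_experts` while the per-token compute stays close to that of a single feed-forward block. This is one plausible reading of how added capacity can coexist with the small latency overhead the abstract reports.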

KEYWORDS

Fire detection; Real-time object detection; RT-DETR; Adaptive Spatial Feature Fusion (ASFF); Mixture-of-experts (MoE)

CITE THIS PAPER

ZhiCheng Zhang. Improving small fire target detection in UAV imagery: An enhanced RT-DETR with multi-scale fusion and expert routing. World Journal of Engineering Research. 2025, 3(2): 63-74. DOI: https://doi.org/10.61784/wjer3031.
