Enhancing Visual-LLM through Prompt Engineering and Hybrid Retrieval-augmented Generation for Site Safety Compliance Checking

Koi Xiaowen Guo¹, Peter Kok-Yiu Wong¹, Jack C.P. Cheng¹, Xingyu Tao¹, Pak-Him Leung²

¹ The Hong Kong University of Science and Technology, Hong Kong SAR

² AutoSafe Limited, Hong Kong SAR

DOI: 10.35490/EC3.2025.221

Abstract: The increasing prevalence of safety incidents on construction sites underscores the urgent need for enhanced monitoring. This study proposes a hybrid Retrieval-Augmented Generation (RAG) algorithm to improve compliance-checking accuracy for site images. By integrating a Visual Language Model (VLM), we developed an algorithm that can acquire domain knowledge without fine-tuning and that addresses the limitation of RAG technology in interpreting visual information. A three-phase prompting framework was designed to enhance the VLM’s compliance analysis abilities. Experiments on an actual construction site in Hong Kong demonstrated a 21.89% increase in retrieval accuracy.

Keywords: Construction Site Safety, Image-based Monitoring, Multimodal Large Language Models, Retrieval-Augmented Generation (RAG)
