Enhancing Visual-LLM through Prompt Engineering and Hybrid Retrieval-augmented Generation for Site Safety Compliance Checking

Koi Xiaowen Guo1, Peter Kok-Yiu Wong1, Jack C.P. Cheng1, Xingyu Tao1, Pak-Him Leung2
1 The Hong Kong University of Science and Technology, Hong Kong SAR
2 AutoSafe Limited, Hong Kong SAR
DOI: 10.35490/EC3.2025.221
Abstract: The increasing prevalence of safety incidents on construction sites underscores the urgent need for enhanced monitoring. This study proposes a hybrid Retrieval-Augmented Generation (RAG) algorithm to improve the accuracy of compliance checking for site images. By integrating a Visual Language Model (VLM), we developed an algorithm that can draw on domain knowledge without fine-tuning and that addresses the inability of conventional RAG to interpret visual information. A three-phase prompting framework was designed to enhance the VLM's compliance-analysis capabilities. Experiments on images from an actual construction site in Hong Kong demonstrated a 21.89% increase in retrieval accuracy.
Keywords: Construction Site Safety, Image-based Monitoring, Multimodal Large Language Models, Retrieval-Augmented Generation (RAG)
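The abstract does not specify how the hybrid retrieval is realized. The following is a minimal sketch built on two assumptions. First, "hybrid" is taken to mean fusing a lexical (BM25) ranking with a semantic ranking over regulation clauses. Second, the VLM is assumed to have already converted the site image into a text description, which then serves as the retrieval query. Several elements are hypothetical and were chosen for illustration only:

- the clause texts and the scene description;
- the character-trigram vectors, which stand in for a learned embedding model;
- all function names (`bm25_scores`, `hybrid_retrieve`, `build_prompt`).

Reciprocal rank fusion (RRF) is a swapped-in fusion rule, not necessarily the one used in the paper.

```python
import math
from collections import Counter

# Hypothetical toy regulation clauses (not quoted from any real code of practice).
CLAUSES = [
    "Workers at height must wear a safety harness attached to an anchor point.",
    "Every worker on site must wear a safety helmet.",
    "Scaffolding must be inspected by a competent person before use.",
    "Excavations must be shored to prevent collapse.",
]


def tokenize(text):
    """Lower-case and split on whitespace after dropping commas and periods."""
    return text.lower().replace(",", " ").replace(".", " ").split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of `query` against each document (lexical channel)."""
    toks = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in toks) / n
    df = Counter(term for t in toks for term in set(t))
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for term in tokenize(query):
            if term in tf:
                idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
                norm = k1 * (1 - b + b * len(t) / avgdl)
                s += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(s)
    return scores


def trigram_vector(text):
    """Character-trigram counts: a crude stand-in for a learned text embedding."""
    s = f"  {text.lower()} "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (semantic channel)."""
    dot = sum(v * b[k] for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def ranks(scores):
    """Map document index -> 1-based rank (highest score gets rank 1)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return {doc: r + 1 for r, doc in enumerate(order)}


def hybrid_retrieve(query, docs, top_k=2, rrf_k=60):
    """Fuse lexical and semantic rankings with reciprocal rank fusion (RRF)."""
    lex = ranks(bm25_scores(query, docs))
    qv = trigram_vector(query)
    sem = ranks([cosine(qv, trigram_vector(d)) for d in docs])
    fused = {i: 1 / (rrf_k + lex[i]) + 1 / (rrf_k + sem[i]) for i in range(len(docs))}
    return sorted(fused, key=fused.get, reverse=True)[:top_k]


def build_prompt(scene_description, docs, top_k=2):
    """Place the retrieved clauses and the VLM's scene description in one prompt."""
    hits = hybrid_retrieve(scene_description, docs, top_k)
    context = "\n".join(f"[{i}] {docs[i]}" for i in hits)
    return (
        f"Regulations:\n{context}\n\n"
        f"Observed scene: {scene_description}\n"
        "List any non-compliance, citing the clause numbers above."
    )


if __name__ == "__main__":
    # Hypothetical description a VLM might produce for a site image.
    scene = "A worker on the scaffold without a helmet and no harness."
    print(build_prompt(scene, CLAUSES))
```

RRF combines ranks rather than raw scores. This sidesteps the need to calibrate unbounded BM25 scores against cosine similarities in [0, 1]. In a real system, the trigram vectors would be replaced by a proper text-embedding model, and the assembled prompt would be sent to the VLM together with the image.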
