2024-10-02

A recent article published in Buildings used natural language processing (NLP) and knowledge graph (KG) modeling to propose a recommendation framework for construction safety guidelines. Unstructured safety texts were transformed into a structured, interconnected KG, and the guideline items were then ranked by relevance and grouped using the PageRank and Louvain clustering algorithms.

Background

The construction industry is large and labor-intensive. Though it contributes significantly to the economy, it carries high risks because accidents are frequent and often severe. Countries like the United States, Australia, and the United Kingdom mandate advanced construction safety practices.

Accordingly, government agencies in these nations develop and publish standardized safety guidelines to enhance safety across the industry. However, the guidelines are fragmented and unstructured, typically issued as lengthy PDF (portable document format) documents, which makes it difficult for professionals to apply them in their daily work.

More streamlined and accessible methods are required to disseminate and utilize the established safety guidelines, ensuring enhanced safety in the construction environment. Thus, this study proposed integrating fragmented safety management guidelines into unified and structured KGs using NLP and KG modeling methods.

Methods

The proposed KG modeling and recommendation framework involved three primary steps: preprocessing construction safety guidelines, creating KG models, and applying ranking and clustering algorithms to find the most critical and relevant safety guideline items.

Relevant text data for modeling were extracted from 86 construction safety guideline documents obtained from the Korea Occupational Safety and Health Agency (KOSHA). These PDF guidelines were classified by work type, such as building demolition and bridge construction. Subsequently, they were converted into CSV (comma-separated values) format with 5988 rows and three columns (category, title, and content), with the text split into individual statements.

Soynlp, a Korean NLP library, was used to tokenize the content column and divide the text into L-tokens (nouns) and R-tokens (particles, conjunctions, complements, etc.). The L-tokens were chosen for constructing the KG because they carry the main concepts and technical expressions in the guidelines. The TF-IDF (term frequency-inverse document frequency) weighting function was then applied to extract index terms (keywords) from each statement.
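
The keyword-extraction step can be illustrated with a minimal sketch. It assumes the statements are already tokenized into nouns, uses the plain tf * log(N / df) form of TF-IDF, and keeps the top k tokens per statement. The sample tokens, the function names, and the top-k rule are hypothetical, since the summary does not state which TF-IDF variant or cutoff the researchers used.

```python
import math
from collections import Counter

# Hypothetical, already-tokenized statements (English stand-ins for Korean L-tokens).
statements = [
    ["scaffold", "guardrail", "fall"],
    ["scaffold", "wind", "pressure"],
    ["demolition", "dust", "fall"],
]

def tfidf(docs):
    """Return one {token: weight} dict per statement, using tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: the number of statements each token appears in.
    df = Counter(token for doc in docs for token in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            tok: (count / len(doc)) * math.log(n_docs / df[tok])
            for tok, count in tf.items()
        })
    return weights

def top_index_terms(docs, k=2):
    """Keep the k highest-weighted tokens per statement (assumed selection rule)."""
    # Ties are broken alphabetically so the result is deterministic.
    return [sorted(w, key=lambda t: (-w[t], t))[:k] for w in tfidf(docs)]

index_terms = top_index_terms(statements)
```

Tokens that appear in many statements, such as "scaffold" here, receive lower weights than tokens that are specific to one statement.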

Neo4j, a popular graph data science and analytics platform, was used to construct the KG and run the algorithms through its Cypher query language. The KG for the KOSHA Safety Guidelines was built by connecting all items based on the hierarchy of the documents and on shared keywords. Finally, the PageRank and Louvain clustering algorithms were applied to the KG to identify the most relevant guideline items.

Results and Discussion

The graph generation procedure resulted in a total of 669 category, 5988 content, and 102923 index nodes. Preprocessing helped establish connections (called Relate edges) among content nodes, forming a web of relationships between the information entities. The number of shared indexes determined the weight of each Relate edge.

After preprocessing and graph construction, the network comprised 217220 Include edges, reflecting associations or inclusions between different elements, and 13301903 Relate edges, representing the interconnectedness between content nodes based on shared indices.
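
The weighting of Relate edges by shared indexes can be sketched as follows. This is an illustration only: the node names, the keywords, and the `relate_edges` helper are hypothetical, and the researchers built the actual graph in Neo4j rather than in Python.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical content nodes mapped to their index terms (keywords).
content_index = {
    "c1": {"scaffold", "fall", "guardrail"},
    "c2": {"scaffold", "fall", "wind"},
    "c3": {"demolition", "fall"},
    "c4": {"crane"},
}

def relate_edges(content_index):
    """Return {(node_a, node_b): weight}, where weight = number of shared index terms."""
    # Invert the mapping so each term lists the content nodes that contain it.
    by_term = defaultdict(list)
    for node, terms in content_index.items():
        for term in terms:
            by_term[term].append(node)

    # Every pair of nodes under the same term shares that term once.
    weights = defaultdict(int)
    for nodes in by_term.values():
        for a, b in combinations(sorted(nodes), 2):
            weights[(a, b)] += 1
    return dict(weights)

edges = relate_edges(content_index)
```

In this toy input, "c1" and "c2" share two terms and get a weight of 2, while "c4" shares nothing and stays unconnected.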

The PageRank algorithm was applied to the projected graph extracted from the KOSHA guidelines using the ‘scaffolding’ keyword. While the PageRank algorithm analyzed each content item’s significance and connectivity through shared keyword counts, the Louvain algorithm determined clusters with high modularity, denoting strong similarities.
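
To show how shared-keyword counts translate into a ranking, the following sketch runs a weighted PageRank by power iteration on a small undirected graph. It is not the Neo4j implementation used in the study. The damping factor of 0.85 is the commonly used default and is assumed here, and the node labels and edge weights are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Weighted PageRank by power iteration on an undirected graph.

    edges maps (node_a, node_b) -> positive weight (e.g., shared keyword count).
    """
    # Build a symmetric weighted adjacency map.
    adj = {}
    for (a, b), w in edges.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w

    nodes = sorted(adj)
    n = len(nodes)
    # Strength = total weight of the edges attached to a node.
    strength = {v: sum(adj[v].values()) for v in nodes}
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(iterations):
        new_rank = {}
        for v in nodes:
            # Each neighbor passes on its rank in proportion to the edge weight.
            incoming = sum(rank[u] * w / strength[u] for u, w in adj[v].items())
            new_rank[v] = (1.0 - damping) / n + damping * incoming
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank

# Hypothetical content items; weights stand for shared keyword counts.
edges = {("A", "B"): 2, ("A", "C"): 1, ("B", "C"): 1, ("C", "D"): 1}
ranks = pagerank(edges)
top_item = max(ranks, key=ranks.get)
```

Here item "C" ranks highest because it is linked to every other item, which mirrors the idea that highly connected guideline items are treated as the most significant.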

Thus, among the 26 content items extracted from the graph, 13 were selected through PageRank values and Louvain algorithm results. Items with high PageRank values covered prominent safety topics such as falls, drops, and scaffolding, which are critical in several safety contexts. Moreover, content within the same cluster generally shared themes related to falls, scaffolding, and structures, indicating thematic coherence.

Notably, the items with high PageRank values focused on important aspects of worksite safety, such as preventing tripping hazards around scaffolding, minimizing wind pressure risks, and ensuring adherence to safety procedures during scaffold ascent. Thus, users could use PageRank to quickly identify the most crucial safety measures, namely those linked to many other guidelines, enhancing focus on critical safety topics.

The Louvain algorithm systematically organized safety content by grouping related guidelines, such as those on fall protection and scaffolding safety, making it easy for users to navigate similar topics. Critical safety measures were thus addressed comprehensively within their respective clusters, improving the overall safety management process.
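
The Louvain method searches for a grouping of nodes that maximizes modularity. The sketch below does not implement Louvain itself. It only computes the modularity score of a given grouping, to show what a "high modularity" cluster assignment means. The toy graph, node labels, and function name are hypothetical.

```python
def modularity(edges, community):
    """Modularity Q of a partition of a weighted undirected graph (no self-loops).

    edges maps (node_a, node_b) -> weight; community maps node -> cluster label.
    """
    total = sum(edges.values())  # m: total edge weight in the graph
    strength = {}                # weighted degree of each node
    internal = {}                # edge weight falling inside each cluster
    for (a, b), w in edges.items():
        strength[a] = strength.get(a, 0.0) + w
        strength[b] = strength.get(b, 0.0) + w
        if community[a] == community[b]:
            label = community[a]
            internal[label] = internal.get(label, 0.0) + w

    # Sum of node strengths per cluster.
    degree_sum = {}
    for node, s in strength.items():
        label = community[node]
        degree_sum[label] = degree_sum.get(label, 0.0) + s

    # Q = sum over clusters of (internal weight / m) - (cluster strength / 2m)^2
    return sum(
        internal.get(label, 0.0) / total - (d / (2.0 * total)) ** 2
        for label, d in degree_sum.items()
    )

# Two tightly linked groups of hypothetical items joined by a single edge.
edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in two_clusters}

q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
```

Splitting the graph into its two dense groups scores higher than lumping everything together, which is the kind of partition a modularity-maximizing method favors.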

Conclusion

Overall, the researchers successfully developed a KG-based method using NLP to organize and systematize construction safety guidelines. Additionally, the PageRank and Louvain Clustering algorithms were applied to efficiently extract essential information from the graph database, enabling the retrieval of safety-related information relevant to construction trades.

Despite the valuable insights from the proposed recommendation system, more rigorous field testing is required to assess its real-world applicability. Thus, the researchers suggest further enhancing it to account for contextual factors such as the type and conditions of construction tasks.

Journal Reference

Lee, J., & Ahn, S. (2024). PageRank Algorithm-Based Recommendation System for Construction Safety Guidelines. Buildings, 14(10), 3041. DOI: 10.3390/buildings14103041, https://www.mdpi.com/2075-5309/14/10/3041

