A new study has used large language models and neural networks to identify and predict the reactivity of over 14,000 potential cementitious materials, offering a scalable approach to discovering low-carbon clinker substitutes.
Study: Data-driven material screening of secondary and natural cementitious precursors.
Published in Communications Materials, the study demonstrates how neural networks and large language models (LLMs) can be used to systematically map reactivity variations and expand the library of potential cementitious materials. By analyzing the chemical compositions of more than 14,000 materials extracted from roughly 88,000 academic papers, the researchers identified promising secondary and natural cementitious precursors and used machine learning to evaluate their reactivity and pozzolanicity.
Background
Reducing the greenhouse gas (GHG) footprint of cement and concrete production is a pressing priority, and clinker substitution is one of the most effective strategies to achieve this. Cementitious precursors, materials that react with water (or, in the case of pozzolans, with water and calcium hydroxide) to form cement-like hydration products, can replace a significant portion of clinker: substitutes such as fly ash and slag can reduce GHG intensity by up to 50 %.
The cement industry has set an ambitious goal to reduce the global clinker-to-cement mass ratio from 76 % to 52 % by 2050. Realizing this target requires a broader range of substitutes, which in turn demands reliable ways to evaluate the reactivity of diverse and often heterogeneous materials. Current methods, however, rely on time-consuming and resource-heavy experiments.
To address this, the study proposes a data-driven approach powered by machine learning to assess reactivity and pozzolanic potential at scale, dramatically accelerating the screening process for viable materials.
Methods
The research team began by filtering 5.7 million academic papers to isolate around 88,000 focused on cement and concrete. From this corpus, they built two vector databases—one for 3 million sentences and another for 104,000 structured tables—using high-dimensional embeddings to capture semantic meaning and enable efficient, targeted retrieval.
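The retrieval idea behind the vector databases described above can be illustrated with a minimal in-memory store. This is a sketch under stated assumptions, not the study's implementation: the hashed bag-of-words `embed` function is a hypothetical stand-in for a real sentence-embedding model, and `VectorStore` is an illustrative name. It only shows how normalized embeddings plus cosine similarity let a query pull back the most relevant sentence.

```python
from __future__ import annotations

import math
import re
import zlib


def embed(text: str, dim: int = 512) -> list[float]:
    """Hypothetical stand-in for a sentence-embedding model.

    Hashes each lowercase word into one of `dim` buckets and L2-normalises the
    counts, so texts sharing words get similar vectors. A real pipeline would
    use a learned encoder instead.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's salted hash()
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class VectorStore:
    """Minimal in-memory vector database with brute-force cosine search."""

    def __init__(self) -> None:
        self.texts: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(embed(text))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        q = embed(query)
        # All vectors are unit length, so the dot product is the cosine similarity
        scored = [
            (sum(a * b for a, b in zip(q, vec)), text)
            for vec, text in zip(self.vectors, self.texts)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Usage with three illustrative sentences
store = VectorStore()
for sentence in [
    "Fly ash was used as a partial replacement for Portland cement.",
    "The compressive strength of the mortar increased with curing age.",
    "Table 2 lists the oxide composition of rice husk ash (wt.%).",
]:
    store.add(sentence)
print(store.search("oxide composition of rice husk ash", k=1))
```

With millions of sentences, the toy encoder would be replaced by a learned model and the brute-force loop by an approximate nearest-neighbour index; the ranking principle stays the same.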
LLM agents using retrieval-augmented generation (RAG) were then deployed to extract key data points such as chemical compositions and material names. Embedding models used for retrieval included the general-purpose sentence transformers all-mpnet-base-v2 and all-MiniLM-L6-v2, as well as the materials-science-specific MatSciBERT.
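The RAG extraction step described above can be sketched as the plumbing around the LLM call: assembling a prompt from retrieved snippets, then parsing the structured reply. Everything here is an assumption for illustration. The prompt wording, the JSON schema (`material`, `composition_wt_pct`), the plausibility check, and both function names are hypothetical, and the model call itself is omitted because the article does not describe the agents' interface.

```python
from __future__ import annotations

import json


def build_extraction_prompt(material: str, contexts: list[str]) -> str:
    """Combine retrieved snippets with an instruction (the 'augmented' part of RAG)."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(contexts, start=1))
    return (
        "Extract the oxide composition (wt.%) of the named material.\n"
        f"Material: {material}\n"
        f"Context:\n{numbered}\n"
        'Answer only with JSON: {"material": str, "composition_wt_pct": {oxide: number}}. '
        "Use only values stated in the context."
    )


def parse_extraction(reply: str, max_total: float = 102.0) -> dict[str, float]:
    """Parse the model's JSON reply and reject physically implausible compositions."""
    data = json.loads(reply)
    composition = {oxide: float(value) for oxide, value in data["composition_wt_pct"].items()}
    if any(value < 0 for value in composition.values()):
        raise ValueError("negative oxide content")
    if sum(composition.values()) > max_total:  # small margin for rounding
        raise ValueError("oxide contents sum well above 100 wt.%")
    return composition


# Made-up snippet and a stand-in for the LLM's reply
prompt = build_extraction_prompt(
    "rice husk ash", ["Table 2: rice husk ash, SiO2 88.3 wt.%, CaO 0.9 wt.% (hypothetical)"]
)
reply = '{"material": "rice husk ash", "composition_wt_pct": {"SiO2": 88.3, "CaO": 0.9}}'
print(parse_extraction(reply))  # {'SiO2': 88.3, 'CaO': 0.9}
```

Validating the reply before it enters the database is a design choice, not something the article reports; it simply guards against a model returning malformed or impossible numbers.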
Data extraction from retrieved tables was handled by fine-tuned versions of GPT-3.5 and Mistral, with GPT-3.5 ultimately chosen based on accuracy against a hand-labeled dataset. Challenges like vague or abbreviated material names were addressed using sentence-level semantic matching and metadata filters.
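Choosing between fine-tuned extractors by accuracy on a hand-labeled set can be sketched as follows. The article says only that GPT-3.5 was selected on accuracy; the scoring rule used here (an oxide value counts as correct if it falls within an absolute tolerance, and a missing field counts as an error), the function names, and the candidate names `model_a` and `model_b` are hypothetical.

```python
from __future__ import annotations


def field_accuracy(
    predicted: list[dict[str, float]],
    gold: list[dict[str, float]],
    tol: float = 0.05,
) -> float:
    """Share of hand-labelled (record, oxide) values reproduced within `tol` wt.%."""
    if len(predicted) != len(gold):
        raise ValueError("predictions and labels must be aligned record by record")
    correct = total = 0
    for pred, ref in zip(predicted, gold):
        for oxide, true_value in ref.items():
            total += 1
            guess = pred.get(oxide)  # a missing field counts as an error
            if guess is not None and abs(guess - true_value) <= tol:
                correct += 1
    return correct / total if total else 0.0


def select_model(outputs: dict[str, list[dict[str, float]]], gold: list[dict[str, float]]) -> str:
    """Return the name of the candidate extractor with the highest field accuracy."""
    scores = {name: field_accuracy(preds, gold) for name, preds in outputs.items()}
    return max(scores, key=lambda name: scores[name])


# Made-up labels and outputs for two hypothetical candidate models
gold = [{"SiO2": 88.3, "CaO": 0.9}, {"SiO2": 52.1, "CaO": 4.6}]
candidates = {
    "model_a": [{"SiO2": 88.3, "CaO": 0.9}, {"SiO2": 52.1}],
    "model_b": [{"SiO2": 83.8, "CaO": 0.9}, {"SiO2": 25.1, "CaO": 4.6}],
}
print(select_model(candidates, gold))  # model_a
```

Here `model_a` scores 0.75 (one field missing) and `model_b` 0.5 (two digits transposed, one value misread).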
The machine learning pipeline itself was built in Python 3.8.8 and incorporated widely used libraries including TensorFlow, scikit-learn, LightGBM, and NumPy. The LLM workflows were supported by PyTorch, Ludwig, LangChain, and SentenceTransformers.
Results and Discussion
The mapped reactivity profiles revealed a diverse range of promising clinker substitutes, including both industrial by-products and naturally occurring materials.
Agricultural wastes such as sugarcane bagasse ash and rice husk ash demonstrated strong pozzolanic behavior, while materials like tree bark ash appeared to function more as hydraulic precursors. Other identified materials included waste glass, municipal solid waste ashes, mine tailings, and construction and demolition debris like recycled ceramics and concrete.
The study also spotlighted 25 types of natural rocks, eight of which showed significant reactivity when mechanically activated. These included pumice, ignimbrites, rhyolite, opaline shales, trachyte, and tuffs—many of which are abundant in rift zones and seismic regions, offering geographically diverse sourcing opportunities.
Additionally, the research found that some crystalline rocks, such as anorthosite, could become reactive through amorphization techniques like vitrification. Clastic rocks rich in silica or volcanic material, which often have higher amorphous fractions, also exhibited favorable reactivity.
These findings suggest that the availability of viable clinker substitutes is far broader than previously thought. However, integrating these materials into industrial-scale use will require not only reactivity validation but also supply chain analysis, durability testing, and cost evaluations.
Conclusion and Future Directions
By combining LLM-based data extraction with machine learning prediction, the study introduced a method for rapidly screening and evaluating cementitious materials. A multi-headed neural network predicted three key reactivity metrics (heat release, bound water, and Ca(OH)2 consumption) from features such as chemical composition, particle size, and phase content.
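The multi-headed design can be illustrated with a forward pass in plain Python: one shared hidden layer learns a common representation, and three small heads each predict one reactivity metric. This is a sketch under assumptions, not the study's model. Layer sizes, the ReLU activation, the five input features, the head names, and the equal-weight summed loss are all hypothetical. The weights are random and untrained; in practice a framework such as TensorFlow would fit them by backpropagation on the combined loss.

```python
from __future__ import annotations

import math
import random

# Output heads named after the three metrics in the article (names are hypothetical)
HEADS = ("heat_release", "bound_water", "ch_consumption")


def relu(v: float) -> float:
    return max(0.0, v)


def init_layer(n_in: int, n_out: int, rng: random.Random):
    """Random weights (He-style scaling for ReLU) and zero biases."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: list[float], weights, bias, activation=None) -> list[float]:
    """Fully connected layer: one output per weight row."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [activation(v) for v in out] if activation else out


class MultiHeadNet:
    """Shared hidden layer feeding three single-output regression heads."""

    def __init__(self, n_features: int, n_hidden: int = 16, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.trunk = init_layer(n_features, n_hidden, rng)
        self.heads = {name: init_layer(n_hidden, 1, rng) for name in HEADS}

    def predict(self, x: list[float]) -> dict[str, float]:
        shared = dense(x, *self.trunk, activation=relu)  # representation shared by all heads
        return {name: dense(shared, *layer)[0] for name, layer in self.heads.items()}


def multitask_loss(pred: dict[str, float], target: dict[str, float], weights=None) -> float:
    """Weighted sum of per-head squared errors (targets assumed already scaled)."""
    weights = weights or {name: 1.0 for name in HEADS}
    return sum(weights[name] * (pred[name] - target[name]) ** 2 for name in HEADS)


# Hypothetical scaled features: SiO2, CaO, Al2O3 fractions, particle size, amorphous fraction
net = MultiHeadNet(n_features=5)
print(net.predict([0.70, 0.05, 0.12, 0.30, 0.85]))
```

Sharing the hidden layer lets the three related targets inform one another, which is the usual motivation for multi-task networks when each target has limited labeled data.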
Among the rock types screened, such as silicic tuff, pumice, and shale, roughly 5 %–25 % of samples were predicted to release more than 200 J/g of heat, suggesting substantial cementitious potential. Many of these rocks are available worldwide in volcanic or tectonically active regions, offering a scalable path toward reducing the carbon footprint of cement production.
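The 5 %–25 % figure is an exceedance fraction: the share of samples whose heat release is above 200 J/g. Assuming predicted heat values are available per sample and grouped by rock type (the grouping, the function name, and the example numbers are hypothetical), it can be computed as:

```python
from __future__ import annotations

HEAT_THRESHOLD_J_PER_G = 200.0  # threshold quoted in the article


def share_above_threshold(
    predicted_heat: dict[str, list[float]],
    threshold: float = HEAT_THRESHOLD_J_PER_G,
) -> dict[str, float]:
    """For each rock type, the fraction of samples whose predicted heat exceeds the threshold."""
    return {
        rock: sum(heat > threshold for heat in heats) / len(heats)
        for rock, heats in predicted_heat.items()
        if heats  # skip rock types with no samples to avoid dividing by zero
    }


# Made-up predictions for illustration only
example = {"tuff": [150.0, 210.0, 90.0, 250.0], "granite": [40.0, 60.0]}
print(share_above_threshold(example))  # {'tuff': 0.5, 'granite': 0.0}
```

The strict `>` comparison matches the wording "more than 200 J/g".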
While the study makes a compelling case for the use of AI-driven material screening, further experimental work is essential to validate predictions and ensure performance consistency. Future research could extend these models to include activation pathways such as vitrification or calcination, creating a more comprehensive framework for optimizing sustainable binder materials.
Journal Reference
Mahjoubi, S., Venugopal, V., Manav, I. B., AzariJafari, H., Kirchain, R. E., & Olivetti, E. A. (2025). Data-driven material screening of secondary and natural cementitious precursors. Communications Materials, 6(1). DOI: 10.1038/s43246-025-00820-4, https://www.nature.com/articles/s43246-025-00820-4
Disclaimer: The views expressed here are those of the author expressed in their private capacity and do not necessarily represent the views of AZoM.com Limited T/A AZoNetwork the owner and operator of this website. This disclaimer forms part of the Terms and conditions of use of this website.