*Important notice: This article reports on an unedited version of an accepted paper that is awaiting final editing. The paper should therefore not be regarded as conclusive or treated as established information.
Researchers have developed advanced supervised machine learning models to accurately predict the compressive strength of sustainable concrete incorporating industrial waste. The research was published in Scientific Reports.

Study: Optimizing the mechanical performance of sustainable industrial waste modified concrete using supervised machine learning modeling and feature importance analysis.
Background
Concrete is a key construction material worldwide, but its main binding ingredient, Portland cement, is energy-intensive and a major source of CO2 emissions. To promote sustainability, industrial waste materials such as silica fume, fly ash, ground granulated blast-furnace slag (GGBFS), and various powders are increasingly used as supplementary cementitious materials (SCMs) or partial aggregate replacements, reducing both environmental impact and waste disposal problems.
However, concrete modified with such wastes has a complex, heterogeneous composition, and its strength depends on factors such as water content and curing age. Traditional experimental testing of compressive strength is costly and slow, while empirical formulas often fail to capture the nonlinear relationships among mix components. Supervised machine learning offers a rapid, accurate alternative, but prior studies have been hindered by limited data and poor model interpretability.
To address these gaps, the study applies multiple supervised ML models to a large dataset of 711 industrial waste concrete samples and uses SHapley Additive exPlanations (SHAP) to interpret them, with the aim of optimizing compressive strength predictions to support sustainable construction material design.
Methodology
The research followed a systematic, data-driven methodology, beginning with the assembly of a comprehensive dataset of 711 data points compiled from existing peer-reviewed sources. The dataset covered concrete mixes containing a variety of industrial waste powders, which were treated as equivalent because of their similar chemical and mechanical effects on the concrete matrix.
The input variables comprised nine key mix and environmental parameters, including the quantities of cement, fine aggregate, and coarse aggregate, the water content, the curing age, and the amount of industrial waste SCMs.
Preprocessing ensured homogeneity by selecting data points tested under comparable curing regimes and standard compressive strength evaluation methods (e.g., ASTM standards), thereby minimizing confounding variability.
Seven supervised regression models were implemented for compressive strength prediction: Gradient Boosting Regressor (GBR), CatBoost (Categorical Boosting), Random Forest (RF), Histogram Gradient Boosting, Extra Trees Regressor (ETR), Bagging Regressor, and K-Nearest Neighbors (KNN). Six of these are ensemble methods, while KNN is an instance-based method; all are capable of capturing the complex nonlinear relationships between mix inputs and compressive strength.
The dataset was split into training and testing sets, and hyperparameters were optimized via five-fold cross-validation to reduce the risk of overfitting and improve model robustness. After final fitting, model performance was assessed using multiple statistical metrics: the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).
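The article does not describe how the models were configured. To illustrate the boosting idea behind GBR, the best-performing model, here is a minimal pure-Python sketch, assuming squared-error loss and depth-1 trees (stumps) as weak learners; the names `Stump`, `fit_stump`, and `GradientBoostingStumps` are hypothetical. In practice one would use a library implementation such as scikit-learn's `GradientBoostingRegressor`, which grows deeper trees by default and exposes many more hyperparameters.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stump:
    """Depth-1 regression tree: one threshold split on one feature."""
    feature: int
    threshold: float
    left: float   # value returned when x[feature] <= threshold
    right: float  # value returned otherwise

    def predict(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Return the stump that minimizes the squared error of residuals r."""
    n = len(r)
    mean_r = sum(r) / n
    # Fallback: a constant stump (every sample goes left) if no split helps.
    best = Stump(0, float("inf"), mean_r, mean_r)
    best_sse = sum((v - mean_r) ** 2 for v in r)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue
            ml = sum(left) / len(left)
            mr = sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = Stump(j, t, ml, mr), sse
    return best


class GradientBoostingStumps:
    """Hypothetical gradient boosting for squared-error loss using stumps."""

    def __init__(self, n_estimators: int = 100, learning_rate: float = 0.1) -> None:
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.init = 0.0
        self.stumps: List[Stump] = []

    def fit(self, X: List[List[float]], y: List[float]) -> "GradientBoostingStumps":
        self.init = sum(y) / len(y)  # initial prediction: the mean target
        pred = [self.init] * len(y)
        self.stumps = []
        for _ in range(self.n_estimators):
            # For squared-error loss, the negative gradient is the residual.
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            # Shrink each stump's contribution by the learning rate.
            pred = [p + self.learning_rate * stump.predict(x) for p, x in zip(pred, X)]
        return self

    def predict(self, X: List[List[float]]) -> List[float]:
        return [
            self.init + self.learning_rate * sum(s.predict(x) for s in self.stumps)
            for x in X
        ]


if __name__ == "__main__":
    # Toy data, not from the study: one input, step-shaped target.
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [10.0, 10.0, 30.0, 30.0]
    model = GradientBoostingStumps(n_estimators=100, learning_rate=0.1).fit(X, y)
    print(model.predict(X))
```

Each round fits a small tree to what the current ensemble still gets wrong, and the learning rate trades more rounds for smoother convergence. This is the same additive structure that library GBR implementations use, just with far simpler trees.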
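The article does not report the train/test ratio or the hyperparameter search space. A minimal sketch of five-fold cross-validation for tuning one hyperparameter follows, assuming the test set has already been held out and using KNN's neighbor count `k` as the example hyperparameter; the function names `kfold_indices`, `knn_predict`, `cv_rmse`, and `select_k` are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def kfold_indices(n: int, n_folds: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle sample indices and return one (train_idx, val_idx) pair per fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    splits = []
    for f in range(n_folds):
        train = [i for g in range(n_folds) if g != f for i in folds[g]]
        splits.append((train, folds[f]))
    return splits


def knn_predict(X_train: List[Sequence[float]], y_train: List[float],
                x: Sequence[float], k: int) -> float:
    """Average the targets of the k training points nearest to x (Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), yi)
        for row, yi in zip(X_train, y_train)
    )
    nearest = dists[:k]  # may hold fewer than k points if the training set is small
    return sum(yi for _, yi in nearest) / len(nearest)


def cv_rmse(X: List[Sequence[float]], y: List[float], k: int,
            n_folds: int = 5, seed: int = 0) -> float:
    """Pooled RMSE over all out-of-fold predictions for a given k."""
    sq_errors = []
    for train, val in kfold_indices(len(X), n_folds, seed):
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        for i in val:
            sq_errors.append((knn_predict(X_tr, y_tr, X[i], k) - y[i]) ** 2)
    return (sum(sq_errors) / len(sq_errors)) ** 0.5


def select_k(X: List[Sequence[float]], y: List[float], candidates: List[int],
             n_folds: int = 5, seed: int = 0) -> Tuple[int, Dict[int, float]]:
    """Return the candidate k with the lowest cross-validated RMSE, plus all scores."""
    scores = {k: cv_rmse(X, y, k, n_folds, seed) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores
```

Every sample is used for validation exactly once, so the score for each candidate reflects performance on data the model did not see during that fold's fitting. The chosen value would then be refit on the full training set before evaluating on the held-out test set.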
Critically, the study incorporated the SHAP technique to interpret the trained models, elucidating the individual impact and interaction effects of each input feature on compressive strength predictions.
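SHAP attributes each prediction to the input features using Shapley values from cooperative game theory: a feature's value is its average marginal contribution over all orderings of the features. The sketch below computes exact Shapley values by enumerating every feature coalition, assuming a single baseline point (for example, the feature means) stands in for "absent" features. This is a simplification: the SHAP library averages over a background dataset and uses faster algorithms such as TreeSHAP for tree ensembles, and the article does not describe the study's SHAP configuration. The function `shapley_values` and the toy models in the test are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_values(f: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x relative to a single baseline point.

    phi_i = sum over S subset of N without i of
            |S|! (n - |S| - 1)! / n! * (v(S with i) - v(S)),
    where v(S) evaluates f with features in S taken from x and the rest
    taken from the baseline.
    """
    n = len(x)

    def v(coalition: Set[int]) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (v(s | {i}) - v(s))
    return phi
```

The values always sum to f(x) minus f(baseline), which is what lets SHAP split a single strength prediction into per-feature contributions. For a purely additive model each value reduces to that feature's own term, while an interaction term is shared between the features involved. The cost grows exponentially with the number of features; for nine inputs this takes a few thousand model evaluations per prediction.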
Additionally, the researchers developed a user-friendly Python-based graphical user interface (GUI) to facilitate practical application, allowing construction engineers and materials scientists to obtain rapid compressive strength predictions for specific mix inputs without extensive experimental testing.
Results and Discussion
Among the evaluated machine learning models, the Gradient Boosting Regressor (GBR) excelled, achieving the highest test accuracy with R² = 0.881, RMSE = 5.65 MPa, MAE = 4.17 MPa, and MAPE = 25.15%. Other ensemble methods like CatBoost and Histogram Gradient Boosting also showed strong performance (R² > 0.87), confirming the effectiveness of ensemble learners for the nonlinear, complex domain of industrial waste-modified concrete.
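The four metrics reported above can be computed from measured and predicted strengths as follows. This is a minimal pure-Python sketch using the standard definitions; the function name `regression_metrics` and the sample values are hypothetical, and the study may have used a library routine such as those in scikit-learn's `sklearn.metrics` module instead.

```python
from math import sqrt
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """R², RMSE, MAE (same units as y, e.g. MPa) and MAPE (%)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)     # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        # MAPE assumes no measured strength is zero.
        "MAPE": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


if __name__ == "__main__":
    # Hypothetical strengths in MPa, not data from the study.
    m = regression_metrics([30.0, 40.0, 50.0], [33.0, 38.0, 50.0])
    print(m)  # R2 ≈ 0.935, RMSE ≈ 2.082, MAE ≈ 1.667, MAPE ≈ 5.0
```

Because MAPE divides each error by the measured strength, errors on low-strength mixes weigh more heavily in it than in RMSE or MAE, which is why the metrics are usually read together rather than in isolation.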
SHAP analysis identified fine aggregate content as the most influential factor on compressive strength, ranking above cement content, which has traditionally received the most emphasis and whose effect plateaued beyond 400 kg/m³. Water content and curing age were also key contributors, consistent with hydration and maturation processes. Interactions between industrial waste and other components showed meaningful synergistic effects, indicating that simple replacement ratios are inadequate for describing these mixes.
These insights suggest that sustainable concrete mixes can be optimized by focusing on aggregate quality and water management alongside selective use of supplementary cementitious materials. The developed GUI turns these findings into a user-friendly tool that enables engineers to predict and adjust mix designs efficiently, reducing costly experimental trials and supporting sustainable practices.
The study has several limitations: regional variability in waste properties may limit generalizability, the data are concentrated around typical curing ages (about 28 days), and durability properties critical for long-term performance were not considered. Future research should broaden the scope accordingly.
Conclusion
This study successfully applied supervised machine learning ensembles with SHAP interpretability to predict and optimize the compressive strength of sustainable concrete incorporating industrial waste. Using a large, curated dataset, Gradient Boosting Regressor emerged as the best model with high accuracy (R² = 0.881).
The findings identified fine aggregate content as the primary influence on strength, ahead of cement content, alongside water content and curing age. Significant interactions between waste materials and traditional components highlighted complex synergistic effects.
Although the study is limited by data heterogeneity and scope, it provides a robust framework for green concrete innovation. Future work will focus on uncertainty quantification, durability properties, and expanded datasets to improve model generalizability across different regions.
Journal Reference
AbdelMongy M., Uddin M.A., et al. (2026). Optimizing the mechanical performance of sustainable industrial waste modified concrete using supervised machine learning modeling and feature importance analysis. Scientific Reports. DOI: 10.1038/s41598-026-57625-9, https://www.nature.com/articles/s41598-026-57625-9