Researchers have developed a machine learning model that accurately predicts the compressive strength of high-strength concrete, offering a more efficient and reliable alternative to traditional estimation methods.
The model, detailed in a recent Scientific Reports publication, was built using Python and trained on experimental data that included key mix parameters—cement, water, silica fume, superplasticizer, sand, gravel, and curing age. These inputs were used to forecast compressive strength through a suite of regression algorithms, highlighting how machine learning can be applied to improve prediction in civil engineering contexts.
Rethinking Traditional Approaches
Conventional methods for estimating concrete strength typically rely on empirical equations or basic statistical techniques. While useful in specific scenarios, these approaches often fall short when dealing with the complex, nonlinear relationships found in high-performance concrete mixes.
To overcome these limitations, the research team turned to machine learning models that can better manage this complexity. Their goal was to develop a flexible, data-driven approach capable of improving prediction accuracy, reducing reliance on trial-and-error testing, and streamlining the concrete design process.
That said, machine learning models come with their own challenges—chief among them, sensitivity to data distribution. They tend to perform well within the bounds of the training data but can struggle with extrapolation. To address this, the researchers applied cross-validation and ensured the dataset provided adequate feature coverage to support generalizability.
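The cross-validation idea mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the article does not say how many folds were used, so `k=5` is an assumption, and `k_fold_indices` is a hypothetical helper name.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle once so that folds do not depend on the original record order.
    random.Random(seed).shuffle(indices)
    # Spread samples as evenly as possible: the first (n_samples % k) folds
    # receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# 167 matches the number of mix records reported in the study;
# k=5 is an assumed value.
folds = list(k_fold_indices(167, k=5))
```

Each record lands in exactly one test fold, so every mix is used for validation once and for training in the remaining folds. A model is then fitted and scored once per fold, and the scores are averaged.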
Dataset and Model Training
The study used a dataset comprising 167 high-strength concrete mix records. Each entry included input variables such as cement, water, silica fume, superplasticizer, gravel, sand, and curing age, with compressive strength (measured in MPa) as the target variable.
Data preprocessing was crucial. Records with missing values were removed, and features were standardized so that no single feature dominated the others. The data were also shuffled to eliminate potential bias from the original ordering, then split into 70% training and 30% testing sets.
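The preprocessing steps described above might look roughly like the following sketch. It is written in standard-library Python for portability, and several details are assumptions rather than facts from the study: the function name `preprocess`, the random seed, the choice to compute scaling statistics on the training portion only (a common way to avoid leaking test information), and the toy records used in the demo.

```python
import random
import statistics

def preprocess(rows, test_fraction=0.3, seed=42):
    """Clean, shuffle, split, and standardize (features, target) records."""
    # 1. Drop any record with a missing feature or target value.
    clean = [(x, y) for x, y in rows
             if y is not None and all(v is not None for v in x)]
    # 2. Shuffle so the original record order cannot bias the split.
    random.Random(seed).shuffle(clean)
    # 3. Hold out the test fraction (30% by default).
    n_test = round(len(clean) * test_fraction)
    test, train = clean[:n_test], clean[n_test:]
    # 4. Per-feature mean and standard deviation from the training set
    #    (assumption: the article does not say where the scaler was fitted).
    columns = list(zip(*(x for x, _ in train)))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard constant columns

    def scale(x):
        return [(v - m) / s for v, m, s in zip(x, means, stds)]

    X_train = [scale(x) for x, _ in train]
    y_train = [y for _, y in train]
    X_test = [scale(x) for x, _ in test]
    y_test = [y for _, y in test]
    return X_train, y_train, X_test, y_test

# Hypothetical toy records: ([cement, water], strength). Not data from the study.
rows = [([300.0 + 10 * i, 150.0 + i], 60.0 + i) for i in range(10)]
rows.append(([None, 160.0], 70.0))  # incomplete record, dropped in step 1
X_train, y_train, X_test, y_test = preprocess(rows)
```

After scaling, each training feature has zero mean and unit variance, which matters most for distance-based and regularized models such as KNN, SVR, Lasso, and Ridge.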
A range of machine learning algorithms was evaluated: Linear Regression, Lasso, Ridge, Decision Tree, Random Forest, Support Vector Regression (SVR), XGBoost, and K-Nearest Neighbors (KNN) regression. Model performance was assessed using metrics including mean absolute error (MAE), mean squared error (MSE), and the coefficient of determination (R-squared).
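For readers unfamiliar with the three metrics named above, here is a small sketch of their standard definitions. The function names and the four sample values are illustrative only and do not come from the study.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error: penalizes large errors more heavily than MAE."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 minus residual/total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured vs. predicted strengths in MPa.
y_true = [60.0, 70.0, 80.0, 90.0]
y_pred = [62.0, 68.0, 81.0, 89.0]
scores = (mae(y_true, y_pred), mse(y_true, y_pred), r_squared(y_true, y_pred))
```

For these sample values the errors are 2, 2, 1, and 1 MPa, giving an MAE of 1.5, an MSE of 2.5, and an R-squared of 0.98. Lower MAE and MSE and a higher R-squared indicate a better fit.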
Initial statistical analysis showed a range of variability across features. Cement, water, and sand displayed moderate variation; gravel had the widest spread, and superplasticizer and curing age were relatively consistent. Interestingly, compressive strength values were tightly clustered, indicating a relatively uniform strength range across most mixes. These insights helped guide model selection and tuning.
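A comparison of variability like the one above can be made with basic summary statistics. The article does not say which measure of spread the authors used, so the coefficient of variation here is an assumption, and the numbers are hypothetical placeholders rather than values from the dataset.

```python
import statistics

def summarize(values):
    """Return mean, sample standard deviation, and coefficient of variation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    # The coefficient of variation (std / mean) is unitless, so it lets
    # features measured on different scales be compared directly.
    return {"mean": mean, "std": std, "cv": std / mean}

# Hypothetical columns, not data from the study.
gravel = [900.0, 1000.0, 1100.0]   # widely spread
strength = [78.0, 80.0, 82.0]      # tightly clustered
gravel_stats = summarize(gravel)
strength_stats = summarize(strength)
```

In this toy example gravel has a coefficient of variation of 0.1 against 0.025 for strength, the kind of contrast the paragraph above describes.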
Key Findings
Among the models tested, XGBoost consistently delivered the best predictive performance across all evaluation metrics, with Random Forest close behind. Both ensemble models significantly outperformed simpler approaches in capturing the complex relationships between input variables and compressive strength. Linear models such as Lasso and basic Linear Regression, on the other hand, struggled to represent the underlying dynamics and produced larger prediction errors.
To better understand how each input influenced the predictions, the researchers used SHapley Additive exPlanations (SHAP) analysis. The results highlighted curing age as the most impactful factor. This finding aligns with concrete chemistry: strength development is closely tied to cement hydration, which progresses over time. The most significant gains occur in the early curing phase (up to 28 days), though strength can continue to increase beyond that.
Gravel, by contrast, had a relatively minor effect. While it contributes to structural stability as a coarse aggregate, it doesn't directly influence strength through chemical interaction, which likely explains its lower SHAP score.
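The idea behind SHAP scores can be illustrated with a brute-force Shapley value calculation on a toy model. This is not the SHAP library and not the study's model: it enumerates every feature subset exactly, replaces "absent" features with a single baseline mix, and uses a made-up linear predictor whose coefficients are purely hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature for one prediction."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual values; the rest are
        # replaced by the baseline values.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(z):
    """Hypothetical predictor over [curing_age, cement, gravel]."""
    return 20.0 + 0.8 * z[0] + 0.05 * z[1] + 0.001 * z[2]

x = [28.0, 500.0, 1000.0]        # mix being explained (made-up values)
baseline = [7.0, 450.0, 950.0]   # reference mix (made-up values)
phi = shapley_values(toy_model, x, baseline)
```

The contributions always sum to the difference between the prediction for `x` and the prediction for `baseline`. In this toy setup curing age receives by far the largest share and gravel the smallest, which mirrors the ranking reported in the article only because the coefficients were chosen to do so.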
Bridging Research and Practice
To translate these findings into a practical tool, the researchers developed an interactive graphical user interface (GUI). This user-friendly application allows engineers to input common mix parameters—such as water content, cement, and curing age—within specified ranges and instantly receive compressive strength predictions based on the trained ML model. No coding knowledge is required, making it accessible to professionals across the construction and materials sectors.
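Restricting inputs to specified ranges, as the GUI does, keeps predictions inside the region covered by the training data. A minimal sketch of such a check follows. The range values, units, and function names are hypothetical, since the article does not list the tool's actual limits, and `model` stands in for any trained predictor.

```python
# Hypothetical allowed ranges (kg/m^3 for materials, days for curing age).
RANGES = {
    "cement": (300.0, 700.0),
    "water": (120.0, 220.0),
    "curing_age": (1.0, 90.0),
}

def validate_mix(mix, ranges=RANGES):
    """Return a list of problems; an empty list means every input is in range."""
    errors = []
    for name, (low, high) in ranges.items():
        if name not in mix:
            errors.append(f"{name}: missing")
        elif not low <= mix[name] <= high:
            errors.append(f"{name}: {mix[name]} outside [{low}, {high}]")
    return errors

def predict_strength(mix, model, ranges=RANGES):
    """Run the model only when the mix lies inside the allowed ranges."""
    errors = validate_mix(mix, ranges)
    if errors:
        raise ValueError("; ".join(errors))
    return model(mix)

ok_mix = {"cement": 500.0, "water": 160.0, "curing_age": 28.0}
bad_mix = {"cement": 500.0, "water": 160.0, "curing_age": 365.0}
ok_errors = validate_mix(ok_mix)
bad_errors = validate_mix(bad_mix)
```

Rejecting out-of-range mixes up front is a simple guard against the extrapolation weakness noted earlier in the article.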
Conclusion
This study demonstrates how machine learning can provide a more accurate and adaptable method for predicting the compressive strength of high-strength concrete. Ensemble models like XGBoost and Random Forest proved especially effective in capturing complex, nonlinear relationships, outperforming traditional linear methods.
By combining technical accuracy with practical accessibility through the GUI, the research offers a meaningful step forward in concrete mix design and performance prediction. As machine learning continues to gain traction in engineering applications, tools like this have the potential to enhance design efficiency and reliability in structural projects.
Journal Reference
Shaaban, M., Amin, M., Selim, S., & Riad, I. M. (2025). Machine learning approaches for forecasting compressive strength of high-strength concrete. Scientific Reports, 15(1). DOI: 10.1038/s41598-025-10342-1. https://www.nature.com/articles/s41598-025-10342-1