{"id":170681,"date":"2025-08-24T02:55:15","date_gmt":"2025-08-24T02:55:15","guid":{"rendered":"https:\/\/www.europesays.com\/us\/170681\/"},"modified":"2025-08-24T02:55:15","modified_gmt":"2025-08-24T02:55:15","slug":"knowledge-embedding-and-interpretable-machine-learning-optimize-comprehensive-benefits-for-water-treatment","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/us\/170681\/","title":{"rendered":"Knowledge embedding and interpretable machine learning optimize comprehensive benefits for water treatment"},"content":{"rendered":"<p>Flocculation dynamics and knowledge embedding<\/p>\n<p>There is a quantitative relationship between flocculants and granules. In flocculation dynamics, the forces driving particle collisions come from two sources: (1) Brownian motion, which causes the particle collision and aggregation known as perikinetic flocculation (Text <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#MOESM1\" target=\"_blank\" rel=\"noopener\">S1<\/a>); and (2) fluid motion from hydraulic or mechanical agitation, which causes the particle collision and aggregation known as orthokinetic flocculation (Text <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#MOESM1\" target=\"_blank\" rel=\"noopener\">S2<\/a>). The detailed process is described in Text <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#MOESM1\" target=\"_blank\" rel=\"noopener\">S1<\/a> and Text <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#MOESM1\" target=\"_blank\" rel=\"noopener\">S2<\/a>. 
Although perikinetic flocculation itself is not affected by particle size, the effect of Brownian motion diminishes as particle size increases. To further promote collision and aggregation of larger particles, orthokinetic flocculation is also required. In real water bodies, where perikinetic and orthokinetic flocculation coexist, the ratio of the two collision types is uncertain. In addition, the particle size (d) and collision efficiency (\\(\\eta\\)) also vary with changes in water quality (Text <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#MOESM1\" target=\"_blank\" rel=\"noopener\">S1<\/a>, Text <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#MOESM1\" target=\"_blank\" rel=\"noopener\">S2<\/a>). Therefore, the accurate dosage cannot be described by a quantitative formula; the nonlinear relationship is better learned through ML, and the data collected need to include water quality indicators for the different sections of the full process.<\/p>\n<p>How to embed environmental knowledge into the model, and thereby increase its interpretability, is key to this research. The study also considered economic benefits and energy expenditure. The logical chain of the flocculation process is: addition of flocculant \u279d change in kinetic characteristics \u279d particle flocculation \u279d change in water quality of the flocculation section \u279d change in subsequent water quality \u279d change in water quality of the effluent and in the related economic indicators. 
Thus, 7 economically relevant indicators (4 electricity indicators and 3 economic indicators recorded by the central server) were used as constraints rather than independent variables to optimize the model parameters during the training phase, while in the inference and application phases these indicators are already embedded in the model as knowledge (Eq. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"equation anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#Equ1\" target=\"_blank\" rel=\"noopener\">1<\/a>, Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#Fig1\" target=\"_blank\" rel=\"noopener\">1a<\/a>). Similarly, the metrics before the flocculation segment at time \\(n\\) and after the flocculation segment at time \\(n-1\\) are used as independent variables, and the metrics after the flocculation segment at time \\(n\\) are used as constraints. The change value features are calculated in real time by the central server along with the results of online monitoring.<\/p>\n<p>$$\\begin{array}{c}\\begin{array}{cc}{Train}: &amp; {PAC}={ML}\\left({WIBC}\\right)* {MLP}\\end{array}\\\\ {MLP}\\leftarrow {a}_{1}{Loss}({WIBC})+{a}_{2}{Loss}({WTIAC})+{a}_{3}{Loss}({PTIAC})+{a}_{4}{Loss}({ETIAC})\\\\ \\begin{array}{cc}{Test}: &amp; {PAC}={ML}\\left({{WIBC}}_{{Test}}\\right)* {MLP}\\end{array}\\end{array}$$<\/p>\n<p>\n                    (1)\n                <\/p>\n<p>\\({ML}\\) is the machine learning model<\/p>\n<p>\\({MLP}\\) is the set of machine learning model parameters<\/p>\n<p>\\({WIBC}\\) is the set of water quality indicators before coagulation<\/p>\n<p>\\({WTIAC}\\) is the set of water quality threshold indicators after coagulation<\/p>\n<p>\\({PTIAC}\\) is the set of power threshold indicators after coagulation<\/p>\n<p>\\({ETIAC}\\) is the set of economic threshold indicators after coagulation<\/p>\n<p>With the distinction between independent variables and constraints, and 
then the knowledge of environmental science embedded in the model training process, it is possible to clearly set the problem framework, optimize the calculation process, and improve the efficiency and accuracy of the model. For threshold control after the flocculation section, after the disinfection section, and before the water leaves the plant, the evaluation criterion is that the closer a value is to its threshold the better, rather than that ever lower indicator values are always better, because water quality, economic costs, and environmental benefits must be balanced comprehensively. Finally, an interpretable analysis of the constructed model is performed along with application validation.<\/p>\n<p>Treatment process and data collection<\/p>\n<p>The drinking water treatment plant (DWTP) from which the data were obtained is located in Guangzhou, China. The design water supply capacity is 800,000 tons per day, the actual water supply capacity is 450,000 tons per day, and the total construction land area is about 180,000\u2009m<sup>2<\/sup>. The plant adopts chlorine disinfection with sodium hypochlorite dosed at 3 points: pre-chlorination at the water intake, the disinfection contact pool, and before the water delivery pumping station (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#Fig1\" target=\"_blank\" rel=\"noopener\">1a, b<\/a>). The coagulant is PAC (polyaluminum chloride); its main parameters are shown in Text <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#MOESM1\" target=\"_blank\" rel=\"noopener\">S3<\/a>. 
China\u2019s current water quality standard for water leaving the plant is GB5749-2022, and the water plant implements more stringent control standards: turbidity below 0.8\u2009NTU after the sedimentation tank; turbidity below 0.5\u2009NTU and residual chlorine above 0.3\u2009mg\/L after the disinfection tank; and turbidity below 0.3\u2009NTU and residual chlorine above 0.8\u2009mg\/L in the effluent.<\/p>\n<p>To improve the feasibility of the subsequent design, the study selected 38 indicators that can be measured online at most water plants (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#Fig1\" target=\"_blank\" rel=\"noopener\">1a<\/a>). These included 7 raw water indicators, 5 influent water indicators, 4 pre-chlorination indicators, 3 sedimentation tank indicators, 3 disinfection indicators, 4 effluent water indicators, 4 electricity indicators and 3 economic indicators recorded by the central server, 4 independently designed and calculated change value indicators, and the dosage of PAC (Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#MOESM1\" target=\"_blank\" rel=\"noopener\">S1<\/a>). The study also provides the instrument configurations, installation locations, quantities and categories (Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#MOESM1\" target=\"_blank\" rel=\"noopener\">S2<\/a>). 
The dataset was split into a training set and a test set at a ratio of 8:2.<\/p>\n<p>ML principles<\/p>\n<p>To assess the feasibility of the ML approach, eight ML algorithms were selected for the study<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Sun, Y. et al. Application of remote sensing technology in water quality monitoring: From traditional approaches to artificial intelligence. Water Res. 267, 122546 (2024).\" href=\"#ref-CR29\" id=\"ref-link-section-d216154803e1988\">29<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Liu, W., Chen, J., Wang, H. &amp; Fu, Z. Perspectives on Advancing Multimodal Learning in Environmental Science and Engineering Studies. Environ. Sci. Technol. 58, 16690&#x2013;16703 (2024).\" href=\"#ref-CR30\" id=\"ref-link-section-d216154803e1988_1\">30<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 31\" title=\"Bibri, S. E., Huang, J., Jagatheesaperumal, S. K. &amp; Krogstie, J. The synergistic interplay of artificial intelligence and digital twin in environmentally planning sustainable smart cities: A comprehensive systematic review. Environ. Sci. Ecotechnology 20, 100433 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#ref-CR31\" id=\"ref-link-section-d216154803e1991\" target=\"_blank\" rel=\"noopener\">31<\/a> (Text <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#MOESM1\" target=\"_blank\" rel=\"noopener\">S4<\/a>). 
The algorithms are Ridge Regression (RIDGE), Support Vector Regression (SVR), Random Forest (RF), Extreme Gradient Boosting (XG), Deep Neural Networks (DNN), Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTM), and Transformer (TF). RIDGE is the baseline model.<\/p>\n<p>RIDGE improves the stability and predictive power of linear regression by adding a regularization penalty term (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#Fig1\" target=\"_blank\" rel=\"noopener\">1c<\/a>). SVR fits linear and nonlinear data via support vector machines (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#Fig1\" target=\"_blank\" rel=\"noopener\">1d<\/a>)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 32\" title=\"Xia, Y. et al. Understanding the Disparities of PM2.5 Air Pollution in Urban Areas via Deep Support Vector Regression. Environ. Sci. Technol. 58, 8404&#x2013;8416 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#ref-CR32\" id=\"ref-link-section-d216154803e2007\" target=\"_blank\" rel=\"noopener\">32<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 33\" title=\"Dai, Y. et al. Prediction of water quality based on SVR by fluorescence excitation-emission matrix and UV&#x2013;Vis absorption spectrum. Spectrochim. Acta Part A 273, 121059 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#ref-CR33\" id=\"ref-link-section-d216154803e2010\" target=\"_blank\" rel=\"noopener\">33<\/a>. RF is an ensemble learning method based on bagging, in which the subtrees are independent and do not affect each other (Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#Fig1\" target=\"_blank\" rel=\"noopener\">1e<\/a>)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Tesoriero, A. J., Wherry, S. A., Dupuy, D. I. &amp; Johnson, T. D. Predicting Redox Conditions in Groundwater at a National Scale Using Random Forest Classification. Environ. Sci. Technol. acs.est.3c07576 &#10;                  https:\/\/doi.org\/10.1021\/acs.est.3c07576&#10;                  &#10;                 (2024).\" href=\"#ref-CR34\" id=\"ref-link-section-d216154803e2017\">34<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Song, M. J. et al. Identification of primary effecters of N2O emissions from full-scale biological nitrogen removal systems using random forest approach. Water Res. 184, (2020).\" href=\"#ref-CR35\" id=\"ref-link-section-d216154803e2017_1\">35<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 36\" title=\"Hu, X. et al. Estimating PM2.5 Concentrations in the Conterminous United States Using the Random Forest Approach. Environ. Sci. Technol. 51, 6936&#x2013;6944 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#ref-CR36\" id=\"ref-link-section-d216154803e2020\" target=\"_blank\" rel=\"noopener\">36<\/a>. XG is also an ensemble learning method, but it is based on boosting, and its subtrees depend on each other (Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#Fig1\" target=\"_blank\" rel=\"noopener\">1f<\/a>)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Villanueva, P. et al. One-Week-Ahead Prediction of Cyanobacterial Harmful Algal Blooms in Iowa Lakes. Environ. Sci. Technol. &#10;                  https:\/\/doi.org\/10.1021\/acs.est.3c07764&#10;                  &#10;                 (2023).\" href=\"#ref-CR37\" id=\"ref-link-section-d216154803e2028\">37<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Pan, Y. et al. Machine Learning-Assisted Optimization of Mixed Carbon Source Compositions for High-Performance Denitrification. Environ. Sci. Technol. 58, 12498&#x2013;12508 (2024).\" href=\"#ref-CR38\" id=\"ref-link-section-d216154803e2028_1\">38<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 39\" title=\"Li, L. et al. Interpretable tree-based ensemble model for predicting beach water quality. Water Res. 211, 118078 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#ref-CR39\" id=\"ref-link-section-d216154803e2031\" target=\"_blank\" rel=\"noopener\">39<\/a>. DNNs consist of multiple layers of neurons and are the simplest deep learning networks (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#Fig1\" target=\"_blank\" rel=\"noopener\">1g<\/a>)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Li, X. et al. Accurately Predicting Spatiotemporal Variations of Near-Surface Nitrous Acid (HONO) Based on a Deep Learning Approach. Environ. 
Sci. Technol. 58, 13035&#x2013;13046 (2024).\" href=\"#ref-CR40\" id=\"ref-link-section-d216154803e2038\">40<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Ma, P. et al. Early Detection of Pipeline Natural Gas Leakage from Hyperspectral Imaging by Vegetation Indicators and Deep Neural Networks. Environ. Sci. Technol. 58, 12018&#x2013;12027 (2024).\" href=\"#ref-CR41\" id=\"ref-link-section-d216154803e2038_1\">41<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 42\" title=\"Zhang, D. et al. An optical mechanism-based deep learning approach for deriving water trophic state of China&#x2019;s lakes from Landsat images. Water Res. 252, 121181 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#ref-CR42\" id=\"ref-link-section-d216154803e2041\" target=\"_blank\" rel=\"noopener\">42<\/a>. RNN captures temporal dependence in sequential data through recurrent connections, but may suffer from vanishing gradients during training (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#Fig1\" target=\"_blank\" rel=\"noopener\">1h<\/a>)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 43\" title=\"Yu, Y., Si, X., Hu, C. &amp; Zhang, J. A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures. Neural Comput. 31, 1235&#x2013;1270 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#ref-CR43\" id=\"ref-link-section-d216154803e2048\" target=\"_blank\" rel=\"noopener\">43<\/a>. LSTM controls the flow of information by introducing a gating mechanism that allows the network to selectively retain important information and forget irrelevant information (Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#Fig1\" target=\"_blank\" rel=\"noopener\">1i<\/a>)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Nasser, A. A., Rashad, M. Z. &amp; Hussein, S. E. A Two-Layer Water Demand Prediction System in Urban Areas Based on Micro-Services and LSTM Neural Networks. IEEE Access 8, 147647&#x2013;147661 (2020).\" href=\"#ref-CR44\" id=\"ref-link-section-d216154803e2056\">44<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Liu, R., Zayed, T. &amp; Xiao, R. Advanced acoustic leak detection in water distribution networks using integrated generative model. Water Res. 254, 121434 (2024).\" href=\"#ref-CR45\" id=\"ref-link-section-d216154803e2056_1\">45<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 46\" title=\"Xie, Y. A hybrid deep learning approach to improve real-time effluent quality prediction in wastewater treatment plant. Water Res. &#010;                  https:\/\/doi.org\/10.1016\/j.watres.2023.121092&#010;                  &#010;                 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#ref-CR46\" id=\"ref-link-section-d216154803e2059\" target=\"_blank\" rel=\"noopener\">46<\/a>. TF captures global information through a self-attention mechanism that allows each input element to be computed in association with all other elements in the sequence (Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#Fig1\" target=\"_blank\" rel=\"noopener\">1j<\/a>)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Baid, G. et al. DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer. Nat. Biotechnol. 41, 232&#x2013;238 (2022).\" href=\"#ref-CR47\" id=\"ref-link-section-d216154803e2066\">47<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Stebliankin, V. et al. Evaluating protein binding interfaces with transformer networks. Nat. Mach. Intell. 5, 1042&#x2013;1053 (2023).\" href=\"#ref-CR48\" id=\"ref-link-section-d216154803e2066_1\">48<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 49\" title=\"Kang, Y., Park, H., Smit, B. &amp; Kim, J. A multi-modal pre-training transformer for universal transfer learning in metal&#x2013;organic frameworks. Nat. Mach. Intell. 5, 309&#x2013;318 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#ref-CR49\" id=\"ref-link-section-d216154803e2069\" target=\"_blank\" rel=\"noopener\">49<\/a>.<\/p>\n<p>Model interpretability<\/p>\n<p>Shapley Additive Explanations (SHAP) is an explanatory method based on game theory for measuring the contribution of each feature to the prediction results of a ML model<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 50\" title=\"Loh, H. W. et al. Application of explainable artificial intelligence for healthcare: A systematic review of the last decade (2011&#x2013;2022). Computer Methods Prog. 
Biomedicine 226, 107161 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#ref-CR50\" id=\"ref-link-section-d216154803e2081\" target=\"_blank\" rel=\"noopener\">50<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 51\" title=\"Luo, Z.-N. et al. Enhanced iodinated disinfection byproducts formation in iodide\/iodate-containing water undergoing UV-chloramine sequential disinfection: machine learning-aided identification of reaction mechanisms. Water Res. 122975 &#010;                  https:\/\/doi.org\/10.1016\/j.watres.2024.122975&#010;                  &#010;                 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#ref-CR51\" id=\"ref-link-section-d216154803e2084\" target=\"_blank\" rel=\"noopener\">51<\/a>. It draws on the concept of Shapley value to derive the contribution value of each feature by calculating the marginal contribution of the features to the prediction in different combinations<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 52\" title=\"Hu, L. et al. Informing Risk Hotspots and Critical Mitigations for Rainstorms Using Machine Learning: Evidence from 268 Chinese Cities. Environ. Sci. Technol. 59, 1619&#x2013;1630 (2025).\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#ref-CR52\" id=\"ref-link-section-d216154803e2088\" target=\"_blank\" rel=\"noopener\">52<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 53\" title=\"Wang, Y. et al. Predictions of the Optical Properties of Brown Carbon Aerosol by Machine Learning with Typical Chromophores. Environ. Sci. Technol. 
58, 20588&#x2013;20597 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#ref-CR53\" id=\"ref-link-section-d216154803e2091\" target=\"_blank\" rel=\"noopener\">53<\/a>. The advantage of SHAP is that it provides transparency in model prediction, reveals the specific impact of features on the results, and helps to understand the decision-making process of complex models<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 40\" title=\"Li, X. et al. Accurately Predicting Spatiotemporal Variations of Near-Surface Nitrous Acid (HONO) Based on a Deep Learning Approach. Environ. Sci. Technol. 58, 13035&#x2013;13046 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#ref-CR40\" id=\"ref-link-section-d216154803e2095\" target=\"_blank\" rel=\"noopener\">40<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 54\" title=\"Lundberg, S. M. &amp; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. NIPS 2017. 30, 4766&#x2013;4777 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41545-025-00510-1#ref-CR54\" id=\"ref-link-section-d216154803e2098\" target=\"_blank\" rel=\"noopener\">54<\/a>. During model development, SHAP can help developers identify potential errors or unfair factors and avoid model bias. In this study, SHAP is used to understand and validate the ML dosing model.<\/p>\n<p>Examining subtree depth in Random Forest (RF) can aid interpretation and understanding of the model. Subtree depth directly affects model complexity, generalization ability, and computational efficiency: deeper trees may capture complex patterns but are prone to overfitting, while shallower trees are simpler and easier to interpret. 
By adjusting the depth, model performance and interpretability can be balanced to help diagnose overfitting or underfitting problems. In addition, subtree depth affects feature importance analysis and the transparency of decision paths, making the model decision process easier to understand and communicate.<\/p>\n<p>Application feasibility verification<\/p>\n<p>The DWTP adopts folded plate flocculation tanks. The flocculation area is divided into 2 blocks with a total of 8 groups, and the length, width and height of each group are 13.75\u2009m, 14.40\u2009m and 7.1\u2009m, respectively. Each group of flocculation tanks is divided into 3 rows and 2 columns (6 areas), and the total number of folded plates is 38. The model constructed for the validation was scaled down by a factor of 25, so the length, width and height of one group were 55\u2009cm, 57.6\u2009cm and 28.4\u2009cm, respectively. The designed hydraulic retention time of the flocculation area was 15.1\u2009min, and the designed hydraulic retention time of the sedimentation area was 102\u2009min.<\/p>\n<p>The validation took place from February 4, 2025 to February 13, 2025, a total of 10 days. There were four sampling time points daily: 10:00, 12:00, 14:00, and 16:00. For the validation, water was taken directly before coagulation, the dosing scheme was calculated separately using the original logic of the water plant and the ML-driven model, and the water and the chemicals were mixed through the pump. The water was then fed into two identical downsized flocculation reactors, and the coagulated water was left to stand for 102\u2009min before the indicators were tested. Considering the differences between the reactors and the real water plant, the indicators of the real water plant during the reactor validation were also collected online and used for comparison. 
All tanks appear in pairs, so separate controls are performed to verify the transferability of the method.<\/p>\n<p>Techno-economic analysis (TEA) assesses the economic feasibility, benefits and risks of technologies through quantitative and qualitative analyses, thus providing a scientific basis for decision-making. This study mainly considers the cost of chemical consumption, which includes flocculant (PAC), disinfectant (sodium hypochlorite) and other agents. The other chemicals are mainly hydrochloric acid and sodium hydroxide used in backwashing to adjust the pH.<\/p>\n<p>Monte Carlo simulation is an effective numerical method for addressing uncertainty, offering advantages such as simplicity of implementation, strong reproducibility, and applicability to perturbation analysis of complex systems. This study uses the method to assess the robustness of the model under missing-data conditions. Specifically, for each specified missing proportion, that proportion of feature values is randomly selected from the original input data and set to 0 to simulate real-world monitoring anomalies such as sensor failures, communication interruptions, or missing data records. Subsequently, the model is run 100 times on the perturbed data, and the error from each simulation is recorded. 
Through statistical analysis of the results from multiple simulations, the performance fluctuations and stability of the model under different missing data ratios can be quantified.<\/p>\n","protected":false},"excerpt":{"rendered":"Flocculation dynamics and knowledge embedding There is a quantitative relationship between flocculants and granules, In flocculation dynamics, the&hellip;\n","protected":false},"author":3,"featured_media":170682,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21],"tags":[691,738,74290,746,81664,834,26986,158,67,132,68,97578,97577,97576],"class_list":{"0":"post-170681","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-civil-engineering","11":"tag-environment","12":"tag-environmental-sciences","13":"tag-general","14":"tag-nanotechnology","15":"tag-technology","16":"tag-united-states","17":"tag-unitedstates","18":"tag-us","19":"tag-waste-water-technology-water-pollution-control-water-management-aquatic-pollution","20":"tag-water-industry-water-technologies","21":"tag-water-quality-water-pollution"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@us\/115081492762793363","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts\/170681","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/comments?post=170681"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts\/170681\/revisions"}],"wp:featuredmedia":[{"embeddab
le":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/media\/170682"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/media?parent=170681"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/categories?post=170681"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/tags?post=170681"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}