The methods framework, as illustrated in Fig. 6, consists of four steps designed to quantify the net electricity saving impact of AI on the power system. First, we conducted a meta-analysis of existing literature on AI applications across the source-grid-load-storage segments to establish intervals for electricity saving potential. Second, these intervals serve as the empirical basis for defining three AI electricity saving scenarios, which are then used to calculate total electricity savings. Third, the electricity consumption of AI itself is estimated based on the technical parameters of AIDCs under varying demand scenarios. Finally, by cross-combining the three electricity saving scenarios with the three electricity consumption scenarios, we constructed a 3×3 matrix of integrated scenarios and calculated the net electricity saving trend from 2025 to 2060.

Fig. 6: Methodological framework of the study.

The figure outlines a four-step process: meta-analysis for metric extraction (Step 1), scenario construction for electricity savings (Step 2), scenario construction for electricity consumption (Step 3), and trend forecasting through combined scenarios for 2025–2060 (Step 4).

Study object

To evaluate the heterogeneous impacts of AI technologies on electricity saving across different power systems, this study employs a multi-case comparative design. Five provinces, namely Inner Mongolia, Jiangsu, Guangdong, Sichuan, and Shandong, are selected as representative cases because of their diversity in energy endowment, load characteristics, and power system structures.

Energy endowment

The five regions demonstrate distinct supply-side foundations: Sichuan is dominated by hydropower, Shandong is experiencing a rapid transition from coal to clean energy, and Inner Mongolia is rich in wind-solar resources. Jiangsu and Guangdong, being coastal provinces with limited local resources, rely more heavily on inter-provincial electricity imports.

Load characteristics

Typical load profiles are also distinguished: Shandong is characterized by industry-intensive and high-baseload demand; Guangdong demonstrates service-oriented and highly urbanized consumption patterns; Sichuan displays strong seasonality because of its climate; Inner Mongolia features a low-load-density structure; and Jiangsu is marked by stable manufacturing-service mixed demand.

Power system structures

These provinces, corresponding to key national energy clusters—including the Three-North renewable base (Inner Mongolia), the Yangtze River Delta load center (Jiangsu), the Pearl River Delta digital economy hub (Guangdong), and the southwestern hydropower corridor (Sichuan)—are core nodes in China’s “source-grid-load-storage” integration pilots.

By comparing regions with distinct supply-demand structures and varying technological adoption levels, this study assesses how the electricity saving potential of AI varies according to regional characteristics such as renewable energy penetration, load flexibility, and grid complexity. The analysis offers empirical bases for formulating differentiated AI deployment strategies and electricity consumption control policies.

Meta-analysis

To quantify the electricity saving effects of AI technologies across different segments of the power system, a meta-analysis54,55,56,57,58 was carried out to synthesize the reported effectiveness values from the literature. The aim is to obtain a unified electricity saving OE for each segment, namely source, grid, load, and storage, which is then employed as the core parameter in the provincial net electricity saving assessment model. In this study, the OE represents the synthesized electricity saving effect of AI applications in the power system. Specifically, OE is obtained by aggregating electricity saving rates reported in individual studies using a standard meta-analysis approach, where each study is weighted according to its variance and sample size. This enables a unified quantitative estimate of the electricity saving potential of AI across heterogeneous applications. The meta-analysis procedure involves literature collection, classification of AI applications across the source-grid-load-storage segments, standardization of OE, heterogeneity assessment, pooling via random or fixed effects models, robustness testing, and bias adjustment. The specific steps are as follows.

Step 1: Literature retrieval and classification of source-grid-load-storage

A systematic search was carried out in three major databases, namely Web of Science, IEEE Xplore, and CNKI, covering the period from 2014 to 2024. The keyword combinations were “Artificial Intelligence”, “machine learning”, “deep learning”, “power system”, and “source/grid/load/storage”, together with “energy saving” or “electricity saving”. Eligible studies had to meet two criteria. First, they had to quantitatively report the electricity saving effect induced by AI, either as a percentage reduction in electricity consumption (%) or as a quantified reduction in TWh/year that could be converted into a percentage. Second, the context of the study had to relate to China’s power system or be reasonably applicable to Chinese conditions. Supplementary Fig. 6 shows the primary application pathways of AI in the power system identified in this study, which serve as the foundation for the OE indicators used in the subsequent meta-analysis. For each eligible study, the following data items were extracted: the type of AI algorithm, the application segment (source/grid/load/storage), the reported electricity saving effect (%), and the scenario description along with the experimental conditions. The PRISMA59 procedures were followed for screening, which included title-abstract filtering and full-text evaluation (Supplementary Fig. 7).

Subsequently, the literature was classified into four categories according to the segment of the AI application (Table 1). For each study, the quantitative index of AI’s electricity saving effect was extracted and denoted \({Y}_{i}\).

Table 1 Specific practices of applying AI in source-grid-load-storage of power system

Step 2: Heterogeneity assessment

The purpose of the heterogeneity test is to determine whether a Fixed Effects Model (FEM) or a Random Effects Model (REM) should be used. The test is conducted separately for the source, grid, load, and storage segments. First, the null hypothesis \({H}_{0}\) is established, which assumes that the true electricity saving OE is identical across all studies, so that the observed \({Y}_{i}\) values differ only through within-study sampling error. Subsequently, Cochran’s Q statistic60 is calculated by Eqs. (1)–(4).

$$Q=\sum {\omega }_{i}{\left({Y}_{i}-\hat{\theta }\right)}^{2}$$

(1)

$${\omega }_{i}=1/{\nu }_{i}$$

(2)

$${\nu }_{i}=\frac{{Y}_{i}\left(1-{Y}_{i}\right)}{{n}_{i}}$$

(3)

$$\hat{\theta }=\frac{\sum {\omega }_{i}{Y}_{i}}{\sum {\omega }_{i}}$$

(4)

where \({Y}_{i}\) is the OE of the i-th study, \({\omega }_{i}\) is the FEM weight of the i-th study, \({\nu }_{i}\) is the within-study variance of the i-th study, \({n}_{i}\) is the sample size of the i-th study, and \(\hat{\theta }\) is the inverse-variance weighted mean across studies.

If \(Q > {\chi }_{0.05,k-1}^{2}\), where \(k\) is the number of included studies, the hypothesis \({H}_{0}\) is rejected, indicating the presence of heterogeneity among the studies.

Furthermore, the \({I}^{2}\) statistic is computed by Eq. (5) to assess the extent of heterogeneity across the included studies.

$${I}^{2}=\frac{Q-\left(k-1\right)}{Q}\times 100 \%$$

(5)

The criteria for heterogeneity judgment are shown in Table 2. If heterogeneity is statistically significant, the REM is employed; otherwise, the FEM is applied.
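As a minimal sketch of Eqs. (1)–(5), the following Python function computes Cochran's Q, \({I}^{2}\), and the fixed-effect weighted mean from per-study saving rates and sample sizes. The function name `heterogeneity`, the example inputs, and the truncation of \({I}^{2}\) at zero are illustrative assumptions and are not taken from the study.

```python
def heterogeneity(effects, sample_sizes):
    """Return (Q, I2 in percent, fixed-effect mean) following Eqs. (1)-(5).

    effects      : per-study saving rates Y_i, as proportions strictly in (0, 1)
    sample_sizes : per-study sample sizes n_i
    """
    # Eq. (3): within-study variance of a proportion
    variances = [y * (1.0 - y) / n for y, n in zip(effects, sample_sizes)]
    # Eq. (2): inverse-variance weights
    weights = [1.0 / v for v in variances]
    # Eq. (4): weighted mean across studies
    theta = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Eq. (1): Cochran's Q
    q = sum(w * (y - theta) ** 2 for w, y in zip(weights, effects))
    k = len(effects)
    # Eq. (5); negative values are truncated to 0 (assumption, common convention)
    i2 = max(0.0, (q - (k - 1)) / q * 100.0) if q > 0 else 0.0
    return q, i2, theta


# Hypothetical inputs: two studies reporting 5% and 30% savings, n = 100 each.
q, i2, theta = heterogeneity([0.05, 0.30], [100, 100])
print(f"Q = {q:.2f}, I2 = {i2:.1f}%, pooled = {theta:.4f}")
```

The returned Q is compared with the chi-square critical value for \(k-1\) degrees of freedom (3.841 for \(k=2\) at the 0.05 level), which has to be taken from a table or a statistics library.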

Table 2 Criteria for heterogeneity judgment

Finally, the prediction interval (PI), that is, the range within which the OE of a future study is expected to fall, is obtained by Eq. (6).

$${PI}={\hat{\theta }}_{{RE}}\pm {t}_{0.025,k-2}\times \sqrt{{\tau }^{2}+S{E}^{2}}$$

(6)

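where \({\hat{\theta }}_{{RE}}\) is the pooled REM estimate, \({\tau }^{2}\) represents the between-study variance estimated using the DerSimonian-Laird method61, \({SE}\) is the standard error of \({\hat{\theta }}_{{RE}}\), and \({t}_{0.025,k-2}\) is the upper 2.5% quantile of the t distribution with \(k-2\) degrees of freedom.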

Step 3: Model selection and the combination of OE in different studies

When heterogeneity is not significant, the FEM is adopted, as shown in Eqs. (7)–(8).

$${\hat{\theta }}_{{FE}}=\frac{\sum {\omega }_{i}{Y}_{i}}{\sum {\omega }_{i}}$$

(7)

$$O{E}_{{FE}}=\sqrt{\frac{1}{\sum {\omega }_{i}}}$$

(8)

When heterogeneity is significant, the DerSimonian-Laird method is used to estimate the between-study variance \({\tau }^{2}\), as shown in Eq. (9).

$${\tau }^{2}=\max \left(0,\frac{Q-\left(k-1\right)}{\sum {\omega }_{i}-\frac{\sum {\omega }_{i}^{2}}{\sum {\omega }_{i}}}\right)$$

(9)

The adjusted random weights and the final OE are as shown in Eqs. (10)-(12).

$${\omega }_{i}^{* }=\frac{1}{{\nu }_{i}+{\tau }^{2}}$$

(10)

$${\hat{\theta }}_{{RE}}=\frac{\sum {\omega }_{i}^{* }{Y}_{i}}{\sum {\omega }_{i}^{* }}$$

(11)

$$O{E}_{{RE}}=\sqrt{\frac{1}{\sum {\omega }_{i}^{* }}}$$

(12)

After the combination, the final electricity saving rate and its 95% confidence interval were obtained through inverse transformation. The calculation of the confidence interval is presented in Eq. (13).

$${CI}=\hat{\theta }\pm {Z}_{\alpha /2}\times {OE}\left(\hat{\theta }\right)$$

(13)

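where \({Z}_{\alpha /2}\) is the corresponding quantile of the standard normal distribution (1.96 for a 95% confidence interval) and \({OE}\left(\hat{\theta }\right)\) denotes the standard error of the pooled estimate, given by Eq. (8) for the FEM or Eq. (12) for the REM.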
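The pooling in Eqs. (7)–(13) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `pool`, its signature, and the example numbers are hypothetical, and the inverse transformation mentioned above is not included because its form is not specified here.

```python
from math import sqrt
from statistics import NormalDist


def pool(effects, variances, random_effects=True, level=0.95):
    """Pool study effects; return (theta, se, (ci_low, ci_high), tau2)."""
    k = len(effects)
    w = [1.0 / v for v in variances]                       # Eq. (2)
    sw = sum(w)
    theta_fe = sum(wi * y for wi, y in zip(w, effects)) / sw   # Eq. (7)
    tau2 = 0.0
    if random_effects:
        q = sum(wi * (y - theta_fe) ** 2 for wi, y in zip(w, effects))
        c = sw - sum(wi ** 2 for wi in w) / sw
        # Eq. (9): DerSimonian-Laird between-study variance
        tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
        w = [1.0 / (v + tau2) for v in variances]          # Eq. (10)
        sw = sum(w)
    theta = sum(wi * y for wi, y in zip(w, effects)) / sw  # Eq. (7) or (11)
    se = sqrt(1.0 / sw)                                    # Eq. (8) or (12)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)            # about 1.96 for 95%
    return theta, se, (theta - z * se, theta + z * se), tau2  # Eq. (13)


# Hypothetical inputs: three studies with saving rates and within-study variances.
theta, se, ci, tau2 = pool([0.08, 0.12, 0.20], [0.0004, 0.0006, 0.0010])
print(f"pooled = {theta:.4f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f}), tau2 = {tau2:.5f}")
```

Setting `random_effects=False` reproduces the FEM branch; when the estimated \({\tau }^{2}\) is zero, both branches return the same result.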

Step 4: Robustness analysis and bias correction

First, the leave-one-out method is applied: each study is removed in turn, and the OE is recalculated after each removal. If the maximum fluctuation range \(\varDelta \hat{\theta } < 10 \%\), the results are considered robust.
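A minimal sketch of the leave-one-out check is given below. It assumes that the fluctuation \(\varDelta \hat{\theta }\) is measured as the relative change of the pooled estimate with respect to the full-sample estimate, which is one possible reading of the 10% criterion; the function names and inputs are hypothetical, and a fixed-effect pooler is used only to keep the example short.

```python
def fixed_effect(effects, variances):
    """Inverse-variance weighted mean (fixed-effect pooled estimate)."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def leave_one_out(effects, variances, pooler=fixed_effect, tol=0.10):
    """Return (max relative shift, robust flag) after dropping each study once.

    Assumes the full-sample pooled estimate is non-zero.
    """
    full = pooler(effects, variances)
    shifts = []
    for i in range(len(effects)):
        ys = effects[:i] + effects[i + 1:]      # all studies except study i
        vs = variances[:i] + variances[i + 1:]
        shifts.append(abs(pooler(ys, vs) - full) / abs(full))
    max_shift = max(shifts)
    return max_shift, max_shift < tol


# Hypothetical inputs: three studies with equal within-study variance.
max_shift, robust = leave_one_out([0.10, 0.11, 0.09], [0.001, 0.001, 0.001])
print(f"max shift = {max_shift:.1%}, robust = {robust}")
```

Any pooling function with the same `(effects, variances)` signature, for example a random-effects estimator, can be passed as `pooler`.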

Second, when the sample size is less than 10, Egger’s test is performed as shown in Eq. (14).

$$\frac{{Y}_{i}}{{{OE}}_{i}}={\beta }_{0}+{\beta }_{1}\frac{1}{{{OE}}_{i}}+{\varepsilon }_{i}$$

(14)

If \({\beta }_{1}\ne 0\,\left(p < 0.1\right)\), small-study effect bias is considered to be present.
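The regression in Eq. (14) can be fitted by ordinary least squares, as in the sketch below. It assumes that \({{OE}}_{i}\) in Eq. (14) is the standard error of study i (for example \(\sqrt{{\nu }_{i}}\)); the function name and inputs are hypothetical. Only the coefficient estimates are returned, so the significance test (p < 0.1) would still require a t-test on the fitted coefficient, which is not implemented here.

```python
def egger_regression(effects, std_errors):
    """Fit Y_i/SE_i = beta0 + beta1 * (1/SE_i) by OLS; return (beta0, beta1)."""
    x = [1.0 / s for s in std_errors]                   # precision
    y = [e / s for e, s in zip(effects, std_errors)]    # standardized effect
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta1 = sxy / sxx            # slope
    beta0 = mean_y - beta1 * mean_x   # intercept
    return beta0, beta1


# Hypothetical inputs: four studies with saving rates and standard errors.
beta0, beta1 = egger_regression([0.10, 0.14, 0.22, 0.30], [0.01, 0.02, 0.04, 0.06])
print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.3f}")
```

The standard errors must not all be equal, otherwise the regressor has no variation and the slope is undefined.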

Finally, the trim-and-fill method was used to estimate the number of missing studies (\({k}_{{\mbox{missing}}}\)), and the OE was recalculated after the data were completed to assess the direction and magnitude of the bias.

Assessment of the electricity saving potential of AI across source-grid-load-storage

Following the meta-analysis, we obtained the electricity saving effects of each practice across the source-grid-load-storage segments. These four types of effect magnitudes serve as core input parameters for the subsequent model, which is used to simulate the electricity saving potential of AI technology in five provinces. Then, three electricity saving scenarios—Efficient, Typical, and Inefficient—are formulated. Specifically, the Efficient scenario adopts the upper quartile of OE, the Typical scenario uses the median of OE, and the Inefficient scenario employs the lower quartile of OE62.

In particular, two practices are parameterized from actual cases rather than from pooled literature values. On the source segment, AI-based renewable energy forecasting has a substantial impact on electricity saving. However, since most studies quantify its outcome through prediction accuracy rather than an electricity saving rate, we employ data from an actual case to determine the electricity saving rate and set 11.5% as the OE63. Likewise, for BESS lifetime optimization on the storage segment, indicators such as battery health status are commonly reported instead of saving rates. Thus, based on an actual case, we select 25% as the OE64.

Furthermore, the electricity saving contributions across the source-grid-load-storage segments are used to test the Pareto principle. Specifically, we sorted the electricity savings of the individual AI practices in descending order and calculated the cumulative contribution rate. If the top 20% of applications account for 80% or more of the total electricity saving, the Pareto principle is considered to hold65.
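The Pareto check described above can be sketched as follows. The practice names and saving values are hypothetical placeholders, and rounding the "top 20%" down to a whole number of practices (with a minimum of one) is an assumption.

```python
def pareto_check(savings, top_share=0.20, target=0.80):
    """Return (share of total saving from the top practices, whether target is met).

    savings : dict mapping practice name -> electricity saving (any common unit)
    """
    ranked = sorted(savings.values(), reverse=True)   # descending order
    total = sum(ranked)
    # Number of practices in the top group; floor with a minimum of 1 (assumption).
    n_top = max(1, int(top_share * len(ranked)))
    share = sum(ranked[:n_top]) / total
    return share, share >= target


# Hypothetical savings (TWh) for ten AI practices.
example = {
    "practice_a": 50, "practice_b": 35, "practice_c": 3, "practice_d": 3,
    "practice_e": 2, "practice_f": 2, "practice_g": 2, "practice_h": 1,
    "practice_i": 1, "practice_j": 1,
}
share, holds = pareto_check(example)
print(f"top-20% share = {share:.0%}, Pareto principle holds: {holds}")
```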

Assessment of electricity consumption from AI servers

We estimate the annual electricity consumption attributable to AI servers using a bottom-up stock-flow model34,66. Let \({{\mbox{Ship}}}_{t}\) denote the number of AI servers shipped (newly deployed) in year \(t\). Each server is classified as either a training-type or an inference-type server according to the year-specific proportions \({s}_{t}^{{\mbox{train}}}\) and \({s}_{t}^{\inf }\), where \({s}_{t}^{{\mbox{train}}}+{s}_{t}^{\inf }=1\). Servers remain in the active stock for a lifetime \(L\) (assumed to be 5 years67). Therefore, the number of active servers of each type in year \(t\) is the sum of the shipments of that type over the most recent \(L\) years, including year \(t\), as given by Eq. (15).

$${N}_{t}^{{{\rm{type}}}}={\sum }_{i=t-L+1}^{t}{{{\rm{Ship}}}}_{i}\times {s}_{i}^{{{\rm{type}}}}$$

(15)

where type ∈{train, inf}.
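As a small illustration of Eq. (15), the sketch below sums type-specific shipments over the most recent \(L\) years. The function name, the dictionary-based inputs, and all numbers are hypothetical; years without shipment data are treated as zero shipments, which is an assumption.

```python
def active_stock(shipments, share, year, lifetime=5):
    """Eq. (15): active servers of one type in `year`.

    shipments : dict year -> servers shipped in that year (Ship_i)
    share     : dict year -> proportion of that year's shipments of this type (s_i)
    lifetime  : server lifetime L in years
    """
    return sum(
        shipments.get(i, 0.0) * share.get(i, 0.0)   # missing years count as zero
        for i in range(year - lifetime + 1, year + 1)
    )


# Hypothetical data: 100 servers shipped each year, half of them for training.
ship = {y: 100.0 for y in range(2021, 2027)}
train_share = {y: 0.5 for y in range(2021, 2027)}
print(active_stock(ship, train_share, 2025))   # servers shipped 2021-2025 remain active
```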

The annual electricity consumption of IT equipment is calculated by multiplying the active stock by the average power per server and the effective operating hours, as shown in Eq. (16).

$${E}_{t}^{{\mbox{IT}}}={N}_{t}^{{\mbox{train}}}\cdot {P}_{t}^{{\mbox{train}}}\cdot {u}^{{\mbox{train}}}\cdot h+{N}_{t}^{\inf }\cdot {P}_{t}^{\inf }\cdot {u}^{\inf }\cdot h$$

(16)

where \({P}_{t}^{{\mbox{train}}}\) and \({P}_{t}^{\inf }\) are the average active power (kW) of a training and an inference server, respectively, \({u}^{{\mbox{train}}}\) and \({u}^{\inf }\) are utilization rates (0–1) representing the fraction of time servers are actually performing compute work, and \(h\) is the number of hours per year (\(h=8760\) for a 365-day year; if a 360-day calendar is used, \(h=8640\) should be stated with justification). With power in kW and time in hours, \({E}_{t}^{{\mbox{IT}}}\) is expressed in kWh.

To account for data center overheads (such as cooling, power conversion losses, and lighting)68, we multiply the consumption of IT equipment by the PUE using Eq. (17).

$${E}_{{\mbox{consumption}}}={E}_{t}^{{\mbox{IT}}}\times {\mbox{PU}}{{\mbox{E}}}_{t}$$

(17)
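Eqs. (16) and (17) can be combined in a few lines, as sketched below. The function name, the conversion to TWh, and every numeric input are illustrative assumptions rather than parameters of the study.

```python
HOURS_PER_YEAR = 8760  # 365-day year


def ai_electricity_twh(n_train, n_inf, p_train_kw, p_inf_kw,
                       u_train, u_inf, pue, hours=HOURS_PER_YEAR):
    """Annual electricity consumption of AI servers in TWh.

    Eq. (16): IT energy = sum over types of N * P * u * h   (kWh)
    Eq. (17): total     = IT energy * PUE
    """
    it_kwh = (n_train * p_train_kw * u_train * hours
              + n_inf * p_inf_kw * u_inf * hours)
    return it_kwh * pue / 1e9   # 1 TWh = 1e9 kWh


# Hypothetical inputs: 200,000 training and 800,000 inference servers.
total = ai_electricity_twh(
    n_train=200_000, n_inf=800_000,
    p_train_kw=8.0, p_inf_kw=3.0,
    u_train=0.8, u_inf=0.5,
    pue=1.25,
)
print(f"{total:.2f} TWh/year")
```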

Per-server power values \({P}^{{\mbox{type}}}\) should be interpreted as average operational power rather than peak Thermal Design Power (TDP). Utilization rates are determined based on reports from cloud operators and relevant literature34,69.

These formulations allow the estimation of the effective electricity demand driven by the deployment and operation of AI servers across China, accounting for both operational characteristics and infrastructure efficiency.

Design of the net electricity saving assessment framework

To capture the dynamic balance between the electricity saving enabled by AI and the additional electricity consumption induced by AI deployment, this study develops a net electricity saving assessment framework. It integrates the electricity saving effects generated by AI applications across the source-grid-load-storage with the electricity consumption of AI servers, which represent the main sources of incremental electricity demand brought by large-scale AI development.

It should be noted that this study focuses exclusively on: (1) the electricity saving achieved by AI technologies applied within the power system; and (2) the electricity consumption of AI servers used for training and inference.

Electricity consumption from lightweight or conventional AI applications is excluded, as their energy use is relatively small, highly dispersed, and difficult to quantify at the system level.

The calculation of the net electricity saving indicator is defined as Eq. (18).

$${{\mbox{Net}}}\, {{\mbox{electricity}}}\, {{\mbox{saving}}}=\frac{{E}_{{\mbox{saving}}}}{{E}_{{\mbox{consumption}}}}\times 100 \%$$

(18)

where \({E}_{{\mbox{saving}}}\) denotes the total electricity saving enabled by AI applications across the power system, and \({E}_{{\mbox{consumption}}}\) represents the total electricity consumption of AI servers.

When \({\mbox{Net electricity saving}}\ge 100 \%\), AI is considered to achieve net electricity saving, indicating that the total electricity saving meets or exceeds its own electricity consumption. The model is further used to identify the critical transition point from net consumption to net saving and to simulate the trajectories of the net electricity saving under different policy, technology, and penetration scenarios.
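The indicator in Eq. (18) and the search for the transition point can be sketched as follows. The function names and the yearly values are hypothetical, and the transition point is taken here as the first year in which the indicator reaches 100%, which is one straightforward reading of the text.

```python
def net_saving_ratio(e_saving, e_consumption):
    """Eq. (18): electricity saving as a percentage of AI electricity consumption."""
    return e_saving / e_consumption * 100.0


def transition_year(saving_by_year, consumption_by_year):
    """First year with ratio >= 100%, or None if it is never reached."""
    for year in sorted(saving_by_year):
        ratio = net_saving_ratio(saving_by_year[year], consumption_by_year[year])
        if ratio >= 100.0:
            return year
    return None


# Hypothetical trajectories in TWh for one integrated scenario.
saving = {2025: 50.0, 2030: 90.0, 2035: 160.0, 2040: 240.0}
consumption = {2025: 100.0, 2030: 140.0, 2035: 150.0, 2040: 155.0}
print(transition_year(saving, consumption))
```

Applying the same two functions to each of the nine integrated scenarios gives one trajectory and one transition year per scenario.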

Scenario setting

To evaluate the dynamic interaction between the electricity consumption of AI servers and the electricity saving enabled by AI across the power system, this study constructs two groups of scenario sets: (1) scenarios for AI electricity consumption, and (2) scenarios for AI electricity saving. All parameters adopt differentiated short-term (2025-2030), medium-term (2031-2040), and long-term (2041-2060) trajectories, reflecting uncertainties in technological and policy evolutions. The detailed parameter settings are provided in Supplementary Tables 12.

Scenarios for AI electricity consumption

We considered four dimensions that affect the electricity consumption of AI: (1) AI server deployment growth rate, (2) the composition and power of training/inference servers, (3) PUE evolution trends, and (4) chip energy efficiency factors. For each dimension, parameter ranges were derived from industry forecasts34, academic literature66, and technical reports33, and subsequently differentiated across the TD, PC, and ID scenarios.

AI server deployment growth rate

The growth rates are informed by the forecasts of China’s AI server shipments from the International Data Corporation (IDC) and the China Academy of Information and Communications Technology (CAICT)33,37. These forecasts estimate a Compound Annual Growth Rate (CAGR) of 19–27% for the period from 2025 to 2030, driven by the adoption of generative AI. The growth rates are also informed by the long-term computility demand projections from the scaling-law literature70,71. Specifically, TD adopts the upper bounds of these projections (20–25% in 2025–2030; 9–11% in 2031–2040; 4–6% in 2041–2060), reflecting the rapid diffusion of high-performance AI workloads. PC uses moderate values (15–20%, 7–9%, 2–4%), assuming policy caps on capacity expansion and grid constraints. ID follows baseline market growth (10–15%, 5–7%, 1–2%), consistent with the slowdown observed in the scale of demand for general-purpose computility72.
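To show how period-specific growth rates translate into a shipment trajectory, the sketch below compounds a base-year value year by year. The function name, the base-year shipment figure, and the choice of a single rate per period (here the lower bounds of the ID ranges) are illustrative assumptions.

```python
def project_shipments(base_year, base_shipments, end_year, growth_by_period):
    """Compound annual shipments using a piecewise-constant growth rate.

    growth_by_period : list of (first_year, last_year, annual_rate) tuples
                       that together cover base_year + 1 .. end_year
    """
    out = {base_year: base_shipments}
    for year in range(base_year + 1, end_year + 1):
        # Pick the rate of the period that contains this year.
        rate = next(r for lo, hi, r in growth_by_period if lo <= year <= hi)
        out[year] = out[year - 1] * (1.0 + rate)
    return out


# ID scenario with the lower bound of each range; base value is hypothetical.
id_rates = [(2025, 2030, 0.10), (2031, 2040, 0.05), (2041, 2060, 0.01)]
trajectory = project_shipments(2025, 100.0, 2060, id_rates)
print(round(trajectory[2030], 1), round(trajectory[2060], 1))
```

The resulting yearly shipments can serve as the input to the stock calculation in Eq. (15).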

Composition of training and inference servers and server power

Forecasts from cloud operators and NVIDIA workload studies suggest that inference workloads are increasing rapidly and will dominate the future demand for AI computility73,74. Meanwhile, the evolution of GPU TDP (from V100 to A100 to H100 to B100) shows a historical CAGR of 6–9% (refs. 75,76). Specifically, TD assumes a rapid expansion of inference (with an annual increase of 6% from 2025 to 2030) and a moderate power CAGR (4.3%), which reflects the gains in technological efficiency. PC restricts the growth of training servers through regulatory constraints (with a 5% increase in inference), but requires low-power chips, resulting in a 6.2% growth in power. ID maintains the historical workload structure (with a 4% increase in inference) and follows the observed historical GPU power CAGR of 8.6% (ref. 69).

PUE evolution trends

The PUE assumptions are based on the Uptime Institute global surveys and China’s national data center efficiency standards69,77. Currently, the average PUE in China ranges from 1.25 to 1.35, and liquid-cooled hyperscale facilities have already achieved a PUE of 1.10–1.15 (ref. 78). Specifically, TD follows the technological frontier (PUE will reach 1.10 by 2060). PC adheres to the mandated efficiency standards (PUE will reach 1.20 by 2060). ID reflects natural saturation without intervention (PUE will reach 1.30 by 2060).

Chip energy efficiency factors

Based on the hardware efficiency improvements documented in relevant reports and studies, annualized efficiency gains are typically within the range of 5% to 25% (refs. 66,79). Specifically, TD adopts a 20% efficiency improvement, which represents the upper bound of hardware enhancement. PC assumes moderate improvements (10–15%), which are supported by policy incentives. ID uses conservative gains (8%), which are consistent with the passive adoption of mature technology.

Scenarios for AI electricity saving

To simulate the evolution of the electricity saving impacts resulting from AI applications in the power system, three distinct scenarios, namely Efficient, Typical, and Inefficient, have been established, each representing a distinct technological development pathway. The preceding section introduced these scenarios briefly. This subsection details the quantification and parameterization methods applied to each scenario, along with the empirical evidence supporting their formulation.

First, based on the meta-analysis, the quantified electricity saving ranges for different application practices were determined, as presented in Table 2. From the literature collected in the meta-analysis, the lower limit, median, and upper limit of electricity saving performance for each application type were extracted. These three statistical values served as the foundational benchmarks for the Inefficient, Typical, and Efficient scenarios, respectively. The threshold for the Efficient scenario in 2060 was established based on existing studies80,81,82,83,84,85,86,87,88,89,90,91,92. Subsequently, the benchmarks corresponding to the Efficient and Typical scenarios were adopted as the 2060 thresholds for the Typical and Inefficient scenarios, respectively. Finally, by integrating these benchmarks with the technology maturity curve, the electricity saving rates for each scenario and each electricity saving measure were projected from 2025 to 2060. Detailed results are provided in Supplementary Table 1.