Here we describe our weighting framework in detail, explaining how relevance, quality and diversity weighting can be applied. The final sections describe our application of the framework to the AR6 database.
Weighting based on scenario relevance
Analysing and re-using scenario data is only sensible once a corresponding research question is defined. A key step in generalized scenario weighting is therefore to define question-specific relevance weights R(i) (equation (1)). This relevance-weighting term, R(i), can take several forms depending on the research question. For example, it could be strictly binary, including or excluding a scenario i depending on whether it meets the question-specific condition C:
$$\begin{array}{l}R(i)=\left\{\begin{array}{l}1\,\,\,\mathrm{if}\,C\\ 0\,\,\,\mathrm{else}\end{array}\right.\end{array}$$
(2)
A straightforward example is whether a scenario limits global warming to within certain temperature bounds15. This condition is critical if the intention is to explore characteristics aligned with keeping warming well below 2 °C or 1.5 °C with a specific likelihood15,25,42.
However, binary weighting of scenarios based on temperature outcome is only fully defensible if the uncertainty in a scenario's climate outcome can be unambiguously quantified with a single probability distribution. This is rarely, if ever, the case43. Under such conditions, R(i) could instead take the form of a continuous function. Consider a threshold ϑ for a given scenario metric mi. R(i) can be constructed such that scenarios within the threshold are weighted with unity and scenarios beyond the threshold are weighted on the basis of their distance to the threshold, for example, using a stretched exponential function with scaling factor α and stretching exponent β:
$$\begin{array}{l}R(i)=\left\{\begin{array}{l}1\,\,\,\mathrm{if}\,{m}_{i} < \vartheta \\ {{\mathrm{e}}}^{-\left({\left(\alpha \frac{{m}_{i}-\vartheta }{\vartheta }\right)}^{\beta }\right)}\mathrm{else}\end{array}\right.\end{array}$$
(3)
Alternatively, scenarios within a defined threshold ϑ could be weighted on the basis of their distance from it. For example, a user may require adherence to a temperature threshold but also want to apply risk-based relevance weighting. In this case, scenarios further from the threshold receive a higher R(i), as they are deemed to carry less risk of breaching it.
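The two relevance forms in equations (2) and (3) can be sketched in Python as below. This is a minimal illustration rather than the implementation used in this study: the function names and the parameter values in the example (α = 10, β = 1, a 1.5 °C threshold applied to a hypothetical peak-warming metric) are assumptions, and the sketch requires ϑ ≠ 0 and β > 0 because equation (3) divides by ϑ and raises to the power β.

```python
import math


def relevance_binary(condition_met: bool) -> float:
    """Equation (2): R(i) = 1 if the question-specific condition C holds, else 0."""
    return 1.0 if condition_met else 0.0


def relevance_stretched_exp(m_i: float, threshold: float, alpha: float, beta: float) -> float:
    """Equation (3): unity below the threshold, stretched-exponential decay beyond it.

    Assumes threshold != 0 and beta > 0 (illustrative parameters, not values from the study).
    """
    if m_i < threshold:
        return 1.0
    # Relative exceedance of the threshold, (m_i - threshold) / threshold.
    exceedance = (m_i - threshold) / threshold
    return math.exp(-((alpha * exceedance) ** beta))


# Hypothetical peak-warming values (deg C) weighted against a 1.5 deg C threshold.
for peak in (1.4, 1.5, 1.6, 1.8):
    print(peak, round(relevance_stretched_exp(peak, 1.5, alpha=10.0, beta=1.0), 3))
```

With these illustrative parameters, a scenario peaking at 1.6 °C receives R(i) ≈ 0.51 and one peaking at 1.8 °C receives R(i) ≈ 0.14. A larger β makes the transition more step-like around α(mi − ϑ)/ϑ = 1, approaching the binary case.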
Weighting based on scenario quality
The quality of a scenario is central to deciding whether its information should be used in secondary analysis. However, an assessment of quality is subjective and depends on the research question being explored. For example, accurate historical emissions might be important for understanding how emissions relate to global warming targets, while accurate historical energy-system capacities might be important if characteristics of the energy transformation are gleaned from these pathways. Alternatively, quality weighting could be derived from characteristics of certain models, for example, their representation of specific technologies of interest. Quality weighting could also be assigned using quantifiable feasibility criteria16,44, obtained from literature, expert judgement or elicitation45. In practice, this could mean down-weighting scenarios whose technology pathways or societal changes are judged by experts to be beyond plausible achievability.
Methods previously applied to account for the quality of climate model projections20,21,22,23 can be adapted to our context by comparing emissions and energy trends in scenarios with recent observations. Scenario quality can be accounted for with a continuous weight Q(i) between 0 and 1 for each scenario i. This weight is obtained by applying a set of distance criteria fj(d), each returning a value between 0 and 1, to the distance d between the modelled value vk,i of quality metric k in scenario i and an expert assessment Ek of that metric (equation (4)).
$$Q\left(i\right)=\mathop{\prod }\limits_{j,k}{f}_{j}\left(d\left({v}_{k,i},{E}_{k}\right)\right)$$
(4)
The IPCC SR1.5 and AR6 assessments did not use a formal method of scenario weighting but applied scenario quality filtering7,8,18,26, effectively assigning each scenario Q(i) = 0 (excluded) or Q(i) = 1 (included). For example, SR1.5 excluded scenarios with negative CO2 emissions from agriculture, forestry and other land use (AFOLU) in 2020, because AFOLU CO2 emissions becoming negative within 2 years of the report's publication was considered implausible7,18. In addition, SR1.5 excluded a further 30 scenarios from its analysis of global and sectoral emissions evolutions (Table 2.4 and Fig. SPM.3 in ref. 5) because their 2010 GHG emissions fell outside the range of historical estimates. Other parts of the report, which explore scenario dimensions orthogonal to global and sectoral emissions evolutions, include all scenarios. In both cases, fj(d) would be a step function that returns 1 when a scenario variable in a given year falls inside a set range and 0 otherwise.
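The filtering described above is a special case of equation (4) in which every fj is a step function. The Python sketch below illustrates this under stated assumptions: each expert assessment Ek is represented as a (low, high) range, d is the distance of the modelled value outside that range, and the metric names and ranges in the example are hypothetical placeholders, not the SR1.5 vetting values.

```python
import math
from typing import Callable, Dict, List, Tuple

# A criterion: (metric name k, expert-assessed range E_k = (low, high), criterion function f_j).
Criterion = Tuple[str, Tuple[float, float], Callable[[float], float]]


def distance_to_range(value: float, low: float, high: float) -> float:
    """Distance d(v_k,i, E_k) of a modelled value from the range [low, high]; 0 inside it."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def step(d: float) -> float:
    """Binary f_j(d): 1 inside the range (d == 0), 0 otherwise."""
    return 1.0 if d == 0.0 else 0.0


def quality_weight(scenario: Dict[str, float], criteria: List[Criterion]) -> float:
    """Equation (4): Q(i) is the product of f_j(d) over all criteria."""
    q = 1.0
    for metric, (low, high), f in criteria:
        q *= f(distance_to_range(scenario[metric], low, high))
    return q


# Hypothetical criteria: non-negative 2020 AFOLU CO2, and 2010 GHG inside a placeholder range.
criteria: List[Criterion] = [
    ("afolu_co2_2020", (0.0, math.inf), step),
    ("ghg_2010", (40.0, 60.0), step),
]
print(quality_weight({"afolu_co2_2020": 4.0, "ghg_2010": 50.0}, criteria))
print(quality_weight({"afolu_co2_2020": -1.0, "ghg_2010": 50.0}, criteria))
```

Replacing `step` with a smooth function of d that decreases from 1 towards 0 turns the same structure into a continuous quality weight.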
Weighting based on scenario diversity
Scenario diversity provides a metric of distance or proximity between scenarios. It requires expert judgement and a clear research question to identify relevant variables to consider. For example, to explore emissions statistics consistent with limiting global warming to a specific level, considering the diversity in emission evolutions of scenarios could be sufficient. For questions about possible energy-system configurations compatible with a specific climate goal, another set of variables could be selected for determining that diversity.
We use an adapted version of a method for estimating effective repetitions in Earth system model projections20,21, which uses a Gaussian function:
$$D\left(i\right)=\left(1+\mathop{\sum }\limits_{{i}^{{\prime} }\ne i}\exp \left(-\frac{{S}_{i{i}^{{\prime} }}^{2}}{{\sigma }_{S}^{2}}\right)\right)$$
(5)
where Sii′ is a similarity distance metric between two scenarios i and i′ for a single variable, taken as the root-mean-square difference between their time series, and σS is the ‘radius of scenario similarity’21. The parameter σS defines how close two scenarios need to be for them to be effectively down-weighted and is an assessment choice. If two scenarios i and i′ are identical (Sii′ = 0) and far from all other scenarios, D(i) ≈ 2, so each is weighted by approximately 1/2. Equation (5) can be generalized to compare scenario similarity across N variables:
$${D}_{n}\left(i\right)=1+\mathop{\sum }\limits_{n=1}^{N}\mathop{\sum }\limits_{{i}^{{\prime} }\ne i}{b}_{n}\exp \left(-\frac{{S}_{i{i}^{{\prime} },n}^{2}}{{\sigma }_{S,n}^{2}}\right)$$
(6)
where bn are relative contributions of each variable to the total diversity weighting that sum to 1 across all variables n. Besides selecting σS,n for each variable n, equation (6) also requires the selection of the appropriate constants bn. The simplest case could be to set bn = 1/N, but some variables may be more important than others in relative terms.
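Equations (5) and (6) translate directly into a double loop over scenarios and variables. The Python sketch below is a minimal illustration, assuming every scenario reports each variable on a common time grid so that the root-mean-square difference is well defined; the variable names, σS,n values and time series are hypothetical. The diversity weight applied to each scenario is 1/D(i).

```python
import math
from typing import Dict, List

Series = List[float]


def rms_difference(a: Series, b: Series) -> float:
    """S_ii',n: root-mean-square difference between two time series on the same time grid."""
    if len(a) != len(b) or not a:
        raise ValueError("time series must be non-empty and share the same time grid")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def diversity(
    scenarios: Dict[str, Dict[str, Series]],
    b: Dict[str, float],
    sigma: Dict[str, float],
) -> Dict[str, float]:
    """Equation (6): D(i) = 1 + sum_n sum_{i' != i} b_n exp(-S_ii',n^2 / sigma_S,n^2).

    With a single variable and b = 1, this reduces to equation (5).
    """
    if abs(sum(b.values()) - 1.0) > 1e-9:
        raise ValueError("variable weights b_n must sum to 1")
    d: Dict[str, float] = {}
    for i, vars_i in scenarios.items():
        total = 1.0
        for var, b_n in b.items():
            for j, vars_j in scenarios.items():
                if j == i:
                    continue
                s = rms_difference(vars_i[var], vars_j[var])
                total += b_n * math.exp(-(s ** 2) / sigma[var] ** 2)
        d[i] = total
    return d


# Hypothetical example: A and B are identical, C is far from both in every variable.
scenarios = {
    "A": {"co2": [40.0, 20.0, 0.0], "gdp": [100.0, 150.0, 200.0]},
    "B": {"co2": [40.0, 20.0, 0.0], "gdp": [100.0, 150.0, 200.0]},
    "C": {"co2": [40.0, 60.0, 80.0], "gdp": [100.0, 300.0, 500.0]},
}
D = diversity(scenarios, b={"co2": 0.5, "gdp": 0.5}, sigma={"co2": 5.0, "gdp": 20.0})
print({name: round(1.0 / value, 3) for name, value in D.items()})
```

The identical scenarios A and B each receive a diversity weight of 1/2, whereas the isolated scenario C keeps a weight of 1. The loop scales quadratically with the number of scenarios.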
Other ways of defining D(i) could also draw on an analysis of the underlying energy model fingerprint of mitigation scenarios46 or alternative ways of conceptualizing distance, for example, through principal component analysis47.
Dependencies between weighting applications
Our method uses several weights that can interact with one another. For instance, the definition and calculation of the quality weights Q(i) and diversity weights D(i) are informed by the scope of the relevance weights R(i) being applied. Further, quality and relevance can be multidimensional, and individual dimensions can be weighted differently. As a result, the final scenario weights, and the outcomes of analysing the weighted dataset, may change with the scope of the research question. Our weighting method allows this dependency to be communicated in a structured and transparent way.
Consider two analyses of interest. The first interrogates a scenario dataset based on temperature outcomes. In this case, Q(i) may comprise weights based on proximity to historical emissions inventories. However, for a research question focused on investments needed to achieve a given temperature limit, Q(i)′ would additionally weight scenarios on the basis of their historical investment and near-term investment outcomes. In this case, Q(i) differs from Q(i)′ and secondary analysis outcomes will reflect these different research demands.
Application of weighting to AR6 scenarios
Here we provide the weighting specifications for the illustrative application of the framework to the IPCC AR6 database10. Combining the expressions for all weighting components, the application-specific scenario-weighting equation becomes:
$${\mathrm{gw}}_{i}=\frac{R(i)Q(i)}{D(i)}=R(i)Q(i)/\left(1+\mathop{\sum }\limits_{n=1}^{N}\mathop{\sum }\limits_{{i}^{{\prime} }\ne i}{b}_{n}\exp \left(-\frac{{S}_{i{i}^{{\prime} },n}^{2}}{{\sigma }_{S,n}^{2}}\right)\right)$$
(7)
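Equation (7) combines the three components multiplicatively. The sketch below assumes R(i), Q(i) and D(i) have already been computed for every scenario and are keyed by a hypothetical scenario identifier. The `normalize` helper, which rescales the weights to sum to 1, is a convenience assumed here and is not part of equation (7).

```python
from typing import Dict


def global_weights(
    relevance: Dict[str, float],
    quality: Dict[str, float],
    diversity: Dict[str, float],
) -> Dict[str, float]:
    """Equation (7): gw_i = R(i) * Q(i) / D(i) for every scenario i.

    D(i) >= 1 by construction (equations (5) and (6)), so the division is always defined.
    """
    return {i: relevance[i] * quality[i] / diversity[i] for i in diversity}


def normalize(weights: Dict[str, float]) -> Dict[str, float]:
    """Rescale weights to sum to 1 (an assumed convenience step, not part of equation (7))."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("all scenarios have zero weight")
    return {i: w / total for i, w in weights.items()}


# Hypothetical inputs: C is outside the relevance scope, A and B are near-duplicates.
gw = global_weights(
    relevance={"A": 1.0, "B": 1.0, "C": 0.0},
    quality={"A": 0.8, "B": 0.8, "C": 1.0},
    diversity={"A": 2.0, "B": 2.0, "C": 1.0},
)
print(gw)
print(normalize(gw))
```

Because R(i) enters as a factor, a scenario with zero relevance drops out entirely, regardless of its quality or diversity.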
Relevance weighting
For analyses of scenarios that limit global warming to 1.5 °C with no or limited overshoot (IPCC AR6 category C1) and of scenarios that return warming to 1.5 °C after a high overshoot (IPCC AR6 category C2), we apply separate binary, question-specific relevance weights R(i) that select scenarios with the corresponding global warming characteristics as defined in IPCC AR6 (ref. 6):
$$\begin{array}{l}{R}_{{\rm{C}}1}(i)=\left\{\begin{array}{l}\begin{array}{cc}1, & \,\,P({\rm{d}}{T}_{21\mathrm{st}\,\mathrm{century}} > {1.5}^{\circ }{\rm{C}})\le 67{\rm{ \% }}\wedge P({\rm{d}}{T}_{2100} < {1.5}^{\circ }{\rm{C}}) > 50{\rm{ \% }}\end{array}\\ \begin{array}{cc}0, & \mathrm{else}\end{array}\end{array}\right.\end{array}$$
(8)
And separately:
$$\begin{array}{l}{R}_{{\rm{C}}2}(i)=\left\{\begin{array}{l}\begin{array}{cc}1, & \,\,P({\rm{d}}{T}_{21\mathrm{st}\,\mathrm{century}} > {1.5}^{\circ }{\rm{C}}) > 67{\rm{ \% }}\wedge P({\rm{d}}{T}_{2100} < {1.5}^{\circ }{\rm{C}}) > 50{\rm{ \% }}\end{array}\\ \begin{array}{cc}0, & \mathrm{else}\end{array}\end{array}\right.\begin{array}{c}\\ \end{array}\end{array}$$
(9)
C1a scenarios are the subset of C1 scenarios in which net GHG emissions reach zero in the second half of the century.
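The category-based relevance weights in equations (8) and (9) can be sketched as two boolean tests. The sketch assumes that, for each scenario, the probability of exceeding 1.5 °C at some point during the 21st century and the probability of warming below 1.5 °C in 2100 are already available as fractions (for example, from a probabilistic climate assessment); the scenario names and probabilities below are hypothetical.

```python
def relevance_c1(p_exceed: float, p_below_2100: float) -> float:
    """Equation (8): P(exceed 1.5 deg C in 21st century) <= 67% and P(< 1.5 deg C in 2100) > 50%."""
    return 1.0 if p_exceed <= 0.67 and p_below_2100 > 0.50 else 0.0


def relevance_c2(p_exceed: float, p_below_2100: float) -> float:
    """Equation (9): P(exceed 1.5 deg C in 21st century) > 67% and P(< 1.5 deg C in 2100) > 50%."""
    return 1.0 if p_exceed > 0.67 and p_below_2100 > 0.50 else 0.0


# Hypothetical probabilities per scenario: (P exceed during century, P below in 2100).
probabilities = {"S1": (0.60, 0.55), "S2": (0.85, 0.60), "S3": (0.90, 0.30)}
for name, (p_exceed, p_below) in probabilities.items():
    print(name, relevance_c1(p_exceed, p_below), relevance_c2(p_exceed, p_below))
```

The two weights are mutually exclusive: a scenario can satisfy at most one of equations (8) and (9), and one that fails the 2100 condition (S3 here) satisfies neither.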
Quality weighting
We test a continuous quality weighting Q(i), adapted from the IPCC AR6 procedures reported in Table 11 in ref. 26 (reproduced in Supplementary Table 3). For each quality criterion j that specifies a range, we compute the distance dk,i of the modelled value of variable k in scenario i from the reference value and normalize it by the interquartile range (IQR) of these distances across scenarios:
$$\widetilde{{d}_{k,i}}=\frac{{d}_{k,i}}{\mathrm{IQR}({d}_{k})}$$
(10)
We then use a Gaussian function to produce the continuous weight, w, for each criterion, j:
$${w}_{j}\left(i\right)=\exp \left[-{\left(\widetilde{{d}_{k,i}}\right)}^{2}\right]$$
(11)
This yields a weight of 1 when there is perfect agreement with the reference value. We treat all criteria equally and combine them into a single quality weighting:
$$Q\left(i\right)=\mathop{\sum }\limits_{j}{w}_{j}\left(i\right)$$
(12)
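Equations (10) to (12) can be sketched as follows. The sketch assumes the distances dk,i have already been computed for each criterion, and it estimates the IQR with Python's `statistics.quantiles` using the `inclusive` method; that estimator is an assumed choice, not necessarily the one used in this study. Criterion names and distances are hypothetical.

```python
import math
import statistics
from typing import Dict


def quality_weights(distances: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Equations (10)-(12).

    `distances` maps each criterion j to {scenario i: d_k,i}. Each distance is
    normalized by the IQR of that criterion's distances across scenarios
    (equation (10)), passed through a Gaussian (equation (11)) and summed over
    criteria (equation (12), as written).
    """
    q: Dict[str, float] = {}
    for criterion, by_scenario in distances.items():
        # IQR via statistics.quantiles(method="inclusive"), an assumed estimator.
        q1, _, q3 = statistics.quantiles(by_scenario.values(), n=4, method="inclusive")
        iqr = q3 - q1
        if iqr <= 0:
            raise ValueError(f"zero IQR for criterion {criterion!r}")
        for scenario, d in by_scenario.items():
            d_norm = d / iqr                          # equation (10)
            w = math.exp(-(d_norm ** 2))              # equation (11)
            q[scenario] = q.get(scenario, 0.0) + w    # equation (12)
    return q


distances = {
    "criterion_1": {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0, "E": 4.0},
}
print(quality_weights(distances))
```

Under equation (12) as written, a scenario in perfect agreement with the reference on every criterion receives Q(i) equal to the number of criteria.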
Diversity weighting
Diversity weighting D(i) accounts for variations in 15 variables across four key dimensions (emissions, economy, mitigation strategy and energy; Table 1). These 15 variables are part of the minimum data requirement for a scenario to be included in the IPCC AR6 ensemble and are therefore available for every scenario. The illustrative variable weights bn are a subjective choice; here they are chosen to place equal weight on each of the four key dimensions. Within each group, subweighting either limits the combined influence of variables that are similar in nature and could bias the total, or places more weight on more important variables. For example, in the energy category, the six primary energy variables combined carry the same weight as the single final energy variable. In the emissions category, the aggregate weight of all non-CO2 emissions variables equals that of CO2, recognizing that CO2 is the primary anthropogenic driver of climate change48.
Although not applicable here, scenarios that do not report all variables can also be included by rescaling the coefficients bn of the variables that are available. This rescaling is applied first within each variable group, or across all remaining groups if a variable group would otherwise have no reported variables. The bn always sum to 1.
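The rescaling rule for missing variables can be sketched as follows. The sketch assumes variable names are unique across groups and that the weight of a group with no reported variables is redistributed across the remaining groups in proportion to their existing totals; the proportional redistribution is an assumption, since the text does not prescribe it. Group names, variable names and bn values are hypothetical and do not reproduce Table 1.

```python
from typing import Dict, Set


def rescale_weights(groups: Dict[str, Dict[str, float]], reported: Set[str]) -> Dict[str, float]:
    """Rescale variable weights b_n when a scenario does not report every variable.

    Step 1: within each group, the reported variables absorb the group's original total.
    Step 2 (assumed rule): groups with no reported variables are dropped and the
    remaining weights are renormalized, i.e. redistributed proportionally.
    """
    rescaled: Dict[str, float] = {}
    for group, variables in groups.items():
        kept = {n: b for n, b in variables.items() if n in reported}
        if not kept:
            continue  # this group's weight is redistributed in step 2
        subtotal = sum(kept.values())
        if subtotal <= 0:
            raise ValueError(f"reported variables in group {group!r} have zero weight")
        factor = sum(variables.values()) / subtotal
        for n, b in kept.items():
            rescaled[n] = b * factor
    if not rescaled:
        raise ValueError("no reported variables")
    total = sum(rescaled.values())
    return {n: b / total for n, b in rescaled.items()}


# Hypothetical groups with equal totals of 0.25; the scenario lacks CH4 and carbon_price.
groups = {
    "emissions": {"CO2": 0.125, "CH4": 0.0625, "N2O": 0.0625},
    "energy": {"final_energy": 0.25},
    "economy": {"GDP": 0.25},
    "mitigation": {"carbon_price": 0.25},
}
b = rescale_weights(groups, reported={"CO2", "N2O", "final_energy", "GDP"})
print({n: round(v, 3) for n, v in b.items()})
```

Here the emissions group keeps its share through step 1, while the weight of the unreported mitigation group is spread over the other three groups in step 2, so each remaining group ends up with one third.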
Acknowledging the subjectivity in variable weighting, we also explored using only the energy variables, as well as only the emissions variables, from our selection (Supplementary Results 2). To account for correlation between variables, we adjusted our variable weights using a correlation matrix and hierarchical clustering (Supplementary Methods 1). This yields a set of ‘correlation-adjusted’ variable weights, in which the number of variables is reduced to eight following correlation assessment and clustering. We weight these eight variables evenly, because the emphasis on originally higher-weighted variables is preserved in our selection of representative variables for each cluster (Table 1 and Supplementary Methods 1). We present results with our correlation-adjusted weights in the main text and results with our full set of variables in Supplementary Results 2.
Our conceptual choice of σS,n (the radius of similarity) between two scenarios is guided by expert judgement and should be seen as an informed yet illustrative choice rather than a conclusive one. We define σS,n as the root-mean-square difference between different IAMs running the same combination of shared socioeconomic pathway and representative concentration pathway (SSP–RCP), available from ref. 9, using 10-year timesteps from 2020 to 2100. This choice rests on the reasoning that two independent IAMs running the same scenario are structurally different yet subject to a common set of constraints (exogenously provided population and gross domestic product (GDP) projections and year-2100 radiative forcing). We explore a range of values between the minimum and maximum of these SSP–RCP model differences and select a value that achieves a high median spread in diversity weights across our variables while ensuring a relatively even impact across all variables (Supplementary Results 2). For no-policy, no-mitigation or business-as-usual scenarios, we exclude the mitigation-group variables when determining σS,n, as these variables are zero by definition in such scenarios.
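The candidate range for σS,n can be sketched as follows: for each SSP–RCP combination, compute the root-mean-square difference between every pair of IAMs that ran it, then take the minimum and maximum over all pairs as the bounds of the range to explore. The input layout (a mapping from a (model, SSP–RCP) pair to the time series of one variable, on the 2020–2100 decadal grid in the study) and the model names and values below are hypothetical.

```python
import itertools
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rms_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square difference between two time series on the same time grid."""
    if len(a) != len(b) or not a:
        raise ValueError("time series must be non-empty and share the same time grid")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def cross_model_differences(runs: Dict[Tuple[str, str], Sequence[float]]) -> List[float]:
    """RMS differences between every pair of models that ran the same SSP-RCP combination."""
    by_pathway: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for (_model, pathway), series in runs.items():
        by_pathway[pathway].append(series)
    diffs: List[float] = []
    for series_list in by_pathway.values():
        # Pathways run by a single model contribute no pairs.
        for a, b in itertools.combinations(series_list, 2):
            diffs.append(rms_difference(a, b))
    return diffs


# Hypothetical runs of one variable on a short illustrative time grid.
runs = {
    ("MODEL_A", "SSP1-2.6"): [0.0, 0.0, 0.0],
    ("MODEL_B", "SSP1-2.6"): [3.0, 3.0, 3.0],
    ("MODEL_A", "SSP2-4.5"): [1.0, 1.0, 1.0],
    ("MODEL_C", "SSP2-4.5"): [2.0, 2.0, 2.0],
    ("MODEL_B", "SSP5-8.5"): [9.0, 9.0, 9.0],
}
diffs = cross_model_differences(runs)
print(min(diffs), max(diffs))
```

Selecting a single value within this range, as described above, then depends on how the resulting diversity weights spread across variables, which this sketch does not attempt to reproduce.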
Calculation of weighted quantiles
Quantiles are calculated without interpolation: the reported value for a quantile is the lowest scenario value at which the cumulative share of scenarios is equal to or above that quantile. Weighted quantiles are calculated in the same way, except that scenarios contribute to the distribution according to their assigned weights rather than equally; the quantile threshold is therefore identified from the cumulative sum of the normalized weights.
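A minimal sketch of this quantile rule follows, assuming non-negative weights; with unit weights it reproduces the unweighted case. Scenarios with zero weight are skipped because they carry no probability mass, and a small tolerance guards against floating-point rounding in the cumulative sum.

```python
from typing import Sequence


def weighted_quantile(values: Sequence[float], weights: Sequence[float], q: float) -> float:
    """Lowest value whose cumulative normalized weight is >= q (no interpolation)."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and of equal length")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        if weight == 0:
            continue  # zero-weight scenarios carry no probability mass
        cumulative += weight / total
        if cumulative >= q - 1e-12:  # tolerance for floating-point rounding
            return value
    # Safety net: only reachable through rounding when q is very close to 1.
    return max(v for v, w in zip(values, weights) if w > 0)


print(weighted_quantile([3.0, 1.0, 2.0, 4.0], [1, 1, 1, 1], 0.5))
print(weighted_quantile([3.0, 1.0, 2.0, 4.0], [1, 1, 1, 5], 0.5))
```

This rule corresponds to the "inverted CDF" quantile definition, which NumPy exposes for unweighted data as `numpy.quantile(..., method="inverted_cdf")`.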
Calculation of Herfindahl–Hirschman index
To quantify changes in model or project concentration between our unweighted and weighted ensembles, we use the Herfindahl–Hirschman index (HHI)33. The HHI has historically been used to measure market concentration in economics but can be applied more broadly to understand the dominance of components within a dataset. It is defined as the sum of the squared shares of each component:
$${\rm{HHI}}=\displaystyle \mathop{\sum }\limits_{i=1}^{N}{c}_{i}^{2}$$
(13)
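where ci is the share of component i in the total. A higher HHI value indicates greater dominance of individual components; the index ranges from 1/N, when all N components have equal shares, to 1, when a single component accounts for the total.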
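A minimal sketch of equation (13) follows, computing the shares of models (or projects) either by scenario count, for the unweighted ensemble, or by summed scenario weight, for the weighted ensemble. The mapping from scenarios to models and the weights are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional


def hhi(component_of: Dict[str, str], weights: Optional[Dict[str, float]] = None) -> float:
    """Equation (13): sum of squared shares c_i of each component (e.g. model or project).

    Without weights, each scenario counts once; with weights, each contributes its weight.
    """
    totals: Dict[str, float] = defaultdict(float)
    for scenario, component in component_of.items():
        totals[component] += 1.0 if weights is None else weights[scenario]
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total weight must be positive")
    return sum((t / grand_total) ** 2 for t in totals.values())


models = {"s1": "MODEL_A", "s2": "MODEL_A", "s3": "MODEL_B", "s4": "MODEL_C"}
print(hhi(models))
print(hhi(models, {"s1": 0.5, "s2": 0.5, "s3": 1.0, "s4": 1.0}))
```

In this example, down-weighting the two MODEL_A scenarios equalizes the model shares, so the HHI falls from 0.375 to 1/3, its minimum for three components.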