We used monthly burned area from MODIS (MCD64A1) for 2002 to 2023. We disaggregated burned area (BA) in forest and woodland areas (tree cover >20%) from that in nonforest areas (tree cover ≤20%)48,49. Although higher resolution fire data exist (e.g., MTBS50), we opted to use MODIS data because they provide wall-to-wall coverage of the US and Canada from a single consistent source. However, MODIS does not distinguish wildfires from other types of biomass burning; hence, in parts of the central and eastern US much of the BA reflects prescribed fire or agricultural burning rather than wildfire, likely confounding climate-fire relationships51.
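For illustration, a minimal sketch of this forest/nonforest split, assuming burned area and percent tree cover arrays on a common grid (the array names and threshold handling are our assumptions, not the published implementation):

```python
import numpy as np

def split_burned_area(burned_area, tree_cover, threshold=20.0):
    """Split burned area into forest and nonforest components.

    burned_area : 2-D array of monthly burned area (e.g., km^2 per cell)
    tree_cover  : 2-D array of percent tree cover on the same grid
    threshold   : tree-cover percentage separating forest from nonforest
    """
    forest_mask = tree_cover > threshold
    ba_forest = np.where(forest_mask, burned_area, 0.0)
    ba_nonforest = np.where(~forest_mask, burned_area, 0.0)
    return ba_forest, ba_nonforest
```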
The Fire Weather Index (FWI) is an indicator of potential fire intensity calculated from meteorological data that accounts for fuel dryness and short-term fire weather. Two climate predictors are used, based on past studies showing that interannual variability in macroscale BA is influenced by both antecedent fine-fuel build-up in the prior year and fire weather conditions during the fire season23. First, we used antecedent precipitation from January-August of the prior year as a proxy for fuel biomass accumulation, to account for moisture that influences vegetation productivity during the previous growing season22. Second, rather than apply a fixed seasonal window for contemporary fire weather, we calculate the annual maximum of the 90-day average FWI for each pixel52. We chose a 90-day window that is flexible across years and ecoregions rather than a prescribed window (e.g., Jun–Sep) because temporally varying windows correlate more strongly with burned area48 and they account for geographic variation in the core fire season across the study region. Daily meteorological data were sourced from ERA5 at 0.25° horizontal resolution. FWI was calculated from daily maximum temperature, daily minimum relative humidity, daily accumulated precipitation, and daily mean wind speed, following prior analyses23.
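A minimal sketch of the per-pixel fire weather metric, assuming a continuous daily FWI series for one pixel (the series name and use of pandas are our assumptions, not the published implementation):

```python
import pandas as pd

def annual_max_90day_fwi(daily_fwi: pd.Series) -> pd.Series:
    """Annual maximum of the 90-day running-mean FWI for a single pixel.

    daily_fwi : daily FWI values indexed by a continuous DatetimeIndex
    (gaps would make the 90-row window shorter than 90 days).
    """
    fwi_90d = daily_fwi.rolling(window=90, min_periods=90).mean()
    # The maximum of the running mean is taken within each calendar year,
    # so the 90-day window is free to fall anywhere in the year.
    return fwi_90d.groupby(fwi_90d.index.year).max()
```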
Counterfactual climate data were developed using a simple approach that removes the first-order influence of climate change on monthly temperature, precipitation, humidity, and winds from the observed record1,53. This approach preserves the temporal variability of the observed record but removes a low-pass-filtered climate change signal derived from climate models. While climate variability may itself change under anthropogenic forcing54, we focus on the stronger, more robust, and more direct influence of climate change on means. Our approach follows previous work developing counterfactuals that adjust for the first-order influence of climate change; we suggest that potential changes in climate variability are a topic for future work. We examined the climate change signal using twenty climate models participating in CMIP6 (Table S1) under historical (1850–2014) and future (SSP2-4.5, 2015–2100) forcing. Specifically, we calculate a low-pass-filtered signal of the 20-model mean for each location and month relative to the pre-industrial baseline (1850–1900). These low-pass climate change signals were subtracted from the observed (2000–2023) data to obtain a single counterfactual devoid of the first-order climate change influence, following prior studies1. These data are then used to calculate counterfactual FWI and precipitation.
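As a schematic of the counterfactual construction for one location and variable (a centered 30-year running mean stands in for the low-pass filter, and the series names and alignment are our assumptions, not the published implementation):

```python
import pandas as pd

def counterfactual_series(obs: pd.Series, model_mean: pd.Series,
                          smooth_years: int = 30) -> pd.Series:
    """Remove a low-pass-filtered climate-change signal from observations.

    obs        : observed monthly values (e.g., 2000-2023) for one location
    model_mean : CMIP6 20-model-mean monthly values from 1850 onward for the
                 same location (historical + SSP2-4.5)
    """
    baseline = model_mean.loc["1850":"1900"].mean()   # pre-industrial reference
    # Simple low-pass filter: centered running mean of the departure from baseline.
    signal = (model_mean - baseline).rolling(
        window=12 * smooth_years, center=True, min_periods=1).mean()
    # Subtract the signal over the observation period (indices assumed to align).
    return obs - signal.reindex(obs.index)
```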
All climate and BA data were aggregated to EPA level II ecoregions, i.e., assemblages of common vegetation and climate55. We conducted the BA modeling at the ecoregion scale because such regions are large enough to capture top-down climate drivers of BA, as seen in prior studies22,56.
Separate empirical models were developed for each ecoregion for annual forest BA and annual nonforest BA. These models take the simple log-linear approach used in prior studies22 (Eq. (1)):
$$\log \left({BA}(t)\right)=\alpha +\beta \,{{FWI}(t)}_{90d}+\gamma P(t-1)+\varepsilon$$
where FWI is the maximum 90-day mean FWI during the year, P is the January-September precipitation from the previous year, and ε is an error term. The error term ε is randomly drawn from the residuals of observed minus modeled BA (computed without the error term) and is included to account for stochastic factors not captured by the simple model27; this term is also needed to correct for underestimates of BA that arise in the log-linear framework. We use these models to simulate both observed and counterfactual BA. For the latter, we substitute counterfactual versions of FWI and P for the observed values. Ecoregions that recorded near-zero BA in ≥25% of years were not modeled owing to insufficient burned area, and their counterfactuals were set to the observed BA. Except for small ecoregions in southern Mexico, the ecoregions left unchanged had extremely low annual burned fractions. Unchanged ecoregions accounted for a very small share of the total BA.
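A minimal per-ecoregion sketch of Eq. (1), including the resampled error term used to simulate observed and counterfactual BA (the explicit log transform and all names reflect our reading of the log-linear framework, not the published code):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_and_simulate(ba, fwi90, p_prev, fwi90_cf, p_prev_cf):
    """Fit Eq. (1) for one ecoregion and land-cover class, then simulate
    observed and counterfactual BA with residual resampling.

    ba       : observed annual BA (must be > 0; low-BA ecoregions are excluded)
    fwi90    : annual maximum 90-day mean FWI (observed)
    p_prev   : prior-year precipitation (observed)
    *_cf     : counterfactual versions of the two predictors
    """
    X = np.column_stack([np.ones_like(fwi90), fwi90, p_prev])
    coef, *_ = np.linalg.lstsq(X, np.log(ba), rcond=None)
    resid = np.log(ba) - X @ coef            # observed minus modeled, no error term

    def simulate(fwi, p):
        Xs = np.column_stack([np.ones_like(fwi), fwi, p])
        eps = rng.choice(resid, size=len(fwi), replace=True)   # stochastic error term
        return np.exp(Xs @ coef + eps)

    return simulate(fwi90, p_prev), simulate(fwi90_cf, p_prev_cf)
```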
Lastly, we use results from these models to produce counterfactual BA estimates for forest and nonforest lands at a 0.25° spatial resolution (23.5 by 23.5 km) and monthly time step. We first calculate the ratio of modeled counterfactual to modeled observed BA from Eq. (1) for each year, separately for forest and nonforest BA by ecoregion. We then spatially and temporally disaggregate counterfactual BA for each ecoregion by multiplying this ratio by the MODIS BA observations. This approach preserves the spatial and temporal variability in BA between the observed and counterfactual simulations; however, it does not capture any shifts in BA seasonality.
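The downscaling step can be written compactly; the sketch below assumes the gridded MODIS BA for one ecoregion, year, and land-cover class, and the corresponding ecoregion-level modeled totals from Eq. (1) (names and the small-denominator guard are ours):

```python
import numpy as np

def downscale_counterfactual(ba_obs_grid, ba_mod_obs, ba_mod_cf, eps=1e-12):
    """Disaggregate ecoregion-level counterfactual BA onto the 0.25-degree grid.

    ba_obs_grid : observed gridded (and monthly) MODIS BA within the ecoregion
    ba_mod_obs  : modeled observed annual BA for the ecoregion (from Eq. (1))
    ba_mod_cf   : modeled counterfactual annual BA for the ecoregion
    """
    ratio = ba_mod_cf / max(ba_mod_obs, eps)   # counterfactual-to-observed ratio
    # Scaling every grid cell and month by the same annual ratio preserves the
    # observed spatial and seasonal pattern but cannot shift BA seasonality.
    return np.asarray(ba_obs_grid) * ratio
```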
Wildfire PM2.5 attributable to climate change
Annual mean PM2.5 concentrations are estimated using a machine learning approach that incorporates a basic model of atmospheric flow and varying sources of wildfire smoke. The model is trained using published daily data11 aggregated into annual averages on a 2.5° grid (277.5 by 277.5 km). Once trained, the model is applied to the observed and counterfactual burned area datasets, one grid location at a time, to obtain gridded smoke estimates. The proportional difference between these two estimates (observed and counterfactual) is then used to scale the observed data during this period and obtain the final counterfactual estimate. See Fig. 6 for a schematic of the model described in this section.
A weighted sum of annual burned area within a reachable neighborhood is used as the model input for each grid cell. Burned area in forested and nonforested areas is treated as two independent inputs. Elevation data from GEBCO57 are used with a shortest-path algorithm to determine dynamically reasonable regions from which smoke may propagate (for example, deep valleys may constrain the horizontal flow of wildfire smoke). These neighborhoods are described by spatial weights calculated in a two-step process (a code sketch combining both steps follows the list):
1. Distance weighting. Weights decrease with distance following a two-dimensional Gaussian curve, so that wildfires at greater distances contribute less to local smoke concentrations. The Gaussian function takes the form (Eq. (2)):
$${e}^{-{D}^{2}/2{\sigma }^{2}}$$
where D is the distance in km and σ is the standard deviation (i.e., the parameter that controls the width of the Gaussian surface). An iterative validation of the model showed that σ = 300 km provides good RMSE and correlations across the analysis domain.
2. Connected regions. A second set of pairwise distances is calculated using the Dijkstra shortest-path algorithm58. To apply the algorithm, the gridded locations are converted into a mathematical network in which cell centers are nodes and edges connect each node to its 8 neighbors. Each edge is assigned a weight equal to the distance between its two nodes, and nodes are not connected where the elevation difference between them exceeds a threshold. The algorithm is applied three times, once for each elevation-difference threshold of 500 m, 1000 m, and 1500 m, and the three resulting distance arrays are then averaged to obtain the final weights (Supplementary Fig. 5).
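A minimal sketch combining both steps, assuming a regular grid with uniform spacing cell_km, an elevation array on that grid, and that the Gaussian (Eq. (2)) is applied to the averaged elevation-limited path distances (our reading of the two-step description; function and variable names are not the published code):

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import dijkstra

def connected_distances(elev_m, cell_km, max_dz_m):
    """Shortest-path distances (km) between all cells of an 8-connected grid
    graph whose edges are removed where the elevation step exceeds max_dz_m."""
    ny, nx = elev_m.shape
    idx = np.arange(ny * nx).reshape(ny, nx)
    ys, xs = np.mgrid[0:ny, 0:nx]
    rows, cols, lens = [], [], []
    for dy, dx in [(0, 1), (1, 1), (1, 0), (1, -1)]:   # undirected graph: 4 offsets suffice
        yn, xn = ys + dy, xs + dx
        ok = (yn >= 0) & (yn < ny) & (xn >= 0) & (xn < nx)
        src, dst = idx[ys[ok], xs[ok]], idx[yn[ok], xn[ok]]
        keep = np.abs(elev_m.ravel()[src] - elev_m.ravel()[dst]) <= max_dz_m
        rows.append(src[keep])
        cols.append(dst[keep])
        lens.append(np.full(keep.sum(), cell_km * np.hypot(dy, dx)))
    graph = coo_matrix(
        (np.concatenate(lens), (np.concatenate(rows), np.concatenate(cols))),
        shape=(ny * nx, ny * nx))
    # Dense (ncell x ncell) output: adequate for a sketch on modest grids.
    return dijkstra(graph, directed=False)

def smoke_weights(elev_m, cell_km, sigma_km=300.0, thresholds=(500.0, 1000.0, 1500.0)):
    """Gaussian distance weights (Eq. (2)) on the averaged elevation-limited distances."""
    d = np.mean([connected_distances(elev_m, cell_km, dz) for dz in thresholds], axis=0)
    return np.exp(-d ** 2 / (2.0 * sigma_km ** 2))   # unreachable cells (inf) get weight 0
```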
Two machine learning algorithms are combined using a voting regressor: XGBoost and support vector regression. The XGBoost algorithm (gradient boosting) captures monotonic and non-linear relationships between the inputs and annual mean wildfire smoke. Incorporating support vector regression into the framework yields a smoother output and improves extrapolation and interpolation in regions of the parameter space lacking observations. The model is trained to make predictions generalized to the whole study grid by concatenating the input data from all grid cells before training. Hyperparameters for both models are calibrated using a random search with 5000 iterations and fivefold cross-validation to minimize the root mean square error. Validation shows a correlation of 0.74 between observed and modeled (out-of-sample) PM2.5 values.
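A schematic of the ensemble and hyperparameter search using scikit-learn and xgboost (the parameter ranges, scaling of the SVR inputs, and the X_train/y_train names are illustrative assumptions, not the published configuration):

```python
from scipy.stats import loguniform, randint, uniform
from sklearn.ensemble import VotingRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from xgboost import XGBRegressor

# Inputs X: weighted forest and nonforest burned-area sums per grid cell-year;
# target y: annual mean wildfire smoke PM2.5 (hypothetical names).
ensemble = VotingRegressor([
    ("xgb", XGBRegressor(objective="reg:squarederror")),
    ("svr", make_pipeline(StandardScaler(), SVR())),
])

param_distributions = {
    "xgb__n_estimators": randint(100, 1000),
    "xgb__max_depth": randint(2, 8),
    "xgb__learning_rate": loguniform(1e-3, 0.3),
    "svr__svr__C": loguniform(1e-1, 1e3),
    "svr__svr__epsilon": uniform(0.01, 1.0),
}

search = RandomizedSearchCV(
    ensemble,
    param_distributions,
    n_iter=5000,                          # random search iterations, as in the text
    cv=5,                                 # fivefold cross-validation
    scoring="neg_root_mean_squared_error",
    n_jobs=-1,
)
# search.fit(X_train, y_train)            # data concatenated across all grid cells
```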
The smoke model performs well, is computationally quick to run, and incorporates a simple atmospheric flow model. However, there is scope for improvement. A more sophisticated smoke propagation and dispersion model may better represent seasonally changing flows. For this study, we investigated the use of local wind vectors from a reanalysis dataset but found no improvement in the model output. Using a fully dynamic numerical model such as HYSPLIT to determine neighborhood weights could provide better results, at the cost of computational resources. A more complex atmospheric model could also allow the model to better represent wildfire effects from further afield, beyond the 500 km box used in this study. We found that expanding this box in our model degraded the output, likely due to the aggregated complexity of the flow over these distances.
Although not explicitly included in this model, the transport of wildfire smoke can be sensitive to the plume injection height, which is influenced by burn temperature and vegetation characteristics. By separating burned area into forested and nonforested classes, we capture some of this difference in a statistical sense. However, future versions of this model should include more variability in vegetation characteristics and a more explicit treatment of injection height.
Wildfire PM2.5 impacts on human mortality and economic burden
To estimate the contribution of climate change to annual wildfire smoke-related mortality and its economic valuation, we use the US Environmental Protection Agency’s Environmental Benefits Mapping and Analysis Program (BenMAP-CE)21, which incorporates a health impact function derived from the epidemiologic literature, baseline demographic and health data, and the Value of Statistical Life (VSL).
For human health impacts, we quantify the number of wildfire PM2.5 related deaths using:
$$\Delta Y=\left(1-{e}^{-\beta \cdot \Delta {{PM}_{2.5}}_{ij}}\right)\cdot {{Yo}}_{j}\cdot {{Pop}}_{ij}$$
where ∆Y is the change in mortality attributable to the contribution of climate change to wildfire PM2.5, β is the risk coefficient, ∆PM2.5ij is the difference between the counterfactual (without climate change) and the observed annual mean wildfire PM2.5 concentrations in county i and year j (2006, 2007, …, 2020), Yoj is the baseline mortality rate in year j, and Popij is the number of residents in county i in year j. Baseline rates for county-level all-cause mortality are obtained from the Centers for Disease Control and Prevention (CDC) WONDER database.
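The county-year calculation can be expressed directly; in the sketch below, the conversion of a hazard ratio to β follows the usual BenMAP convention, and the argument names are ours:

```python
import numpy as np

def attributable_deaths(delta_pm25, beta, baseline_rate, population):
    """Deaths attributable to the climate-change contribution to wildfire PM2.5
    for one county-year, using the log-linear health impact function.

    delta_pm25    : climate-change contribution to annual mean wildfire PM2.5 (ug/m3)
    beta          : risk coefficient, e.g. np.log(hazard_ratio) / concentration_increment
    baseline_rate : baseline all-cause mortality rate (deaths per person per year)
    population    : county population in the modeled age range
    """
    return (1.0 - np.exp(-beta * delta_pm25)) * baseline_rate * population
```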
A limited number of epidemiologic studies have examined longer-duration wildfire smoke exposure and mortality. It is well recognized that longer-duration smoke exposures are spatially and temporally dynamic, varying in intensity, frequency, and duration40 within and across years. Because of the small evidence base examining longer-duration smoke exposures and the overall uncertainty in their relationship with mortality, we assume within this analysis that the relationships of annual ambient PM2.5 and wildfire PM2.5 exposure with mortality are similar. This assumption is based on the well-documented relationship between long-term (i.e., annual average) ambient PM2.5 exposure and mortality14,59. Therefore, we selected a published pooled risk estimate for long-term ambient PM2.5 mortality60 to represent the relationship between wildfire PM2.5 and mortality and to derive our concentration-response function. That study used a random-effects pooling technique to combine hazard ratios from 8 US, Canadian, and European cohorts of adults 65 years and older. We use the study’s concentration-response function60, derived for ages 65+, assuming it reflects a similar response for our study population of ages 25–99. Our use of this study conforms to the 2024 US Scientific Advisory Board recommendation to use a single probabilistic mortality estimate based on pooled risk estimates with associated uncertainty ranges.
We estimated the economic valuation using the Value of Statistical Life (VSL), a valuation function based on 26 published studies and used by economists, by the US EPA in Regulatory Impact Analyses, and by the US Congress for policy analyses. The VSL is the amount society is willing to pay to reduce the risk of premature mortality for one person, estimated at $11.6 million (USD 2024); it does not represent the value of an individual’s life. We apply a 3% discount rate, similar to prior health impact assessments37. Discount rates are used in cost-benefit analyses to account for economic benefits that occur over multiple years and reflect the social concept that a health benefit today is more valuable than the same benefit in the future. The estimated dollar value of wildfire-related impacts accounts for mortality alone; it does not include morbidity associated with exposure to fine particles, which spans chronic effects such as cerebrovascular events and acute effects such as aggravated asthma. Thus, our dollar value is likely an underestimate.
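As a worked sketch of the valuation step (the VSL and 3% rate come from the text; the base year for discounting and the per-year application are our assumptions):

```python
def economic_burden(deaths_by_year, vsl=11.6e6, discount_rate=0.03, base_year=2006):
    """Mortality valuation in USD: attributable deaths in each year are valued
    at the VSL and discounted back to the base year at the given rate."""
    return sum(
        deaths * vsl / (1.0 + discount_rate) ** (year - base_year)
        for year, deaths in deaths_by_year.items()
    )

# e.g., economic_burden({2006: 120.0, 2007: 95.0, 2020: 310.0})  # hypothetical counts
```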