{"id":464557,"date":"2025-10-01T01:51:43","date_gmt":"2025-10-01T01:51:43","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/464557\/"},"modified":"2025-10-01T01:51:43","modified_gmt":"2025-10-01T01:51:43","slug":"disaggregated-municipal-energy-consumption-and-emissions-in-end-use-sectors-in-germany-and-spain-for-2022","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/464557\/","title":{"rendered":"Disaggregated Municipal Energy Consumption and Emissions in End-use Sectors in Germany and Spain for 2022"},"content":{"rendered":"<p>The spatial disaggregation of FEC and emissions data is carried out in four steps, as illustrated in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Fig1\" target=\"_blank\" rel=\"noopener\">1<\/a>. The following sub-sections describe each step in detail.<\/p>\n<p><b id=\"Fig1\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 
1<\/b><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41597-025-05938-1\/figures\/1\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig1\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/10\/41597_2025_5938_Fig1_HTML.png\" alt=\"figure 1\" loading=\"lazy\" width=\"685\" height=\"608\"\/><\/a><\/p>\n<p>The steps involved in the spatial disaggregation of emissions and Final Energy Consumption (FEC) data from country (NUTS0) to municipal (LAU) level.<\/p>\n<p>Data collection<\/p>\n<p>For the disaggregation work, this study utilizes three types of data: (i) FEC data for various sub-sectors, (ii) emissions data for various sub-sectors, and (iii) proxy data relevant to each sub-sector, which facilitate the disaggregation of FEC and emissions data.<\/p>\n<p>FEC and emissions data are collected at the national level, whereas proxy datasets are collected at various sub-national spatial resolutions. Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Fig2\" target=\"_blank\" rel=\"noopener\">2<\/a> illustrates the Nomenclature of Territorial Units for Statistics (NUTS) spatial hierarchy in Germany and Spain. Beginning at the national level (NUTS0), the hierarchy increases in spatial resolution through successive subdivisions: NUTS1 (federal states), NUTS2 (provinces), and NUTS3 (districts). Below NUTS3, municipalities, referred to as Local Administrative Units (LAUs), represent the most granular level of administrative geography in both countries. The size of LAUs varies significantly. In Germany, the smallest LAU is Insel L\u00fctje H\u00f6rn, covering just 0.096 km2, while the largest is Berlin at 891.80 km2. 
In Spain, Emperador is the smallest LAU (0.026 km2) and C\u00e1ceres the largest (1,750.26 km2).<\/p>\n<p><b id=\"Fig2\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 2<\/b><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41597-025-05938-1\/figures\/2\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig2\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/10\/41597_2025_5938_Fig2_HTML.png\" alt=\"figure 2\" loading=\"lazy\" width=\"685\" height=\"611\"\/><\/a><\/p>\n<p>The spatial hierarchy in Germany and Spain, showing the availability of various proxy datasets from public data sources at different spatial levels. The data sources highlighted in orange and blue provide data only for Germany and Spain, respectively. Proxy data undergoes a step-wise spatial disaggregation to achieve final proxies at the LAU level. Emissions and FEC data, available at the NUTS0 level from Eurostat, is then disaggregated to LAU based on these final proxies.<\/p>\n<p>The details regarding the collection of each dataset are discussed in the following sections.<\/p>\n<p>Final energy consumption<\/p>\n<p>The FEC data, reported at the national level, is imported from the energy balance sheet published on Eurostat<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 15\" title=\"Eurostat. Complete energy balances. Publication Office of the European Union, &#010;                  https:\/\/doi.org\/10.2908\/NRG_BAL_C&#010;                  &#010;                 (2024a).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#ref-CR15\" id=\"ref-link-section-d88814951e631\" target=\"_blank\" rel=\"noopener\">15<\/a>, for the year 2022. 
While the FEC data for both energy and non-energy use is reported on Eurostat, only the energy use FEC is considered here. The breakdown of the end-use sectors for FEC reporting on Eurostat is shown in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Fig3\" target=\"_blank\" rel=\"noopener\">3<\/a>.<\/p>\n<p><b id=\"Fig3\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 3<\/b><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41597-025-05938-1\/figures\/3\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig3\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/10\/41597_2025_5938_Fig3_HTML.png\" alt=\"figure 3\" loading=\"lazy\" width=\"685\" height=\"508\"\/><\/a><\/p>\n<p>Breakdown of end-use FEC sectors as reported in Eurostat, with Germany at the top and Spain at the bottom.<\/p>\n<p>The industry sector is broken down into energy-intensive and non-energy-intensive industries. Energy-intensive industries include iron and steel, chemical and petrochemical, non-ferrous metals, non-metallic minerals, mining and quarrying, paper, pulp, and printing, and wood and wood products manufacturing industries<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 16\" title=\"IEA. Energy technology transitions for industry: strategies for the next industrial revolution. OECD Publishing (2009).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#ref-CR16\" id=\"ref-link-section-d88814951e658\" target=\"_blank\" rel=\"noopener\">16<\/a>. 
Non-energy-intensive industries include transport equipment, machinery, food, beverages, and tobacco, textile and leather manufacturing industries, construction, and other industries that are not specified elsewhere<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 16\" title=\"IEA. Energy technology transitions for industry: strategies for the next industrial revolution. OECD Publishing (2009).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#ref-CR16\" id=\"ref-link-section-d88814951e662\" target=\"_blank\" rel=\"noopener\">16<\/a>. A similar categorisation is provided by Eurostat.<\/p>\n<p>The transport sector is categorized into rail, road, domestic aviation, and domestic navigation. Additionally, the commerce and agriculture and forestry sectors are included, though they are not further broken down.<\/p>\n<p>Greenhouse gas emissions<\/p>\n<p>The GHG emissions data used in this study is sourced from Eurostat<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 17\" title=\"Eurostat. Greenhouse gas emissions by source sector. Publication Office of the European Union, &#010;                  https:\/\/doi.org\/10.2908\/ENV_AIR_GGE&#010;                  &#010;                 (2024b).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#ref-CR17\" id=\"ref-link-section-d88814951e678\" target=\"_blank\" rel=\"noopener\">17<\/a>, for the year 2022. Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Fig4\" target=\"_blank\" rel=\"noopener\">4<\/a> illustrates how end-use sectors are categorized for emissions reporting by Eurostat. Compared to the FEC sector classification, the industry sector emissions data provided by Eurostat is less detailed. 
Notably, emissions from the chemical industry are not reported for Germany, and as such, these emissions are not disaggregated in this study.<\/p>\n<p><b id=\"Fig4\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 4<\/b><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41597-025-05938-1\/figures\/4\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig4\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/10\/41597_2025_5938_Fig4_HTML.png\" alt=\"figure 4\" loading=\"lazy\" width=\"685\" height=\"509\"\/><\/a><\/p>\n<p>Breakdown of end-use emission sectors as reported in Eurostat, with Germany at the top and Spain at the bottom. Note: Emissions from the chemical industry are not reported for Germany on Eurostat and are therefore absent from the figure. Consequently, emissions from energy-intensive industries appear lower than those from non-energy-intensive industries.<\/p>\n<p>In contrast, the transport sector is more granular in the Eurostat data compared to the FEC categorisation. Here, sub-sectors such as rail, road, domestic aviation, and domestic navigation, are further broken down into categories like light-duty trucks, heavy-duty trucks and buses, cars, and motorcycles. For the purposes of this study, emissions from light-duty trucks and heavy-duty trucks and buses are grouped under freight transport, as specific proxies for these vehicle types are not available in publicly accessible datasets.<\/p>\n<p>It is important to highlight that GHG emissions in the sectors examined here primarily stem from fuel combustion. Process-related emissions in the industrial sector are excluded, with one exception: the agriculture sector, which includes non-combustion emissions. 
Agricultural emissions are considered under two categories: livestock and cultivation. Although Eurostat provides further disaggregation into subcategories such as enteric fermentation, manure management, agricultural soil management, and crop residue burning, only livestock and cultivation are included in this study due to the absence of matching proxies in open-source data.<\/p>\n<p>Proxy data<\/p>\n<p>Proxy data serves as the foundation for spatially disaggregating FEC and emissions across different end-use sectors. Unlike the FEC and emissions data, which is collected for the year 2022, proxy data spans multiple years, as certain datasets, such as land use and land cover, are not updated annually. However, all proxy datasets used represent the most recent year for which data was available at the time of collection.<\/p>\n<p>The selection of proxy data was guided by the following criteria: <\/p>\n<ul class=\"u-list-style-bullet\">\n<li>\n<p><b>Availability in open databases:<\/b> The data must be accessible through publicly available databases to enhance transparency and reproducibility.<\/p>\n<\/li>\n<li>\n<p><b>Sub-national resolution:<\/b> The data should be available at a sub-national scale, such as NUTS1, NUTS2, NUTS3, or LAU, to facilitate the disaggregation of emissions and FEC data reported at NUTS0.<\/p>\n<\/li>\n<li>\n<p><b>Relevance to end-use sectors:<\/b> The selected data must be relevant to at least one of the end-use sectors analyzed in this study. For instance, population data can be used to disaggregate household-related emissions and FEC. Heating degree days help refine the spatial distribution of FEC by accounting for higher heat demand in colder regions. In addition, industrial locations and employment data provide insights into the spatial distribution of industrial emissions and FEC. Furthermore, vehicle stock data supports the disaggregation of road transport emissions and energy consumption. 
Finally, land use and land cover classifications (e.g., rice fields, vineyards) assist in distributing cultivation-related FEC and emissions.<\/p>\n<\/li>\n<li>\n<p><b>Data completeness:<\/b> The dataset should have minimal missing values to ensure a robust analysis. Here, less than 20% missing data is preferred.<\/p>\n<\/li>\n<\/ul>\n<p>The proxies are obtained from various publicly available databases, including Eurostat, Corine Land Cover<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 18\" title=\"Copernicus Land Monitoring Service. Corine land cover 2018 (vector\/raster 100 m), europe, 6-yearly. Publication Office of the European Union, &#010;                  https:\/\/doi.org\/10.2909\/960998c1-1870-4e82-8051-6485205ebbac&#010;                  &#010;                 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#ref-CR18\" id=\"ref-link-section-d88814951e756\" target=\"_blank\" rel=\"noopener\">18<\/a> and OpenStreetMap. A comprehensive overview of all data sources is provided in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Fig2\" target=\"_blank\" rel=\"noopener\">2<\/a>. The following sub-sections provide an overview of the collected proxy data at different spatial levels, beginning with the LAU level.<\/p>\n<p><b>Proxy data at LAU level<\/b>. The datasets collected at the LAU level for Germany and Spain are summarized in Tables\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab1\" target=\"_blank\" rel=\"noopener\">1<\/a> and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab2\" target=\"_blank\" rel=\"noopener\">2<\/a>. 
These include general demographic and geographic statistics, such as population and area, sourced from Eurostat, which are directly available for each LAU region. However, not all relevant datasets are directly available at the LAU level. In such cases, fine-scale gridded or vector datasets are spatially overlaid with LAU geometries to derive region-specific aggregates.<\/p>\n<p><b id=\"Tab1\" data-test=\"table-caption\">Table 1 LAU-level proxy data collected from Eurostat, Hotmaps, European Environmental Agency, and The National Statistics Institute of Spain.<\/b><b id=\"Tab2\" data-test=\"table-caption\">Table 2 LAU-level proxy data collected from Corine Land Cover, Eurogeographics, and OpenStreetMap.<\/b><\/p>\n<p>For example, land use and land cover information, including classes such as continuous urban fabric, is obtained from the Corine Land Cover database. This source provides raster data at a spatial resolution of 100 m, which allows for accurate spatial aggregation of land cover types within each LAU boundary. Similarly, air pollution data is sourced from the European Environment Agency<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 19\" title=\"EEA. European air quality data, (interpolated data). EEA Datahub, 938bea70-07fc-47e9-8559-8a09f7f92494 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#ref-CR19\" id=\"ref-link-section-d88814951e1756\" target=\"_blank\" rel=\"noopener\">19<\/a>, available as gridded data at a resolution of 1 square kilometer.<\/p>\n<p>In addition to raster sources, vector datasets are also used. The railway network data is obtained from EuroGeographics, while OpenStreetMap provides detailed information on road networks and building counts. 
These vector datasets are intersected with LAU geometries to extract relevant spatial indicators for each municipality.<\/p>\n<p>Data on industrial sites was available in three databases: sEEnergies<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 20\" title=\"Fleiter, T. Documentation on excess heat potentials of industrial sites including open data file with selected potentials (version 2). Zenodo, &#010;                  https:\/\/doi.org\/10.5281\/zenodo.4785411&#010;                  &#010;                 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#ref-CR20\" id=\"ref-link-section-d88814951e1766\" target=\"_blank\" rel=\"noopener\">20<\/a>, Global Steel Plant Tracker<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 21\" title=\"GSTP contributors. Global Steel Plant Tracker - Global Energy Monitor, &#010;                  https:\/\/globalenergymonitor.org\/projects\/global-steel-plant-tracker\/&#010;                  &#010;                . [Online; accessed 13-August-2024] (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#ref-CR21\" id=\"ref-link-section-d88814951e1770\" target=\"_blank\" rel=\"noopener\">21<\/a>, and Hotmaps. To determine the most suitable source, the datasets were compared for their level of detail and coverage. 
<\/p>\n<ul class=\"u-list-style-bullet\">\n<li>\n<p><b>sEEnergies<\/b> provides industrial site locations along with fuel and electricity demand information for industries such as iron and steel, chemicals, non-ferrous metals, non-metallic minerals, paper and printing, and refineries.<\/p>\n<\/li>\n<li>\n<p><b>Global Steel Plant Tracker<\/b> focuses solely on iron and steel plant locations, annotating them with energy demand and employment data, though many sites lack complete information.<\/p>\n<\/li>\n<li>\n<p><b>Hotmaps<\/b> includes locations for cement and glass industries in addition to those covered by sEEnergies, with emissions data provided for each site, though data is missing for many locations.<\/p>\n<\/li>\n<\/ul>\n<p>A comparative analysis was performed to select the most comprehensive source for each sector. For the iron and steel industry, a comparison of site counts across the three datasets was conducted, as illustrated in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Fig5\" target=\"_blank\" rel=\"noopener\">5<\/a>. For other industries, a comparison between sEEnergies and Hotmaps was performed (see Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab3\" target=\"_blank\" rel=\"noopener\">3<\/a>). Hotmaps reports a higher number of industrial sites across most categories, except for paper and printing industries in Germany, where sEEnergies provides higher counts. Based on this analysis, the Hotmaps database was selected as the primary source for obtaining LAU-level industry data, ensuring comprehensive coverage and consistency across sub-sectors.<\/p>\n<p><b id=\"Fig5\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 
5<\/b><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41597-025-05938-1\/figures\/5\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig5\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/10\/41597_2025_5938_Fig5_HTML.png\" alt=\"figure 5\" loading=\"lazy\" width=\"685\" height=\"629\"\/><\/a><\/p>\n<p>The distribution and number of iron and steel industries as reported by three open databases: Global Steel Plant Tracker, Hotmaps, and sEEnergies. The figure highlights the differences in coverage among these sources, with Hotmaps providing the most comprehensive dataset.<\/p>\n<p><b id=\"Tab3\" data-test=\"table-caption\">Table 3 Number of different industries as reported by Hotmaps and sEEnergies open databases.<\/b><\/p>\n<p><b>Proxy data at NUTS3 level<\/b>. Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab4\" target=\"_blank\" rel=\"noopener\">4<\/a> provides an overview of the data collected for the German and Spanish NUTS3 regions. Basic statistical information, such as employment and gross domestic product, is published by Eurostat at this spatial level. Heating and cooling degree days and livestock population datasets are available in raster format. With a resolution of approximately 10 square kilometers, these datasets align well with NUTS3 regions, enabling spatial overlap and aggregation at the NUTS3 level. Additionally, some datasets are available only in one of the two countries \u2014for example, sectorally detailed employment data from the Federal Employment Agency<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 22\" title=\"Federal Employment Agency. 
Employment database - statistics of the Federal Employment Agency. &#010;                  https:\/\/statistik.arbeitsagentur.de\/DE\/Navigation\/Statistiken\/Interaktive-Statistiken\/Datenbanken\/Datenbanken-BST-Nav.html&#010;                  &#010;                .\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#ref-CR22\" id=\"ref-link-section-d88814951e2047\" target=\"_blank\" rel=\"noopener\">22<\/a> in Germany and company counts from the National Statistics Institute of Spain<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 23\" title=\"Spanish Statistical Office, &#010;                  https:\/\/www.ine.es\/en\/&#010;                  &#010;                . [Online; accessed 06-Feb-2025] (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#ref-CR23\" id=\"ref-link-section-d88814951e2051\" target=\"_blank\" rel=\"noopener\">23<\/a>.<\/p>\n<p><b id=\"Tab4\" data-test=\"table-caption\">Table 4 NUTS3-level proxy data collected from different data sources.<\/b><\/p>\n<p><b>Proxy data at NUTS2 level<\/b>. Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab5\" target=\"_blank\" rel=\"noopener\">5<\/a> summarizes the data available for the German and Spanish NUTS2 regions. All the datasets collected at this level are sourced from Eurostat.<\/p>\n<p><b id=\"Tab5\" data-test=\"table-caption\">Table 5 NUTS2-level proxy data collected from Eurostat.<\/b>Missing value imputation<\/p>\n<p>The collected proxy datasets exhibit missing values. Imputing these values is a critical step in spatial disaggregation workflows, as complete proxy data is essential for distributing national totals. 
Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab6\" target=\"_blank\" rel=\"noopener\">6<\/a> provides an overview of the number and percentage of missing values identified in the collected proxy data. These gaps are primarily found in datasets that are available only for either Germany or Spain, often due to strict data protection regulations preventing certain regions from reporting data. Consequently, missing values must be imputed using relevant statistical indicators, such as land use and land cover data when estimating the utilized agricultural area.<\/p>\n<p><b id=\"Tab6\" data-test=\"table-caption\">Table 6 Number of missing values per variable with missing values.<\/b><\/p>\n<p>To impute these missing values, an XGBoost model is employed. Since the spatial distribution process is relative, data quality in one region directly influences the accuracy of all others. Therefore, it is crucial to assess the model\u2019s performance. To this end, we conduct two evaluations: (1) assessing the predictive accuracy within a country by setting aside a portion of the data for validation, and (2) evaluating the model\u2019s predictive capacity in a country where the dataset is entirely absent. The latter is achieved by leveraging available data at intermediate spatial levels, such as states, within the country. The data sources employed in this cross-country missing value imputation evaluation are listed in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Fig2\" target=\"_blank\" rel=\"noopener\">2<\/a>. The training and validation of the model, as well as the evaluation of the missing value prediction across countries, are explained below.<\/p>\n<p><b>XGBoost model training<\/b>. 
The datasets with missing values at the LAU level are imputed by training an XGBoost model using all other LAU-level variables as potential predictors. Before selecting the final predictors, two preprocessing steps are performed to eliminate certain variables: <\/p>\n<ul class=\"u-list-style-bullet\">\n<li>\n<p><b>Removal of non-informative predictors:<\/b> Any predictors that have the same value across all regions are discarded because they lack predictive capability.<\/p>\n<\/li>\n<li>\n<p><b>Correlation analysis:<\/b> A pairwise correlation among all potential predictors is examined. If two variables exhibit an absolute correlation of 0.9 or higher, only one is retained to prevent over-representation of highly similar variables in the model.<\/p>\n<\/li>\n<\/ul>\n<p>The final dataset used as input for the XGBoost model consists of the selected predictors and the variable to be imputed, with only complete records included. Prior to training, 10% of the data is reserved for model validation, while the remaining 90% is utilized for two experimental setups. In the first setup, predictors with an absolute Pearson correlation of at least 0.1 with the variable to be imputed are included. In the second setup, the correlation threshold is increased to 0.5. Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Fig6\" target=\"_blank\" rel=\"noopener\">6<\/a> presents the correlations between \u201cutilized agricultural area\u201d and the various predictors used.<\/p>\n<p><b id=\"Fig6\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 
6<\/b><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41597-025-05938-1\/figures\/6\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig6\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/10\/41597_2025_5938_Fig6_HTML.png\" alt=\"figure 6\" loading=\"lazy\" width=\"685\" height=\"437\"\/><\/a><\/p>\n<p>The absolute correlations between utilized agricultural area and different predictors at LAU level. The figure is divided into two sections: the top half displays the least correlated variables, while the bottom half highlights the most correlated ones. For imputing missing values in utilized agricultural area, predictors with correlations of at least 0.1 are used in one set of experiments, while those with correlations of at least 0.5 are considered in another.<\/p>\n<p>In both sets of experiments, hyperparameter tuning is performed using a grid search on the XGBoost model. The hyperparameters tuned include n_estimators, learning_rate, and max_depth, with the model optimized for minimal Root Mean Squared Error (RMSE). To calculate the RMSE, 5-fold cross-validation is applied to the training data. The RMSE is computed for each fold by training the model on four folds and validating on the remaining fold, and the average RMSE across all folds is used as the performance metric for hyperparameter optimization. The final model for data imputation is the one with the hyperparameter combination that yields the lowest average RMSE.<\/p>\n<p>A similar approach is used in the case of variables with missing values at the NUTS3 level. Here, the potential predictors are all variables at NUTS3 level without missing values, as well as LAU variables with no missing data, aggregated to the NUTS3 level. 
Figures\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Fig7\" target=\"_blank\" rel=\"noopener\">7<\/a>, <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Fig8\" target=\"_blank\" rel=\"noopener\">8<\/a>, <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Fig9\" target=\"_blank\" rel=\"noopener\">9<\/a>, and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Fig10\" target=\"_blank\" rel=\"noopener\">10<\/a> illustrate the correlations between the NUTS3 variables with missing values and various predictors.<\/p>\n<p><b id=\"Fig7\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 7<\/b><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41597-025-05938-1\/figures\/7\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig7\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/10\/41597_2025_5938_Fig7_HTML.png\" alt=\"figure 7\" loading=\"lazy\" width=\"685\" height=\"402\"\/><\/a><\/p>\n<p>The absolute correlations between number of commercial and service companies and average daily traffic by light duty vehicles, and different predictors at NUTS3 level. The figure is divided into two sections: the top half displays the least correlated variables, while the bottom half highlights the most correlated ones. 
For imputing missing values, predictors with correlations of at least 0.1 are used in one set of experiments, while those with correlations of at least 0.5 are considered in another.<\/p>\n<p><b id=\"Fig8\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 8<\/b><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41597-025-05938-1\/figures\/8\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig8\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/10\/41597_2025_5938_Fig8_HTML.png\" alt=\"figure 8\" loading=\"lazy\" width=\"685\" height=\"362\"\/><\/a><\/p>\n<p>The absolute correlations between employment data and different predictors at NUTS3 level. The figure is divided into two sections: the top half displays the least correlated variables, while the bottom half highlights the most correlated ones. For imputing missing values, predictors with correlations of at least 0.1 are used in one set of experiments, while those with correlations of at least 0.5 are considered in another.<\/p>\n<p><b id=\"Fig9\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 9<\/b><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41597-025-05938-1\/figures\/9\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig9\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/10\/41597_2025_5938_Fig9_HTML.png\" alt=\"figure 9\" loading=\"lazy\" width=\"685\" height=\"442\"\/><\/a><\/p>\n<p>The absolute correlations between the number of passenger cars per emission group and different predictors at NUTS3 level. 
The figure is divided into two sections: the top half displays the least correlated variables, while the bottom half highlights the most correlated ones. For imputing missing values, predictors with correlations of at least 0.1 are used in one set of experiments, while those with correlations of at least 0.5 are considered in another.<\/p>\n<p><b id=\"Fig10\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 10<\/b><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41597-025-05938-1\/figures\/10\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig10\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/10\/41597_2025_5938_Fig10_HTML.png\" alt=\"figure 10\" loading=\"lazy\" width=\"685\" height=\"587\"\/><\/a><\/p>\n<p>The absolute correlations between the building living area and different predictors at NUTS3 level. The figure is divided into two sections: the top half displays the least correlated variables, while the bottom half highlights the most correlated ones. For imputing missing values, predictors with correlations of at least 0.1 are used in one set of experiments, while those with correlations of at least 0.5 are considered in another.<\/p>\n<p><b>XGBoost model validation<\/b>. Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab7\" target=\"_blank\" rel=\"noopener\">7<\/a> presents the training and validation errors corresponding to the previously discussed correlation thresholds. While a lower RMSE indicates better model performance, RMSE has no fixed upper bound and depends on the scale of each variable, which makes accuracy difficult to interpret. 
Therefore, the R-squared score is also provided, where values closer to 1 signify better performance.<\/p>\n<p><b id=\"Tab7\" data-test=\"table-caption\">Table 7 RMSE and R-squared scores on the training and validation datasets when using the XGBoost model for missing value imputation.<\/b><\/p>\n<p>To ensure transparency regarding the quality of the generated values in this work, a five-level confidence schema is introduced: VERY HIGH, HIGH, MEDIUM, LOW, and VERY LOW. This qualitative labeling system makes data quality easier to interpret than conventional statistical measures such as RMSE or R-squared values.<\/p>\n<p>Confidence assignment begins at the data collection stage: all non-missing values are automatically labeled as VERY HIGH. Missing values, in contrast, are assigned one of the remaining confidence levels based on the R-squared score of the chosen imputation model. Specifically, the thresholds are defined as follows: HIGH for\u00a0&gt;0.8, MEDIUM for\u00a0&gt;0.5 and \u22640.8, LOW for\u00a0&gt;0.2 and \u22640.5, and VERY LOW for \u22640.2.<\/p>\n<p>For each variable, of the two experimental settings considered (correlation thresholds of \u22650.1 and \u22650.5), the configuration yielding the higher R-squared score is selected for imputing missing values. Although the XGBoost model is trained to minimize RMSE, R-squared scores are employed for quality ratings due to their bounded range (\u22641), which provides a consistent and interpretable scale across variables. 
The final model selected for each variable, along with its corresponding confidence level, is presented in Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab8\" target=\"_blank\" rel=\"noopener\">8<\/a>.<\/p>\n<p><b id=\"Tab8\" data-test=\"table-caption\">Table 8 The value confidence levels assigned based on the R-squared score obtained for the better-performing model between two predictor sets: those with a correlation threshold \u22650.1 and those with a correlation threshold \u22650.5.<\/b><\/p>\n<p>The trained XGBoost models effectively predict missing values for most variables (see Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab8\" target=\"_blank\" rel=\"noopener\">8<\/a>). However, the variable \u201cemployment in the food and beverage manufacturing sector\u201d has LOW prediction quality, and \u201cemployment in textile and leather manufacturing\u201d is classified as VERY LOW. The poor predictions for food and beverage manufacturing can be attributed to the lack of relevant predictor data at the NUTS3 level. For textile and leather manufacturing, this limitation is compounded by a higher proportion of missing values than in other NUTS3 datasets, with 34 out of 401 records missing. The variable \u201cAverage daily traffic &#8211; light duty vehicles\u201d shows the poorest prediction results. The R-squared values are negative (see Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab7\" target=\"_blank\" rel=\"noopener\">7<\/a>), meaning that the model performs worse than simply predicting the mean and that none of the predictors contribute meaningfully to the predictions. Out of 52 records, 10 have missing values. 
Reserving 10% of the remaining data for validation further reduces the number of records available for training a reliable XGBoost model. Therefore, the XGBoost predictions are discarded, and missing values are imputed using the mean of the existing data. Since this approach is not robust, the imputed values are assigned a LOW prediction quality.<\/p>\n<p><b>Evaluation of missing value prediction across countries<\/b>. As previously mentioned, missing values are observed only in the country-level datasets for Germany or Spain. Consequently, the validation using the 10% of data set aside specifically assesses how well missing values can be imputed within regions of the same country. Here, the trained models are applied to predict data for regions in the other country, and the results are analyzed.<\/p>\n<p>For instance, \u201cutilized agricultural area\u201d is available at the LAU level for Spain. A trained model, initially developed to impute missing values, is also applied to predict values for German LAU regions. These predictions are then validated against data from the Federal Statistical Office of Germany<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 24\" title=\"GENESIS-Online. Die datenbank des statistischen bundesamtes, &#010;                  https:\/\/www-genesis.destatis.de\/genesis\/online&#010;                  &#010;                . [Online; accessed 13-August-2024] (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#ref-CR24\" id=\"ref-link-section-d88814951e4988\" target=\"_blank\" rel=\"noopener\">24<\/a>, which provides agricultural area figures only at the NUTS1 level. To enable comparison, the predicted values are aggregated accordingly. 
Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Fig11\" target=\"_blank\" rel=\"noopener\">11<\/a> demonstrates a strong alignment between the predictions and the validation data. This suggests that agricultural land use patterns in Spain and Germany are similar. The distribution of utilized agricultural area in both countries is well explained by highly correlated predictors such as total available area and non-irrigated arable land cover (Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Fig6\" target=\"_blank\" rel=\"noopener\">6<\/a>). Given this successful validation, the XGBoost model is used to impute missing data for German LAU regions.<\/p>\n<p><b id=\"Fig11\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 11<\/b><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41597-025-05938-1\/figures\/11\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig11\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/10\/41597_2025_5938_Fig11_HTML.png\" alt=\"figure 11\" loading=\"lazy\" width=\"685\" height=\"348\"\/><\/a><\/p>\n<p>[Top] Results of training an XGBoost model to predict utilized agricultural area in Spain at the LAU level, and applying this model to estimate values for German LAU regions. The predicted data is compared with the available utilized agricultural area data at the NUTS1 level in Germany. The results indicate that the model\u2019s predictions closely align with the actual data, with minimum and maximum deviations of 9.34 and 5106.56 square kilometers, respectively. 
[Bottom] Results of training an XGBoost model to predict passenger car stock in Germany at the NUTS3 level, and applying this model to estimate values for Spanish NUTS3 regions. The predicted data is compared with the available data for the 3 NUTS3 regions in the Basque Country, Spain. The results indicate that the model\u2019s predictions deviate significantly from the actual data, with minimum and maximum deviations of 108957.0 and 276023.0 cars, respectively.<\/p>\n<p>A similar validation is conducted at the NUTS3 level, where the number of passenger cars per emission group is estimated for the Spanish NUTS3 regions. The data is then aggregated into a single dataset representing the total number of passenger cars per NUTS3 region. A further aggregation to the NUTS2 level is performed, for comparison with data from Eustat<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 25\" title=\"Eustat. Basque statistical institute. &#010;                  https:\/\/en.eustat.eus\/indice.html&#010;                  &#010;                . [Online; accessed 13-August-2024].\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#ref-CR25\" id=\"ref-link-section-d88814951e5018\" target=\"_blank\" rel=\"noopener\">25<\/a>, which provides the number of cars for the three provinces of the Basque Country. Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Fig11\" target=\"_blank\" rel=\"noopener\">11<\/a> presents this comparison, revealing significant deviations between the predictions and the validation data. A similar pattern is observed in other datasets at the German NUTS3 level. These discrepancies suggest that cross-country imputation is not universally reliable, potentially depending on the spatial level of analysis. 
The availability of more data at the LAU level provides greater variance, allowing for improved learning, whereas similar attempts at the NUTS3 level yield less accurate results. Additionally, sector-specific factors influence the effectiveness of imputation. For example, agricultural indicators exhibit similar spatial distributions in both countries, making cross-country imputation more feasible, whereas transport-related indicators do not follow the same pattern. Due to these inconsistencies, the predictions are discarded.<\/p>\n<p>Step-wise spatial disaggregation<\/p>\n<p>Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Fig2\" target=\"_blank\" rel=\"noopener\">2<\/a> shows that only some proxy data is readily available at the LAU level, while most statistical data is typically available at NUTS3 or NUTS2 levels. Additionally, most proxy datasets do not cover the Canary Islands in Spain. Due to this limitation, these regions are excluded from the scope of this study.<\/p>\n<p>In this work, we perform a step-wise spatial disaggregation. Initially, proxy data available at the NUTS3 level is disaggregated using LAU proxy data, to achieve finer resolution. Subsequently, NUTS2 proxy data is disaggregated to the LAU level using both the LAU data and the previously disaggregated NUTS3 data as proxies. Finally, the emissions and FEC data are disaggregated to the LAU level. This approach improves the accuracy of the disaggregated data by progressively refining estimates.<\/p>\n<p>Among the various spatial disaggregation approaches found in the literature, proxy data-based and machine learning-based methods are the most suitable for disaggregating emissions and FEC data<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 7\" title=\"Patil, S., Pflugradt, N., Weinand, J. 
M., Stolten, D. &amp; Kropp, J. A systematic review of spatial disaggregation methods for climate action planning. Energy and AI 17, 100386, &#010;                  https:\/\/doi.org\/10.1016\/j.egyai.2024.100386&#010;                  &#010;                 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#ref-CR7\" id=\"ref-link-section-d88814951e5042\" target=\"_blank\" rel=\"noopener\">7<\/a>. The proxy data-based approach distributes the target data across the target regions in proportion to each region\u2019s share of the chosen spatial proxy. In contrast, the machine learning-based approach trains a predictive model, such as XGBoost, to learn the relationships between all available proxy data and the target data at the source spatial level (e.g., NUTS0), and then uses this model to predict the target values in each target region.<\/p>\n<p>Initially, a machine learning-based approach for disaggregation was considered. This approach was eventually discarded for the following reasons:<\/p>\n<ul class=\"u-list-style-bullet\">\n<li>\n<p>The imputation of missing values resulted in poor predictions in certain cases. Applying an additional layer of prediction on top of this may further degrade the results.<\/p>\n<\/li>\n<li>\n<p>In Spain, there are only 52 NUTS3 regions, which may constitute a sample size too small to generate reliable predictions at the LAU level.<\/p>\n<\/li>\n<li>\n<p>Some variable pairs, such as population and gross domestic product, exhibited strong correlations at the NUTS3 level but weaker correlations at the LAU level. 
These differences in correlation raise concerns about whether the statistical relationships among variables, upon which the predictions are based, remain valid at the LAU level.<\/p>\n<\/li>\n<li>\n<p>For most variables, no validation data is available at the LAU level, making it challenging to assess the performance of this disaggregation approach.<\/p>\n<\/li>\n<\/ul>\n<p>Therefore, in this study, a proxy data-based spatial disaggregation method is employed. Here, the quality of disaggregated data primarily depends on how effectively the chosen proxy captures the spatial distribution of the target data. The selection of a spatial proxy is inherently constrained by the availability of data at fine-scale resolution. For each target dataset, potential proxies are initially identified based on theoretical considerations. If the most suitable proxy is unavailable in open databases with sufficient non-missing values, the closest alternative is selected. For example, in disaggregating employment data for textile and leather manufacturing, the ideal proxy would be the total size of textile and leather manufacturing facilities in each LAU region. If that data is unavailable, the next best option would be the number of such facilities, followed by a broader proxy such as \u201cindustrial or commercial units cover.\u201d Since no data on textile and leather manufacturing facilities is accessible, \u201cindustrial or commercial units cover\u201d is ultimately chosen as the proxy.<\/p>\n<p>To provide transparency regarding the reliability of the disaggregated data, each proxy is assigned a confidence level \u2014classified as HIGH, MEDIUM, LOW, or VERY LOW \u2014to indicate its relevance and explanatory strength with respect to the target data. The confidence level reflects the degree of alignment between the proxy and the target dataset. 
For instance, in the example above, the total size of textile and leather manufacturing facilities would receive a HIGH confidence rating, the number of such facilities would receive a MEDIUM rating, and \u201cindustrial or commercial units cover\u201d would be rated as LOW.<\/p>\n<p>The confidence level assigned to the final disaggregated values at the LAU level is determined by taking the minimum of the confidence level of the proxy data and that of the proxy assignment. For instance, if a proxy value in an LAU region is of MEDIUM confidence, influenced by the missing value imputation, and the proxy assignment is of LOW confidence, then the disaggregated value will be assigned a LOW confidence. The selection of proxies for the step-wise spatial disaggregation process is outlined below.<\/p>\n<p><b>1. NUTS3 variables to LAU<\/b>. Tables\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab9\" target=\"_blank\" rel=\"noopener\">9<\/a>, <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab10\" target=\"_blank\" rel=\"noopener\">10<\/a>, and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab11\" target=\"_blank\" rel=\"noopener\">11<\/a> list the NUTS3 variables together with their potential proxies. For each variable, the most suitable proxy is first identified. If this proxy is available in any public database, it is selected and assigned a HIGH confidence level. 
If the most suitable proxy is unavailable, the next best alternative is considered and assigned a MEDIUM confidence level; the process is repeated as necessary until a suitable proxy dataset is obtained for disaggregation.<\/p>\n<p><b id=\"Tab9\" data-test=\"table-caption\">Table 9 The potential proxies for disaggregating each NUTS3 variable, commonly collected for both Germany and Spain.<\/b><b id=\"Tab10\" data-test=\"table-caption\">Table 10 The potential proxies for disaggregating each NUTS3 variable, collected only for Germany.<\/b><b id=\"Tab11\" data-test=\"table-caption\">Table 11 The potential proxies for disaggregating each NUTS3 variable, collected only for Spain.<\/b><\/p>\n<p>Note that some proxies are summed even though they have different measurement units. For example, \u201cconstruction sites cover\u201d and \u201croad network\u201d are expressed in square kilometers and kilometers, respectively. To ensure comparability, all variables are first normalized by their maximum values, preserving true zeros while scaling all other values relative to the highest observed value. This normalization allows proxies to be summed without introducing inconsistencies.<\/p>\n<p>The \u201cindustrial or commercial units cover\u201d data is sourced from the Corine Land Cover database, which includes only industrial and commercial units spanning 25 hectares or more. Due to this threshold, many regions have zero values. Since this limitation is consistent across all regions, data imputation was not feasible. Consequently, this proxy is used in conjunction with population data in this study. Furthermore, \u201cemployment in construction\u201d uses \u201cconstruction sites cover\u201d and \u201croad network\u201d as proxies because construction encompasses both building and road construction.<\/p>\n<p><b>2. NUTS2 variables to LAU<\/b>. 
Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab12\" target=\"_blank\" rel=\"noopener\">12<\/a> details the proxy assignment process in the case of the NUTS2 variables \u201cnumber of motorcycles\u201d, \u201cair transport of passengers\u201d, and \u201cair transport of freight\u201d.<\/p>\n<p><b id=\"Tab12\" data-test=\"table-caption\">Table 12 The potential proxies for disaggregating each NUTS2 variable, commonly collected for both Germany and Spain.<\/b><\/p>\n<p><b>3b. FEC (NUTS0 data) to LAU<\/b>. Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab13\" target=\"_blank\" rel=\"noopener\">13<\/a> presents the FEC end-use sectors for which final proxies were available in both countries. Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab14\" target=\"_blank\" rel=\"noopener\">14<\/a> lists the FEC end-use sectors for which final proxies were available exclusively for Germany. Here, emissions from passenger car road transport are disaggregated according to the passenger car fleet categorized into different emission groups. These groups define the vehicle emission standards used in Europe<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 26\" title=\"Wikipedia contributors. European emission standards &#x2014; Wikipedia, The Free Encyclopedia, &#010;                  https:\/\/en.wikipedia.org\/wiki\/European_emission_standards&#010;                  &#010;                . 
[Online; accessed 13-August-2024] (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#ref-CR26\" id=\"ref-link-section-d88814951e6125\" target=\"_blank\" rel=\"noopener\">26<\/a>, with each group setting caps on specific air pollutants. The initial emission group, Euro 1, was introduced in July 1992. Over the years, the standards have become increasingly stringent with the introduction of new emission caps. The caps for diesel passenger cars concerning pollutants such as carbon monoxide (CO), hydrocarbons and nitrogen oxides (HC\u00a0+\u00a0NOX), and particulate matter (PM) are detailed in Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab15\" target=\"_blank\" rel=\"noopener\">15<\/a>. For each emission group, the caps for these three pollutants are summed to obtain a weighting factor for the proxies. The passenger car data provides information for emission group 5 but does not differentiate between Euro 5a and 5b. In this case, the more lenient tier, Euro 5a, is considered to assign more emissions to cars in tier 5. The data also includes an emission group labeled \u201cOther.\u201d Due to the lack of additional information from the data source regarding this category, it is treated as the Euro 1 group. Since this data was unavailable for Spain, \u201caverage daily traffic &#8211; light duty vehicles\u201d is used as a proxy. 
Similarly, Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab16\" target=\"_blank\" rel=\"noopener\">16<\/a> outlines the FEC end-use sectors for which final proxies were available for Spain.<\/p>\n<p><b id=\"Tab13\" data-test=\"table-caption\">Table 13 FEC end-use sectors with final proxies commonly available for both Germany and Spain.<\/b><b id=\"Tab14\" data-test=\"table-caption\">Table 14 FEC end-use sectors with final proxies available for Germany.<\/b><b id=\"Tab15\" data-test=\"table-caption\">Table 15 Emission caps for different air pollutants, per emission group.<\/b><b id=\"Tab16\" data-test=\"table-caption\">Table 16 FEC end-use sectors with final proxies available for Spain.<\/b><\/p>\n<p><b>3a. GHG emissions (NUTS0 data) to LAU<\/b>. Tables\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab17\" target=\"_blank\" rel=\"noopener\">17<\/a>, <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab18\" target=\"_blank\" rel=\"noopener\">18<\/a>, and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab19\" target=\"_blank\" rel=\"noopener\">19<\/a> present the proxy assignments in the case of emissions end-use sectors. The proxies are similar to those used for FEC. The differences arise from a different breakdown of the source sub-sectors.<\/p>\n<p><b id=\"Tab17\" data-test=\"table-caption\">Table 17 GHG emissions end-use sectors with final proxies commonly available for both Germany and Spain.<\/b><\/p>\n<p>In Germany, except for energy-intensive industries, all proxies correspond directly to relevant emission sources. 
As a result, most emissions end-use sector proxies are classified as having HIGH confidence (see Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab18\" target=\"_blank\" rel=\"noopener\">18<\/a>). In contrast, Spain lacks detailed employment data and spatial data on residential and non-residential areas, limiting the availability of HIGH confidence proxies for several emissions end-use sectors (see Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"table anchor\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#Tab19\" target=\"_blank\" rel=\"noopener\">19<\/a>).<\/p>\n<p><b id=\"Tab18\" data-test=\"table-caption\">Table 18 GHG emissions end-use sectors with final proxies available for Germany.<\/b><b id=\"Tab19\" data-test=\"table-caption\">Table 19 GHG emissions end-use sectors with final proxies available for Spain.<\/b><\/p>\n<p>Data validation<\/p>\n<p>Finally, the spatial disaggregation results are compared with the values reported in local inventories (NetZeroCities<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 4\" title=\"NetZeroCities. URL &#010;                  https:\/\/netzerocities.app\/&#010;                  &#010;                . [Online; accessed 25-March-2025].\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#ref-CR4\" id=\"ref-link-section-d88814951e7751\" target=\"_blank\" rel=\"noopener\">4<\/a>) and an open-source sub-national emissions dataset (EDGAR<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 8\" title=\"Crippa, M. et al. Insights into the spatial distribution of global, national, and subnational greenhouse gas emissions in the emissions database for global atmospheric research (edgar v8. 0). 
Earth System Science Data 16(6), 2811&#x2013;2830 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41597-025-05938-1#ref-CR8\" id=\"ref-link-section-d88814951e7755\" target=\"_blank\" rel=\"noopener\">8<\/a>), ensuring alignment with the sub-sectors considered in this study. The details of this validation process are presented in the technical validation section of this manuscript.<\/p>\n","protected":false},"excerpt":{"rendered":"The spatial disaggregation of FEC and emissions data is carried out in four steps, as illustrated in Fig.\u00a01.&hellip;\n","protected":false},"author":2,"featured_media":464558,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5310],"tags":[16476,27173,2000,299,1824,3965,3966,70],"class_list":{"0":"post-464557","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-germany","8":"tag-climate-sciences","9":"tag-environmental-sciences","10":"tag-eu","11":"tag-europe","12":"tag-germany","13":"tag-humanities-and-social-sciences","14":"tag-multidisciplinary","15":"tag-science"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/115296410247150499","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/464557","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/comments?post=464557"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/464557\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media\/464558"}],"wp:attachment":[{"href
":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media?parent=464557"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=464557"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?post=464557"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}