{"id":56467,"date":"2025-04-28T03:55:30","date_gmt":"2025-04-28T03:55:30","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/56467\/"},"modified":"2025-04-28T03:55:30","modified_gmt":"2025-04-28T03:55:30","slug":"spatial-transcriptomics-identifies-molecular-niche-dysregulation-associated-with-distal-lung-remodeling-in-pulmonary-fibrosis","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/56467\/","title":{"rendered":"Spatial transcriptomics identifies molecular niche dysregulation associated with distal lung remodeling in pulmonary fibrosis"},"content":{"rendered":"<p>Participants and samples<\/p>\n<p>The studies described here comply with ethical regulations as approved under local institutional review boards, including providing written informed consent (Vanderbilt University Medical Center (VUMC) Institutional Review Board (060165 and 171657) and Western Institutional Review Board (20181836)). Peripheral PF lung samples were obtained from lungs removed at the time of lung transplant surgery at VUMC or Norton Thoracic Institute as previously described<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 9\" title=\"Natri, H. M. et al. Cell-type-specific and disease-associated expression quantitative trait loci in the human lung. Nat. Genet. 56, 595&#x2013;604 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR9\" id=\"ref-link-section-d410127756e2585\" target=\"_blank\" rel=\"noopener\">9<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 18\" title=\"Habermann, A. C. et al. Single-cell RNA sequencing reveals profibrotic roles of distinct epithelial and mesenchymal lineages in pulmonary fibrosis. Sci. Adv. 6, eaba1972 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR18\" id=\"ref-link-section-d410127756e2588\" target=\"_blank\" rel=\"noopener\">18<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 42\" title=\"Bui, L. T. et al. Chronic lung diseases are associated with gene expression programs favoring SARS-CoV-2 entry and severity. Nat. Commun. 12, 4314 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR42\" id=\"ref-link-section-d410127756e2591\" target=\"_blank\" rel=\"noopener\">42<\/a>. Control lung tissue samples were obtained from lungs declined for organ donation. Diagnoses were determined by the local treating clinicians and affiliated multidisciplinary committee and confirmed by review of explant pathology according to the current American Thoracic Society\/European Respiratory Society consensus guidelines<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Travis, W. D. et al. An official American Thoracic Society\/European Respiratory Society statement: update of the international multidisciplinary classification of the idiopathic interstitial pneumonias. Am. J. Respir. Crit. Care Med. 188, 733&#x2013;748 (2013).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR5\" id=\"ref-link-section-d410127756e2595\" target=\"_blank\" rel=\"noopener\">5<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 43\" title=\"Raghu, G. et al. Diagnosis of idiopathic pulmonary fibrosis. An official ATS\/ERS\/JRS\/ALAT clinical practice guideline. Am. J. Respir. Crit. Care Med. 198, e44&#x2013;e68 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR43\" id=\"ref-link-section-d410127756e2598\" target=\"_blank\" rel=\"noopener\">43<\/a>. Sample names were generated based on their collection site (Vanderbilt University\u2014\u2018VU\u2019; Translational Genomics Research Institute\u2014\u2018T\u2019), disease status (healthy donor\u2014\u2018HD\u2019; interstitial lung disease\u2014\u2018ILD\u2019), unique number assigned to the patient and where applicable, whether the sample was taken from a \u2018less affected\u2019 or \u2018more affected\u2019 tissue section as determined by percent pathology scores described later in the <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"section anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#Sec8\" target=\"_blank\" rel=\"noopener\">Methods<\/a>. Replicate healthy samples were labeled \u2018A\u2019 and \u2018B\u2019 to distinguish (for example, VUHD116A and VUHD116B), and replicate disease samples with the same \u2018less affected\u2019 or \u2018more affected\u2019 designation were labeled \u20181\u2019 and \u20182\u2019 (for example, VUILD105MA1 and VUILD105MA2).<\/p>\n<p>Tissue microarray (TMA) construction<\/p>\n<p>To maximize efficiency per run, multiple samples could be placed on a single Xenium slide (10.45\u2009mm\u2009\u00d7\u200922.45\u2009mm) using a TMA design. A 5-\u00b5m section of each lung formalin-fixed paraffin-embedded (FFPE) block was hematoxylin and eosin (H&amp;E) stained and presented to a physician who identified areas of interest and labeled them as less or more fibrotic (LF or MF) relative to the sample. For TMAs 1\u20134, blocks were designed in a 3\u2009\u00d7\u20093 or 2\u2009\u00d7\u20092 pattern for 3\u2009mm and 5\u2009mm cores, respectively (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#MOESM5\" target=\"_blank\" rel=\"noopener\">1<\/a>). Sample cores were punched and placed manually using a \u2018Quick-Ray Manual Tissue Microarrayer Full Set\u2019 and \u2018Quick-Ray Molds\u2019, according to the manufacturer\u2019s directions. Empty core spaces were filled with core punches taken from blank paraffin blocks (VWR, 76548-194). Once complete, blocks are placed face down on a clean glass slide and briefly heated in a warm drawer (~45\u2009\u00b0C) to slightly melt the paraffin together and even the block face. TMA blocks were then cooled to room temperature, removed from the slide, sealed and stored at 4\u2009\u00b0C.<\/p>\n<p>An additional TMA was constructed (TMA5) after improvement to the 10X Xenium processing and software. TMA5 was designed in a 3\u2009\u00d7\u20096 pattern of 3 mm cores with the top left core space remaining empty for orientation, with 17 samples total. A special 3\u2009mm square tip was created by the Arizona State University Instrument Design and Fabrication Core to fit the Quick-Ray Manual Tissue Microarrayer and allow us to produce square tissue cores. Due to the new core shape, this TMA was constructed by placing each core individually on double-sided adhesive tape (Gorilla Double-Sided Mounting Tape) in a paraffin block mold<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 44\" title=\"Chen, N. &amp; Zhou, Q. Constructing tissue microarrays without prefabricating recipient blocks: a novel approach. Am. J. Clin. Pathol. 124, 103&#x2013;107 (2005).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR44\" id=\"ref-link-section-d410127756e2619\" target=\"_blank\" rel=\"noopener\">44<\/a>. The entire mold is warmed to 37\u2009\u00b0C, and a P1000 is used to dispense 400\u2009\u03bcl of melted paraffin wax in the lanes between the cores. As more wax was poured into the back of the mold, the mold was gently tapped to dislodge air bubbles. Once solidified, the block was extracted from the mold, and the tape was removed. TMA5 then underwent the same warm drawer and storage conditions as all others.<\/p>\n<p>During TMA preparation, some samples were placed closely together such that a subset of transcripts and nuclei could not be distinguished as belonging specifically to either sample. These transcripts and nuclei were removed from downstream analyses.<\/p>\n<p>All samples were processed through the Xenium workflow, and four specific samples from TMA5 (one unaffected and three PF) were then additionally run through the Visium HD protocol after Xenium processing to generate orthogonal data for verifying phenomena of interest. Both workflows are described below.<\/p>\n<p>Xenium in situ workflowGene panel design<\/p>\n<p>Xenium in situ technology requires the use of a predefined gene panel. Each probe contains two paired sequences complementary to the targeted mRNA as well as a gene-specific barcode. Upon binding of the paired ends, ligation occurs. The now circular probe is amplified via rolling circle amplification, increasing the signal-to-noise ratio for target detection and decoding. A total of 343 unique genes were included in the analysis of this dataset. A total of 246 genes came from an early version of the Xenium human lung base panel (PD_277) and 97 genes from a custom-designed panel (CVEVZD). The custom panel was curated based on human lung single-cell analysis data<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 18\" title=\"Habermann, A. C. et al. Single-cell RNA sequencing reveals profibrotic roles of distinct epithelial and mesenchymal lineages in pulmonary fibrosis. Sci. Adv. 6, eaba1972 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR18\" id=\"ref-link-section-d410127756e2641\" target=\"_blank\" rel=\"noopener\">18<\/a>, selecting genes useful for cell-type identification and\/or suspected to be involved in IPF.<\/p>\n<p>Xenium sample preparation<\/p>\n<p>As Xenium in situ technology examines RNA, all protocol workstations and equipment were cleaned using RNase AWAY (RPI, 147002) followed by 70% isopropanol. All reagents, including water, were molecular-grade nuclease-free. Sample preparation began with rehydrating and sectioning FFPE blocks on a microtome (Leica, RM2135). Sections measuring 5\u2009\u00b5m were placed onto Xenium slides (10X Genomics). Following overnight drying, slides with placed samples were stored in a sealed desiccator at room temperature for \u226410\u2009days. Slides were then placed in imaging cassettes for the remainder of the preparation. Tissue deparaffinization and decross-linking steps made subcellular RNA targets accessible. Gene panel probe hybridization occurred overnight for 18\u2009h at 50\u2009\u00b0C (Bio-Rad DNA Engine Tetrad 2). Subsequent washes the next day removed unbound probes. Ligase was added to circularize the paired ends of bound probes (2\u2009h at 37\u2009\u00b0C) and followed by enzymatic rolling circle amplification (2\u2009h at 30\u2009\u00b0C). TMA5 was prepped to include Xenium Multi-Tissue Stain (10X Genomics, 1000662) according to 10X Genomics\u2019 Demonstrated Protocol <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/nuccore\/CG000749\" target=\"_blank\" rel=\"noopener\">CG000749<\/a>. All slides were washed in Tris-EDTA (TE) buffer before background fluorescence was chemically quenched; autofluorescence is a known issue in lung tissue as well as a byproduct of formalin fixation<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 45\" title=\"Davis, A. S. et al. Characterizing and diminishing autofluorescence in formalin-fixed paraffin-embedded human respiratory tissue. J. Histochem. Cytochem. 62, 405&#x2013;423 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR45\" id=\"ref-link-section-d410127756e2660\" target=\"_blank\" rel=\"noopener\">45<\/a>. Following PBS and PBS-T washes, DAPI was used to stain sample nuclei. Finalized slides were stored in PBS-T in the dark at 4\u2009\u00b0C for \u22645\u2009days until being loaded onto the Xenium Analyzer instrument. Stepwise Xenium FFPE preparation guidelines and buffer recipes can be found in 10X Genomics\u2019 Demonstrated Protocols <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/nuccore\/CG000578\" target=\"_blank\" rel=\"noopener\">CG000578<\/a> and <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/nuccore\/CG000580\" target=\"_blank\" rel=\"noopener\">CG000580<\/a>.<\/p>\n<p>Xenium Analyzer instrument<\/p>\n<p>The Xenium Analyzer is a fully automated instrument for decoding subcellular localization of RNA targets. The user marks regions for analysis by manually selecting the sample location on an initial low-resolution, full-slide image. After loading consumable reagents and a maximum of two slides per run, internal sample and liquid handling mechanics control experiment progression. Data collection occurs in cycles of fluorescently labeled probe binding, image acquisition and probe stripping. Images of the fluorescent probes are taken in 4,240\u2009\u00d7\u20092,960-pixel fields of view (FOVs). Localized points of fluorescence intensity detected during the rounds of imaging are then defined as potential RNA puncta. Each gene on the panel has a unique fluorescence pattern across the image channels. Puncta that match a specific pattern is then decoded and labeled according to the gene ID. Finally, all image FOVs and associated detected transcripts are computationally stitched together via the DAPI-stained image. Onboard analysis pipelines present quality values for each detected transcript, based on variable confidence in the signal and decoding process. Data for TMA1\u2013TMA4 were acquired on instrument software version 1.1.2.4 and analysis version xenium-1.1.0.2. Data for TMA5 were acquired on instrument software version 2.0.1.0 and analysis version xenium-2.0.0.10. TMA5 had additional fluorescent images taken of each Xenium Multi-Tissue Stain channel to allow for cell segmentation. Detailed instructions on instrument operation and consumable preparation can be found in 10X Genomics\u2019 Demonstrated Protocol <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/nuccore\/CG000582\" target=\"_blank\" rel=\"noopener\">CG000582<\/a>.<\/p>\n<p>Postrun histology<\/p>\n<p>After the run, slides were removed from the Xenium Analyzer instrument and had quencher removed according to 10X Genomics\u2019 Demonstrated Protocol <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/nuccore\/CG000613\" target=\"_blank\" rel=\"noopener\">CG000613<\/a>. Immediately following, the slides were H&amp;E stained according to the following protocol: xylene (x3, 3\u2009min ea), 100% alcohol (x2, 2\u2009min ea), 95% alcohol (x2, 2\u2009min ea), 70% alcohol (2\u2009min), deionized (DI) water rinse (1\u2009min), hematoxylin (1\u2009min; Biocare Medical CATHE), DI water rinse (1\u2009min), bluing solution (1\u2009min; Biocare Medical HTBLU-M), DI water rinse (3\u2009min), 95% alcohol (30\u2009s), eosin (5\u2009s; Biocare Medical HTE-GL), 95% alcohol (10\u2009s), 100% alcohol (x2, 10\u2009s ea) and xylene (x2, 10\u2009s ea). Coverslipping was performed using Micromount (Leica, 3801731) and cured overnight at room temperature. Histology images were taken on a \u00d720 Leica Biosystems Aperio CS2.<\/p>\n<p>Data preprocessingCell segmentation<\/p>\n<p>Cell segmentation was performed with 10X Xenium onboard cell segmentation (10X Genomics). DAPI-stained nuclei from the DAPI morphology image were segmented, and boundaries were consolidated to form nonoverlapping objects. For samples on TMAs 1\u20134, nuclear boundaries were expanded by 15\u2009\u00b5m or until they reached another cell boundary to approximate cell segmentation. For samples on TMA5, cells were segmented using a cell boundary stain. For all TMAs, nuclear segmentation was used to define cells.<\/p>\n<p>Image registration<\/p>\n<p>Registration was performed between Xenium DAPI morphology images (that is, nuclei) and H&amp;E-stained images of the same slice with the BigWarp plugin implemented in ImageJ (2.14.0)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 46\" title=\"Schindelin, J. et al. Fiji: an open-source platform for biological-image analysis. Nat. Methods 9, 676&#x2013;682 (2012).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR46\" id=\"ref-link-section-d410127756e2729\" target=\"_blank\" rel=\"noopener\">46<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 47\" title=\"Bogovic, J. A., Hanslovsky, P., Wong, A. &amp; Saalfeld, S. Robust registration of calcium images by learned contrast synthesis. 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI) 1123&#x2013;1126 (IEEE, 2016).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR47\" id=\"ref-link-section-d410127756e2732\" target=\"_blank\" rel=\"noopener\">47<\/a>. To place both images in the coordinate space of the DAPI morphology image, the H&amp;E stained image was specified as the moving image, and the DAPI morphology was specified as the target. Anchors were placed on identifiable landmarks on both images (approximately 200 landmarks per image pair). A thin plate spline warp was then applied to align the corresponding anchor points between the two images. Registered images were manually reviewed for potential visual artifacts resulting from the registration process. These registered images were then visualized in Xenium Explorer 3.0.0 software, which was used to generate figures showing transcript expression overlain on H&amp;E stains.<\/p>\n<p>Quality filtering and data preprocessing<\/p>\n<p>For each sample, Xenium generated an output file of transcript information, including x and y coordinates, corresponding gene, assigned cell and\/or nucleus and quality score. Low-quality transcripts (quality value (QV)\u2009<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 32\" title=\"Hao, Y. et al. Dictionary learning for integrative, multimodal, and scalable single-cell analysis. Nat. Biotechnol. 42, 293&#x2013;304 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR32\" id=\"ref-link-section-d410127756e2750\" target=\"_blank\" rel=\"noopener\">32<\/a>. Nucleus-by-gene count matrices were created for each sample based on the expression of transcripts that fell within segmented nuclei. A single merged Seurat object was created for all samples based on these count matrices and metadata files with nuclei coordinates and area. Nuclei were retained according to the following criteria: \u226512 transcripts corresponding to \u226510 unique genes, percentage of high-quality transcripts corresponding to negative control probes, negative control codewords, unassigned codewords, or the cumulative percentage of these \u22645, and nucleus area \u22656 and \u226480\u2009\u00b5m. Because Xenium outputs coordinates based on each slide, which results in samples with shared coordinates across multiple slides, the nuclei coordinates were manually adjusted for visualization so that no samples overlapped during plotting. These adjusted nucleus coordinates were added to the Seurat object as a dimension-reduction object.<\/p>\n<p>Dimensionality reduction, clustering and cell-type annotation<\/p>\n<p>Scanpy<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 48\" title=\"Wolf, F. A., Angerer, P. &amp; Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR48\" id=\"ref-link-section-d410127756e2762\" target=\"_blank\" rel=\"noopener\">48<\/a> in Python was used to perform dimensionality reduction and clustering, while Seurat v5 (ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 32\" title=\"Hao, Y. et al. Dictionary learning for integrative, multimodal, and scalable single-cell analysis. Nat. Biotechnol. 42, 293&#x2013;304 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR32\" id=\"ref-link-section-d410127756e2766\" target=\"_blank\" rel=\"noopener\">32<\/a>) in R was used for visualization of these results<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 49\" title=\"Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573&#x2013;3587 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR49\" id=\"ref-link-section-d410127756e2770\" target=\"_blank\" rel=\"noopener\">49<\/a>. Gene expression was normalized per cell using a log1p transformation. Dimensionality reduction was performed with principal component analysis. Nuclei were clustered using the Leiden algorithm<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 50\" title=\"Traag, V. A., Waltman, L. &amp; van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR50\" id=\"ref-link-section-d410127756e2774\" target=\"_blank\" rel=\"noopener\">50<\/a> based on this dimensionality reduction and computing a nearest neighbors distance matrix. Uniform Manifold Approximation and Projection (UMAP) plots of the data were then generated for visualization. For speed, these calculations were performed using a container<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 51\" title=\"dockerhub. claraparabricks\/single-cell-examples_rapids_cuda11.0. &#010;                  hub.docker.com\/r\/claraparabricks\/single-cell-examples_rapids_cuda11.0&#010;                  &#010;                 (accessed 6 September 2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR51\" id=\"ref-link-section-d410127756e2778\" target=\"_blank\" rel=\"noopener\">51<\/a> with the RAPIDS (v21.8.1) implementation<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 52\" title=\"Dicks, S. et al. scverse\/rapids_singlecell: v0.10.5 (v0.10.5). Zenodo &#010;                  https:\/\/doi.org\/10.5281\/zenodo.11562577&#010;                  &#010;                 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR52\" id=\"ref-link-section-d410127756e2783\" target=\"_blank\" rel=\"noopener\">52<\/a> of Scanpy (v1.8.1).<\/p>\n<p>For cell-type annotation, we used both marker genes and spatial information. We did not rely solely on gene expression because some level of gene \u2018contamination\u2019 is expected with spatial data, as transcripts located in one cell may be assigned to an adjacent cell in sufficiently close proximity. Therefore, we incorporated spatial data including cell morphology and histological features to label clusters. Initial Leiden clusters were primarily segregated by cell lineage based on marker genes assessed using the Seurat v5\u2019s FindMarkers function, including PECAM1 (endothelial), EPCAM (epithelial), PTPRC (immune), DCN, LUM and COL1A1 (all mesenchymal)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 32\" title=\"Hao, Y. et al. Dictionary learning for integrative, multimodal, and scalable single-cell analysis. Nat. Biotechnol. 42, 293&#x2013;304 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR32\" id=\"ref-link-section-d410127756e2809\" target=\"_blank\" rel=\"noopener\">32<\/a>. New Seurat\/AnnData objects were created for each of the four lineages, and dimensionality reduction and clustering were performed again as described above. Clusters were then given first-pass cell-type labels based on marker genes. The epithelial and immune lineages were further split into sublineages for alveolar and airway cells (epithelial) as well as myeloid and lymphoid cells (immune), and new Seurat\/AnnData objects were generated and reprocessed for these sublineages. As lineage and subgroup splitting became more accurate, clusters were relabeled with revised annotations. \u2018Stray\u2019 clusters that did not belong to the lineage or sublineage they were originally assigned to, as well as clusters that were marked by genes from more than one lineage, received final annotations based on shared marker genes and spatial patterns with confidently labeled clusters. Low-quality clusters with extremely low transcript counts and\/or with conflicting marker genes that could not be resolved (similar to \u2018doublets\u2019 in scRNA-seq data) were removed. This resulted in 47 final cell types, including 4 endothelial, 12 epithelial, 22 immune and 9 mesenchymal (Extended Data Figs. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#Fig8\" target=\"_blank\" rel=\"noopener\">2<\/a>\u2013<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#Fig11\" target=\"_blank\" rel=\"noopener\">5<\/a> and Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#MOESM5\" target=\"_blank\" rel=\"noopener\">2<\/a>).<\/p>\n<p>H&amp;E image annotationPercent pathology assignment<\/p>\n<p>Percent pathology was assessed by visual estimation of the fraction of the total imaged area of the sample with architectural remodeling. Disease samples were labeled \u2018more affected\u2019 if their percent pathology was greater than or equal to 75%. All other disease samples were labeled \u2018less affected\u2019, and control samples were labeled \u2018unaffected\u2019 regardless of percent pathology score.<\/p>\n<p>Annotation of histological features<\/p>\n<p>We annotated representative examples of 27 histological features across all 45 samples, including 12 epithelial or of likely epithelial origin (normal alveoli, minimally remodeled alveoli, hyperplastic AECs, emphysema, remodeled epithelium, advanced remodeling, epithelial detachment, remnant alveoli, small airway, large airway, microscopic honeycombing and goblet cell metaplasia), 3 vascular (artery, muscularized artery and venule), 3 immune (granuloma, mixed inflammation and TLS), 5 mesenchymal\/interstitial (interlobular septum, airway smooth muscle, fibroblastic focus, fibrosis and severe fibrosis) and 2 general (multinucleated cell and giant cell) feature types. Annotations were marked on the registered H&amp;E images using QuPath (v.0.4.3)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 53\" title=\"Bankhead, P. et al. QuPath: open source software for digital pathology image analysis. Sci. Rep. 7, 16878 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR53\" id=\"ref-link-section-d410127756e2844\" target=\"_blank\" rel=\"noopener\">53<\/a>. Cells were then assigned to annotated regions using a custom Python (v.3.12.3) script. Briefly, annotated regions were scaled by a factor of 0.2125\u2009\u00b5m per pixel to harmonize units with the cell centroid coordinates. Cells were assigned a boolean value for each annotated region corresponding to whether the cell centroid fell within the annotated region. After filtering to annotations containing at least one cell, there were 712 annotations across the 27 histological features.<\/p>\n<p>Comparison to scRNA-seq datasets<\/p>\n<p>We compared cell lineage recovery from the present spatial dataset to two scRNA-seq sources. The first was ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 9\" title=\"Natri, H. M. et al. Cell-type-specific and disease-associated expression quantitative trait loci in the human lung. Nat. Genet. 56, 595&#x2013;604 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR9\" id=\"ref-link-section-d410127756e2857\" target=\"_blank\" rel=\"noopener\">9<\/a>, a recent sc-eQTL study from our lab<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 9\" title=\"Natri, H. M. et al. Cell-type-specific and disease-associated expression quantitative trait loci in the human lung. Nat. Genet. 56, 595&#x2013;604 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR9\" id=\"ref-link-section-d410127756e2861\" target=\"_blank\" rel=\"noopener\">9<\/a>, and the other was the Human Lung Cell Atlas (HLCA), which included aggregated data from multiple lung scRNA-seq studies<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 12\" title=\"Sikkema, L. et al. An integrated cell atlas of the lung in health and disease. Nat. Med. 29, 1563&#x2013;1577 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR12\" id=\"ref-link-section-d410127756e2865\" target=\"_blank\" rel=\"noopener\">12<\/a>. Studies from the HLCA varied in the disease studied and whether the dataset contained control and\/or lung disease samples. We narrowed the scope of our comparison to studies (lung atlases) or samples (sc-eQTL study) that did not specifically enrich or deplete cells from a specific lineage in their preprocessing and did not exclusively contain data from nasal samples. We included 47 samples from the sc-eQTL study (19 control and 28 ILD) and 14 datasets from the HLCA. For each data source, the proportion of cells from each lineage (endothelial, epithelial, immune and mesenchymal) was calculated first for all samples and then for control and disease samples separately. If an annotated cell type did not have a clear lineage association, it was removed from the analysis. Sample-level data from the present study was compared to sample-level data from the sc-eQTL study and dataset-level information from HLCA.<\/p>\n<p>Post-Xenium Visium HD workflow for select samplesPost-Xenium Visium HD sample preparation<\/p>\n<p>All protocol workstations and equipment were cleaned using RNase Away (RPI 147002) followed by 70% isopropanol. All reagents, including water, were molecular-grade nuclease-free. The Xenium slide containing IPFTMA5 was H&amp;E stained postrun as described in \u2018Xenium in situ workflow\u2014Postrun histology\u2019. The slide was stored for 9\u2009days postcoverslipping in a sealed desiccator at 4\u2009\u00b0C. The slide was then decoverslipped in xylene, loaded into a Visium tissue slide cassette (10X Genomics), and destained using 0.1\u2009N HCl. The decross-linking step of the archived slide protocol was skipped due to this already being performed as part of the Xenium workflow. Stepwise archived slide preparation guidelines and buffer recipes can be found in 10X Genomics\u2019 Demonstrated Protocol <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/nuccore\/CG000684\" target=\"_blank\" rel=\"noopener\">CG000684<\/a>.<\/p>\n<p>Visium HD CytAssist preparation<\/p>\n<p>The tissue slide within the Visium HD tissue cassette was then prepared for loading onto the CytAssist instrument. It underwent human probe hybridization (10X Genomics, 1000466) and a posthybridization wash to remove unbound probes, followed by probe ligation and subsequent wash. The Visium HD slide containing the probe capture areas was prepared. Both the Visium HD slide and the tissue slide were loaded onto the CytAssist instrument to enable the probe release and capture. Four samples were targeted as a representative subset (one unaffected and three fibrotic) for the Visium HD analysis (VUHD049, VUILD49LA, TILD111LA and TILD113LA) as it has a smaller capture area than the Xenium slide. Once captured on the Visium HD slide, probes were extended, eluted into 0.08\u2009M KOH and then transferred into 1\u2009M Tris\u2013HCl (pH 8). Pre-amplification cleanup was performed using SPRIselect reagent (Beckman Coulter, <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/nuccore\/B23318\" target=\"_blank\" rel=\"noopener\">B23318<\/a>). The sample was stored at 4\u2009\u00b0C overnight. Detailed instructions regarding Visium HD CytAssist operation, tissue cassette and Visium HD slide preparation can be found in 10X Genomics\u2019 Demonstrated Protocol <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/nuccore\/CG000685\" target=\"_blank\" rel=\"noopener\">CG000685<\/a>.<\/p>\n<p>Visium HD library construction and sequencing<\/p>\n<p>The probe-based library was constructed using Dual Index Plate TS Set A (10X Genomics, 1000251) with a sample index PCR cycle number of 16, as determined by the Cq value from qPCR (Applied Biosystems QuantStudio 5). A final cleanup step with SPRIselect reagent was followed by storage at \u221220\u2009\u00b0C until sequencing. Library quality control was performed on an Agilent TapeStation 4200. Next-generation sequencing was carried out on Illumina\u2019s NovaSeq X platform using a 10 billion read, 300 cycle flow cell. Minimum sequencing recommendations were met using the product of tissue coverage percentage estimations and a constant maximum read figure of 275\u2009million read pairs. Optimal cluster density was achieved with a lane loading concentration of 235\u2009pM. All steps were performed according to the instructions found in 10X Genomics\u2019 Demonstrated Protocol <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/nuccore\/CG000685\" target=\"_blank\" rel=\"noopener\">CG000685<\/a>.<\/p>\n<p>Visium HD data processing<\/p>\n<p>A high-resolution image was aligned using the Visium HD Manual Alignment tool on Loupe Browser (v8.0.0; 10X Genomics)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 54\" title=\"Loupe Browser&#x2014;official 10X Genomics support. 10X Genomics &#010;                  www.10xgenomics.com\/support\/software\/loupe-browser\/latest&#010;                  &#010;                 (accessed 6 September 2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR54\" id=\"ref-link-section-d410127756e2933\" target=\"_blank\" rel=\"noopener\">54<\/a>. Five matched landmarks per sample within the capture area were selected for the alignment and then algorithmically refined by the software. Visium HD sequence data, high-resolution image and alignment file were processed using the count function of Spaceranger (3.0.0; 10X Genomics)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 55\" title=\"Space Ranger&#x2014;official 10X Genomics support. 10X Genomics &#010;                  www.10xgenomics.com\/support\/software\/space-ranger\/latest&#010;                  &#010;                 (accessed 6 September 2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR55\" id=\"ref-link-section-d410127756e2937\" target=\"_blank\" rel=\"noopener\">55<\/a> with default parameters. Reads were mapped against the GRCh38-2020-A reference.<\/p>\n<p>Visium HD data analysis<\/p>\n<p>To compare the Xenium and Visium HD outputs, the Loupe Browser (v8.0.0) software was used to visualize gene expression over H&amp;E images for the Visium HD data. Composite gene expression scores were used to visualize marker gene expression of several cell types of interest. These composite scores were created by summing the log2-transformed unique molecular identifier (UMI) counts of 7\u201326 genes per cell type that were selected based on the dotplot heatmap in Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#Fig7\" target=\"_blank\" rel=\"noopener\">1<\/a> and prior literature (KRT5\u2212\/KRT17+ \u2018aberrant basaloid\u2019 cells<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 18\" title=\"Habermann, A. C. et al. Single-cell RNA sequencing reveals profibrotic roles of distinct epithelial and mesenchymal lineages in pulmonary fibrosis. Sci. Adv. 6, eaba1972 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR18\" id=\"ref-link-section-d410127756e2964\" target=\"_blank\" rel=\"noopener\">18<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 19\" title=\"Adams, T. S. et al. Single-cell RNA-seq reveals ectopic and aberrant lung-resident cell populations in idiopathic pulmonary fibrosis. Sci. Adv. 6, eaba1983 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR19\" id=\"ref-link-section-d410127756e2967\" target=\"_blank\" rel=\"noopener\">19<\/a>; activated fibrotic fibroblasts<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 56\" title=\"Peyser, R. et al. Defining the activated fibroblast population in lung fibrosis using single-cell sequencing. Am. J. Respir. Cell Mol. Biol. 61, 74&#x2013;85 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR56\" id=\"ref-link-section-d410127756e2971\" target=\"_blank\" rel=\"noopener\">56<\/a>; alveolar FABP4+ macrophages<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 24\" title=\"Ayaub, E. A. et al. Single cell RNA-seq and mass cytometry reveals a novel and a targetable population of macrophages in idiopathic pulmonary fibrosis. Preprint at bioRxiv &#010;                  https:\/\/doi.org\/10.1101\/2021.01.04.425268&#010;                  &#010;                 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR24\" id=\"ref-link-section-d410127756e2979\" target=\"_blank\" rel=\"noopener\">24<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 37\" title=\"Bain, C. C. &amp; MacDonald, A. S. The impact of the lung environment on macrophage development, activation and function: diversity in the face of adversity. Mucosal Immunol. 15, 223&#x2013;234 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR37\" id=\"ref-link-section-d410127756e2982\" target=\"_blank\" rel=\"noopener\">37<\/a>; SPP1+ macrophages<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 24\" title=\"Ayaub, E. A. et al. Single cell RNA-seq and mass cytometry reveals a novel and a targetable population of macrophages in idiopathic pulmonary fibrosis. Preprint at bioRxiv &#010;                  https:\/\/doi.org\/10.1101\/2021.01.04.425268&#010;                  &#010;                 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR24\" id=\"ref-link-section-d410127756e2991\" target=\"_blank\" rel=\"noopener\">24<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 37\" title=\"Bain, C. C. &amp; MacDonald, A. S. The impact of the lung environment on macrophage development, activation and function: diversity in the face of adversity. Mucosal Immunol. 15, 223&#x2013;234 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR37\" id=\"ref-link-section-d410127756e2994\" target=\"_blank\" rel=\"noopener\">37<\/a>; Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#MOESM5\" target=\"_blank\" rel=\"noopener\">10<\/a>). Three strong canonical marker genes for KRT5\u2212\/KRT17+ cells or activated fibroblasts (COL1A1, FN1 and ACTA2) were not included in the composite scores for these cell types because they mark both cell types (COL1A1 and FN1) or because they strongly mark another cell type (ACTA2 marks smooth muscle). For the comparison between alveolar macrophages and SPP1+ macrophages (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#Fig6\" target=\"_blank\" rel=\"noopener\">6h<\/a>), genes that were strong markers for both macrophage populations in the present Xenium dataset according to Supplementary Figs. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#MOESM1\" target=\"_blank\" rel=\"noopener\">3<\/a> and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#MOESM1\" target=\"_blank\" rel=\"noopener\">6<\/a> were excluded from the alveolar macrophage marker list, except for PPARG, which is known to distinguish these populations in scRNA-seq and had low expression in SPP1+ regions in the Visium HD data.<\/p>\n<p>Computational niche identificationTranscript-based niches<\/p>\n<p>Transcript-based niches were characterized using a graph neural network model GraphSAGE (implementation in StellarGraph v1.2.1\u2009+\u2009Python 3.8.0)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 31\" title=\"Hamilton, W. L., Ying, R. &amp; Leskovec, J. Inductive representation learning on large graphs. Preprint at &#010;                  https:\/\/doi.org\/10.48550\/arXiv.1706.02216&#010;                  &#010;                 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR31\" id=\"ref-link-section-d410127756e3063\" target=\"_blank\" rel=\"noopener\">31<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 57\" title=\"Partel, G. &amp; W&#xE4;hlby, C. Spage2vec: unsupervised representation of localized spatial gene expression signatures. FEBS J. 288, 1859&#x2013;1870 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR57\" id=\"ref-link-section-d410127756e3066\" target=\"_blank\" rel=\"noopener\">57<\/a> that directly modeled detected transcripts without cell segmentation. GraphSAGE learns the structures in the graph data via sampling and aggregating neighbors, scaling well to large graphs. Detected transcripts from each sample were used to build a sample graph after removing low-quality transcripts (QV)\u2009d\u2009=\u20093.0) similar to a previous study<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 57\" title=\"Partel, G. &amp; W&#xE4;hlby, C. Spage2vec: unsupervised representation of localized spatial gene expression signatures. FEBS J. 288, 1859&#x2013;1870 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR57\" id=\"ref-link-section-d410127756e3073\" target=\"_blank\" rel=\"noopener\">57<\/a>. Each node was associated with an input feature vector that was the one-hot encoding of the node\u2019s gene label, and small components with nodes with fewer than ten nodes were removed (n\u2009=\u2009299,018,086 after filtering).<\/p>\n<p>A two-hop GraphSAGE model was applied to learn node embeddings. It sampled 20 and 10 neighboring nodes at the first-hop and second-hop neighbors, respectively, to learn 50-dimensional embeddings at each hop for each node in the graph. To embed all nodes across samples into the same embedding space, we trained the model on a joined graph consisting of subgraphs from 28 samples (TMAs 1\u20134) with each subgraph containing 5,000 randomly sampled root nodes along with their three-hop neighbors as well as the existing edges among them. The joined graph contained 18,594,542 nodes and 421,316,086 edges for model training. The model was trained in an unsupervised manner by solving a binary classification task for \u2018positive\u2019 and \u2018negative\u2019 node pairs, in which the model predicts whether or not two nodes should have an edge connection based on their local neighborhood structures. We trained the model with ten epochs, achieving a training accuracy of 0.82 (ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 31\" title=\"Hamilton, W. L., Ying, R. &amp; Leskovec, J. Inductive representation learning on large graphs. Preprint at &#010;                  https:\/\/doi.org\/10.48550\/arXiv.1706.02216&#010;                  &#010;                 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR31\" id=\"ref-link-section-d410127756e3083\" target=\"_blank\" rel=\"noopener\">31<\/a>).<\/p>\n<p>Embedding representations of all nodes (transcripts) across the 28 samples from TMAs 1\u20134 were obtained using the trained model, and we performed unsupervised clustering on all nodes using the Gaussian mixture model (k\u2009=\u200912) as implemented in the PyCave library (<a href=\"https:\/\/github.com\/borchero\/pycave\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/borchero\/pycave<\/a>). We subsequently obtained the embeddings of the 17 samples from TMA5 using the same trained model and projected their transcripts into the existing 12 transcript clusters.<\/p>\n<p>                    Transcript-based niche plots and assignment of nuclei to niches<\/p>\n<p>We created a hexbin summarization plot for each sample using all transcripts with a bin width of 5 for generating the transcript-based niche plots to overcome overplotting issues. Each bin was labeled and colored with the major cluster label among the transcripts falling in the bin area. Bins with fewer than ten transcripts were not included in the plots. To assign cells to transcript-based niches, we assigned each cell to its closest hexbin by calculating the Euclidean distance of each cell centroid to the hexbin centroids.<\/p>\n<p>                  Cell-based niches<\/p>\n<p>Seurat v5\u2019s<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 32\" title=\"Hao, Y. et al. Dictionary learning for integrative, multimodal, and scalable single-cell analysis. Nat. Biotechnol. 42, 293&#x2013;304 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR32\" id=\"ref-link-section-d410127756e3117\" target=\"_blank\" rel=\"noopener\">32<\/a> BuildNicheAssay function was used to partition cells into 12 spatial niches using k-means clustering based on the cell-type composition of their 25 closest neighboring cells.<\/p>\n<p>Cell-type proximity analysis<\/p>\n<p>To assess proximity between cell types, each cell was anchored, and its distance and angular direction were calculated between the anchored cells and their neighboring cells within a fixed 60-\u00b5m radius circle. Only first-degree neighbors were considered proximal to the anchored cell. The probability of cell types being proximal to one another was assessed using logistic regression by assigning cells to binarized proximal and nonproximal categories and using cell type as a covariate. This analysis was performed both for all samples and cell types and within each cell and transcript niche individually.<\/p>\n<p>Differential expression and composition analysesDifferential cell-type composition by disease severity<\/p>\n<p>We transformed the cell-type proportion per sample using logit transformation and tested whether the cell-type proportion was different among three disease groups (unaffected, less affected and more affected samples) using the propeller.anova function implemented in propeller<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 58\" title=\"Phipson, B. et al. propeller: testing for differences in cell type proportions in single cell data. Bioinformatics 38, 4720&#x2013;4726 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR58\" id=\"ref-link-section-d410127756e3145\" target=\"_blank\" rel=\"noopener\">58<\/a>.<\/p>\n<p>Representative gene detection in transcript-based niches<\/p>\n<p>We used the linear model framework implemented in propeller<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 58\" title=\"Phipson, B. et al. propeller: testing for differences in cell type proportions in single cell data. Bioinformatics 38, 4720&#x2013;4726 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR58\" id=\"ref-link-section-d410127756e3157\" target=\"_blank\" rel=\"noopener\">58<\/a> and limma<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 59\" title=\"Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR59\" id=\"ref-link-section-d410127756e3161\" target=\"_blank\" rel=\"noopener\">59<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 60\" title=\"Phipson, B., Lee, S., Majewski, I. J., Alexander, W. S. &amp; Smyth, G. K. Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Ann. Appl. Stat. 10, 946&#x2013;963 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR60\" id=\"ref-link-section-d410127756e3164\" target=\"_blank\" rel=\"noopener\">60<\/a> to test differences in gene proportions across niches and thereby identify representative genes of each niche. We first investigated the proportion of transcripts assigned to each niche within each sample. Samples were excluded from testing for a niche when a few transcripts from that sample (proportion\u2009\u22124) were assigned to the niche. Gene proportions were logit-transformed, and a linear model was fitted to each gene that modeled the mean transformed gene proportion in each niche while accounting for sample differences. Differentially abundant genes were then derived by contrasting the mean gene proportion from one niche with the average mean proportion from the other niches.<\/p>\n<p>Differential expression, cell-type composition and niche proportions by percent pathology<\/p>\n<p>We used the same propeller<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 58\" title=\"Phipson, B. et al. propeller: testing for differences in cell type proportions in single cell data. Bioinformatics 38, 4720&#x2013;4726 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR58\" id=\"ref-link-section-d410127756e3178\" target=\"_blank\" rel=\"noopener\">58<\/a> and limma<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 59\" title=\"Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR59\" id=\"ref-link-section-d410127756e3182\" target=\"_blank\" rel=\"noopener\">59<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 60\" title=\"Phipson, B., Lee, S., Majewski, I. J., Alexander, W. S. &amp; Smyth, G. K. Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Ann. Appl. Stat. 10, 946&#x2013;963 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR60\" id=\"ref-link-section-d410127756e3185\" target=\"_blank\" rel=\"noopener\">60<\/a> frameworks for finding gene, cell type and niche proportions that correlated with changes in percent pathology across samples. Proportions were logit-transformed, and linear models were fit to model the relationship between the transformed proportions and percent pathology across samples to find features (that is, genes, cell types and niches) that varied significantly with percent pathology (FDR\u2009<\/p>\n<p>We also performed a cell-type-aware differential gene expression along percent pathology across samples by pseudo-bulking gene expression per identified cell type. For each cell type, we filtered the list of testing genes to remove contamination signals. To identify contamination genes per cell type, we first scaled the gene counts using a count-per-1K normalization across cells and kept genes with at least five counts in 30% of cells within a cell type for testing. The remaining genes per cell type were then tested with the propeller<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 58\" title=\"Phipson, B. et al. propeller: testing for differences in cell type proportions in single cell data. Bioinformatics 38, 4720&#x2013;4726 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR58\" id=\"ref-link-section-d410127756e3192\" target=\"_blank\" rel=\"noopener\">58<\/a> and limma<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 59\" title=\"Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR59\" id=\"ref-link-section-d410127756e3196\" target=\"_blank\" rel=\"noopener\">59<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 60\" title=\"Phipson, B., Lee, S., Majewski, I. J., Alexander, W. S. &amp; Smyth, G. K. Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Ann. Appl. Stat. 10, 946&#x2013;963 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR60\" id=\"ref-link-section-d410127756e3199\" target=\"_blank\" rel=\"noopener\">60<\/a> frameworks.<\/p>\n<p>Differential expression analysis across annotated pathology features<\/p>\n<p>We aggregated gene counts from cells included in each pathology annotation instance and detected differentially expressed genes among pathology annotations by fitting linear models implemented in limma<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 59\" title=\"Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR59\" id=\"ref-link-section-d410127756e3211\" target=\"_blank\" rel=\"noopener\">59<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 61\" title=\"Law, C. W., Chen, Y., Shi, W. &amp; Smyth, G. K. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 15, R29 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR61\" id=\"ref-link-section-d410127756e3214\" target=\"_blank\" rel=\"noopener\">61<\/a>. Gene expression was converted to counts per million (CPM) and log2-transformed after adding a 0.5 pseudocount. Lowly expressed genes (log2(CPM)\u20092CPM observation value, which was subsequently used in the linear regression model to account for heteroskedasticity in count data. We controlled for the TMA effects by adding TMA origins as a covariate in the linear model. Differentially expressed genes for each annotation type were detected by comparing the gene expression per annotation type with the rest. Separately, we also compared epithelial annotations to each other to identify dysregulated genes in pathologic epithelium.<\/p>\n<p>Lumen segmentation and airspace identificationInitial lumen segmentation<\/p>\n<p>To segment individual lumens from the spatial sequencing data, a custom graphics processing unit (GPU)-accelerated image processing algorithm was developed using scikit-image (v.0.19.3), RAPIDS cuCIM (v.22.12.00) and CuPy (v.11.2.0) in Python (v.3.9.7) to assign unique identifiers to lumens in the spatial transcriptome data<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Van der Walt, S. et al. scikit-image: image processing in Python. PeerJ 2, e453 (2014).\" href=\"#ref-CR62\" id=\"ref-link-section-d410127756e3237\">62<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"RAPIDS. RAPIDS: libraries for end to end GPU data science. rapids.ai\/ (accessed 6 September 2024).\" href=\"#ref-CR63\" id=\"ref-link-section-d410127756e3237_1\">63<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 64\" title=\"Okuta, R., Unno, Y., Nishino, D., Hido, S. &amp; Loomis, C. CuPy: a NumPy-compatible library for NVIDIA GPU calculations. Proceedings of Workshop on Machine Learning Systems (LearningSys) &#010;                  http:\/\/learningsys.org\/nips17\/assets\/papers\/paper_16.pdf&#010;                  &#010;                 (NIPS, 2017).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR64\" id=\"ref-link-section-d410127756e3240\" target=\"_blank\" rel=\"noopener\">64<\/a>. Briefly, the location of each detected transcript, excluding transcripts associated with immune cells, was binarized to a 2D image followed by a series of dilation, erosion and closing operations to define the location of the tissue and segment the lumens. The outer boundaries of the tissue were defined using Alpha Shapes (alphashape package v.1.3.1) on a denoised, closed shape representing the entire tissue. Unique identifiers were then assigned to the negative space (lumens) using scikit-image, and metrics were calculated for each individual lumen. Cell centers (nuclei) within a cutoff distance, defined as the thickness of a normal alveolar wall, were then assigned to the unique identifier of the closest lumen by searching for the nearest label in a restricted zone using the K-D Tree nearest-neighbor lookup algorithm implemented in scikit-image. All cells that were not close to a lumen were given an identifier of zero.<\/p>\n<p>Quality filtering to isolate alveolar airspaces<\/p>\n<p>Lumens were first filtered by size to remove false positive segmentations. We retained lumens that contained between 25 and 500 cells, including at least 5% epithelial cells, for which the maximum distance between any two nuclei in the lumen was at least 110\u2009\u00b5m. The high cell number and low epithelial cell thresholds were selected to retain lumens with large macrophage accumulations. For each lumen, the proportion of cells belonging to each cell niche was calculated, and the cell niche(s) with the highest proportion of cells was recorded as the \u2018maximum cell niche\u2019. To discard lumens corresponding to endothelial and airway structures, lumens were required to have an endothelial niche proportion of C10\u2009KRT5\u2212\/KRT17+ fibrotic niche) and\/or C11 (airspace macrophage accumulation niche).<\/p>\n<p>Assessing changes in cell types, niches and gene expression along pseudotime<\/p>\n<p>Pseudotime analysis was performed on the remaining 1,747 alveolar airspaces based on the proportion of transcripts overlapping nuclei that were assigned to the T4 healthy alveolar niche. This allowed us to rank airspaces along a continuum of disease severity and tissue remodeling. To find cell types, niches and genes with varied abundances or expression along the pseudotime, we applied GAMs using the fitGAM function from tradeSeq<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 65\" title=\"Van den Berge, K. et al. Trajectory-based differential expression analysis for single-cell sequencing data. Nat. Commun. 11, 1201 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR65\" id=\"ref-link-section-d410127756e3268\" target=\"_blank\" rel=\"noopener\">65<\/a> on the aggregated feature counts across airspaces under a negative binomial distribution. We subsequently applied associationTest<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 65\" title=\"Van den Berge, K. et al. Trajectory-based differential expression analysis for single-cell sequencing data. Nat. Commun. 11, 1201 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR65\" id=\"ref-link-section-d410127756e3272\" target=\"_blank\" rel=\"noopener\">65<\/a> to test the association of features with pseudotime.<\/p>\n<p>Cell-type changes along pseudotime-ordered airspaces<\/p>\n<p>We counted the number of each cell type across airspaces and fitted a GAM model per cell type against pseudotime with an offset of log-transformed total cell counts. We excluded cell types if they were not present with at least three counts in at least ten airspaces (36 cell types tested). We obtained the list of significant cell types (28 in total) using the associationTest<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 65\" title=\"Van den Berge, K. et al. Trajectory-based differential expression analysis for single-cell sequencing data. Nat. Commun. 11, 1201 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR65\" id=\"ref-link-section-d410127756e3284\" target=\"_blank\" rel=\"noopener\">65<\/a>.<\/p>\n<p>Cell-type-based and transcript-based niches<\/p>\n<p>We applied similar analysis for cell-type-based and transcript-based niches as for cell-type analysis given above. Niches were excluded if they were not present with at least three counts in at least 10% of segmented airspaces. We obtained the smoothed niche abundance changes along 2,000 time points after model fitting and visualized their patterns in heatmaps after ordering them by their peak time point. The peak time point for each niche was determined using a rolling mean approach (window size, n\u2009=\u2009100). Features with earlier peak times were plotted in the top rows.<\/p>\n<p>Classifying gene expression patterns along pseudotime<\/p>\n<p>We characterized gene expression patterns along pseudotime-ordered airspaces using GAM models by modeling the gene count along pseudotime under negative binomial distribution. We added the total number of detected transcripts per lumen as an offset term as well as a TMA covariate for controlling TMA effects. With the fitted model, we predicted the gene expression along pseudotime and clustered the gene expression patterns first using hierarchical clustering (k\u2009=\u200920) after taking z scores per gene. Genes were separated into bimodal and unimodal patterns by visual inspection of the 20 hierarchical clusters. For the unimodal genes, we further clustered them by spectral clustering (k\u2009=\u20094) using the R package kernlab<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 66\" title=\"Karatzoglou, A., Smola, A., Hornik, K. &amp; Zeileis, A. kernlab&#x2014;an S4 package for Kernel methods in R. J. Stat. Softw. 11, 1&#x2013;20 (2004).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR66\" id=\"ref-link-section-d410127756e3317\" target=\"_blank\" rel=\"noopener\">66<\/a>. We ordered the four gene clusters by their gene expression peak times and labeled them into four categories, including homeostasis, early remodeling, intermediate remodeling and late remodeling. The gene expression dynamics of the four gene clusters were then visualized in heatmaps.<\/p>\n<p>Altered gene expression along pseudotime within each cell type<\/p>\n<p>We tested which genes had altered expression along pseudotime within a specific cell type using GAMs as well. We tested 25 cell types that were observed with more than three cells in at least 80 airspaces. For each cell type, we aggregated their gene expression across segmented airspaces. Within a specific cell type, for the expression of each gene \\(\\mu\\), we fit a negative binomial GAM model to model the gene expression across lumens using the gam function with cubic regression splines \u2018bs=\u201ccr\u201d\u2019 implemented in mgcv<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 67\" title=\"Wood, S. N. Generalized Additive Models: An Introduction With R, Second Edition (CRC Press, 2017).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR67\" id=\"ref-link-section-d410127756e3346\" target=\"_blank\" rel=\"noopener\">67<\/a> as<\/p>\n<p>$$\\log (\\mu )=s(t)+\\mathop{\\sum }\\limits_{i=1}^{15}s({p}_{i})+\\,\\log (n)+U,$$<\/p>\n<p>where t represents pseudotime, pi represents the proportion of cell type i, n represents the number of total transcripts per lumen and U represents the TMAs. We applied the same strategies in knot selection as in tradeSeq<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 65\" title=\"Van den Berge, K. et al. Trajectory-based differential expression analysis for single-cell sequencing data. Nat. Commun. 11, 1201 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#ref-CR65\" id=\"ref-link-section-d410127756e3492\" target=\"_blank\" rel=\"noopener\">65<\/a> and selected five knots per smooth term. We included proportions of 15 major cell types that had more than three cells in more than 300 airspaces in the model to control for the contamination in detected gene expression signals. When testing genes for significant association with pseudotime per cell type, we restricted genes to the set of significant genes in the overall gene association test across all cell types. For each cell type, we then tested genes that were detected with more than three copies in 50% of the testing airspaces. Genes that had significant pseudotime terms after multiple testing corrections (FDR\u2009<\/p>\n<p>Statistics and reproducibility<\/p>\n<p>Sample sizes were chosen based on sample availability, although data generation was randomized with respect to disease and control samples. The study was unblinded. Any data exclusions are noted above, including filtering cells of a large size or unexpectedly low or high gene counts based on current best practices. Key findings from Xenium data were replicated with orthogonal technology (Visium HD) as described above. For figures depicting H&amp;E images of described phenomena (for example, cell types in annotations in Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#Fig3\" target=\"_blank\" rel=\"noopener\">3d<\/a> and macrophage accumulations in Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#Fig6\" target=\"_blank\" rel=\"noopener\">6c\u2013e,g\u2013h<\/a>), at least three similar examples were observed across samples, and the images shown are intended to be representative but not exhaustive. For representative H&amp;E images overlain with transcripts or cell types, see the accompanying dotplot heatmaps for additional context. Full data are available at the Gene Expression Omnibus (GEO), including histopathology, gene expression data and cell-type annotations.<\/p>\n<p>Reporting summary<\/p>\n<p>Further information on research design is available in the <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02080-x#MOESM2\" target=\"_blank\" rel=\"noopener\">Nature Portfolio Reporting Summary<\/a> linked to this article.<\/p>\n","protected":false},"excerpt":{"rendered":"Participants and samples The studies described here comply with ethical regulations as approved under local institutional review boards,&hellip;\n","protected":false},"author":2,"featured_media":56468,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3846],"tags":[3971,3973,3967,3970,17907,3972,3968,267,3969,29802,70,16,15],"class_list":{"0":"post-56467","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-genetics","8":"tag-agriculture","9":"tag-animal-genetics-and-genomics","10":"tag-biomedicine","11":"tag-cancer-research","12":"tag-gene-expression","13":"tag-gene-function","14":"tag-general","15":"tag-genetics","16":"tag-human-genetics","17":"tag-respiratory-tract-diseases","18":"tag-science","19":"tag-uk","20":"tag-united-kingdom"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/114413576285289076","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/56467","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/comments?post=56467"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/56467\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media\/56468"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media?parent=56467"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=56467"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?post=56467"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}