Note 1: This post is part 3 of a three-part series on healthcare, knowledge graphs, and lessons for other industries. Part 1, “What Is a Knowledge Graph — and Why It Matters” is available here. Part 2, “Why Healthcare Leads in Knowledge Graphs” is available here.
Note 2: All images by author
Note 3: While doing research for this article, I found that there are lots of lists of existing resources (ontologies, controlled vocabularies, software) as well as lists of lists of lists of resources 🤯. So, I built an app that runs queries against Wikidata to get these resources directly. The code is available here. Let’s use the Semantic Web to power the Semantic Web :).
Healthcare didn’t become a leader in knowledge graphs by adopting new technology early. It did so by investing, over centuries, in shared meaning. Long before modern data platforms or AI, medicine aligned on what exists (ontologies), how entities are named (controlled vocabularies), how evidence is generated (observations), how data moves between systems (interoperability standards), and how alignment is enforced (through regulation, collaboration, and public funding).
This article shows that healthcare is not unique in needing these foundations, and it is no longer unique in building them. Other industries are already developing shared ontologies, vocabularies, observation standards, and exchange models in law, finance, climate science, construction, cybersecurity, and government. The difference is not feasibility, but maturity and coordination.
In the sections that follow, I walk through the key lessons other industries can take from healthcare’s experience, highlighting what healthcare got right, and pointing to concrete examples from other domains where similar approaches are already working.
Shared ontologies — agree on what exists
The healthcare industry has tons of ontologies. There are ontologies for anatomy (Uberon), genes (Gene Ontology), chemical compounds (ChEBI), and hundreds of other domains. Repositories such as BioPortal and the OBO Foundry provide access to well over a thousand biomedical ontologies. Most of these are domain ontologies: they describe the domain of healthcare.
In addition to these domain ontologies, healthcare uses cross-domain ontologies like Schema.org and QUDT (Quantities, Units, Dimensions, and Types). Healthcare ontologies are built with the Web Ontology Language (OWL), the Shapes Constraint Language (SHACL), and the Simple Knowledge Organization System (SKOS), all of which are standards from the World Wide Web Consortium (W3C); more on this later. There are also upper ontologies, which model concepts at a higher level than any specific domain. Some examples are the Basic Formal Ontology (BFO), the Suggested Upper Merged Ontology (SUMO), and gist, a lightweight upper ontology.
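To make "agreeing on what exists and how those things relate" concrete, here is a minimal sketch in plain Python of what an ontology contributes: classes, subclass relations, and an inference that follows from them. The class names and the `ex:` prefix are hypothetical and are not taken from any of the ontologies named above. A real ontology would be written in OWL and processed with an RDF library rather than hand-rolled like this.

```python
# Toy class hierarchy expressed as (subject, predicate, object) triples.
# Every "ex:" name is hypothetical and exists only for this illustration.
SUBCLASS_OF = "rdfs:subClassOf"

triples = [
    ("ex:Aspirin", SUBCLASS_OF, "ex:Analgesic"),
    ("ex:Analgesic", SUBCLASS_OF, "ex:Drug"),
    ("ex:Drug", SUBCLASS_OF, "ex:ChemicalEntity"),
]


def superclasses(cls, triples):
    """Return every class that `cls` is a direct or indirect subclass of."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            # Follow subclass links upward; skip classes already visited
            # so that an accidental cycle cannot loop forever.
            if subject == current and predicate == SUBCLASS_OF and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


if __name__ == "__main__":
    print(sorted(superclasses("ex:Aspirin", triples)))
```

Nothing in the data says directly that `ex:Aspirin` is a kind of `ex:ChemicalEntity`. That fact follows from the shared definitions, which is the point of agreeing on them once instead of re-deriving them in every system.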
Other industries can learn from healthcare’s history of codifying a shared understanding of a domain and explicitly agreeing on what exists and how those things relate. While healthcare benefited from centuries of empirical science, all industries and organizations deal with entities and rules that can be codified. Finance, law, supply chains, and even religious institutions have long relied on formalized structures to reason. Here are some examples of ontologies being successfully used in other industries:
- The European Legislation Identifier (ELI) Ontology is a strong example of a free, publicly funded ontology built using W3C standards. It provides a shared semantic model for legislation across EU member states—defining how laws, amendments, jurisdictions, and legal relationships are identified and linked. Rather than digitizing documents alone, it encodes how the legal system itself works.
- The Environment Ontology (ENVO) is a complementary example from the scientific community. ENVO is a community-led, open ontology that represents environments, ecosystems, habitats, and environmental processes. It demonstrates that shared ontologies do not require centralized authority; they can emerge from distributed expert consensus and still become widely used infrastructure.
- The Financial Industry Business Ontology (FIBO) shows how finance, like healthcare, benefits from agreeing on core concepts—entities, contracts, and instruments—so firms compete on products rather than on definitions.
- EarthPortal is like BioPortal but for Earth sciences, though at a smaller scale. It’s a home for ontologies about Earth sciences and is largely community-driven, unlike BioPortal, which is publicly funded.
This is a small subset; for the full list, go to this app.
Treat controlled vocabularies as infrastructure, not project-specific
Healthcare advanced by treating catalogs of real-world entities as first-class infrastructure. It has controlled vocabularies for conditions and procedures (SNOMED CT), diseases (ICD-11), adverse effects (MedDRA), drugs (RxNorm), compounds (ChEBI and PubChem), proteins (UniProt), and genes (NCBI Gene). There are even organizations that tie many of these together into a unified knowledge graph, like the Scalable Precision Medicine Open Knowledge Engine (SPOKE), the Monarch Initiative, and Open Targets.
Other industries can do the same by building and curating lists of things they depend on (companies, industries, financial instruments, policies, parts) and publishing them as open, machine-readable datasets. Here are a few prominent examples from other industries:
- The United Nations Bibliographic Information System (UNBIS) Thesaurus is a good example of a free, publicly funded taxonomy that standardizes subjects, geographies, and institutional concepts across the UN system. It acts as a shared controlled vocabulary that enables interoperability across agencies, reports, and repositories.
- An example from finance is the Legal Entity Identifier (LEI) system. LEI provides a global, open identifier for legal entities participating in financial transactions.
- The International Financial Reporting Standards (IFRS) Foundation maintains the IFRS Accounting Taxonomy which contains elements for tagging financial statements prepared in accordance with IFRS Accounting Standards.
- AGROVOC is a multilingual controlled vocabulary maintained by the Food and Agriculture Organization (FAO) of the United Nations to promote interoperability of reports and data.
- GeoNames is an open geographic database of over 25 million place names, identifiers, and geographic features. It is widely used across industries from logistics to news media and is published using W3C standards.
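As a rough illustration of why a controlled vocabulary works as infrastructure, the sketch below maps free-text labels and synonyms to one stable identifier per concept, loosely following the SKOS idea of a preferred label plus alternative labels. The identifiers (`ex:C001`, `ex:C002`) and the two entries are invented for this example and do not come from SNOMED CT, ICD-11, or any other vocabulary named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """One entry in a controlled vocabulary, loosely modeled on SKOS."""
    identifier: str          # stable ID that systems exchange
    pref_label: str          # the single preferred name
    alt_labels: tuple = ()   # synonyms and common variants


# Hypothetical entries; the identifiers are made up for this sketch.
VOCABULARY = [
    Concept("ex:C001", "Myocardial infarction", ("heart attack", "MI")),
    Concept("ex:C002", "Hypertension", ("high blood pressure",)),
]


def build_index(concepts):
    """Map every label, case-folded, to its concept identifier."""
    index = {}
    for concept in concepts:
        for label in (concept.pref_label, *concept.alt_labels):
            index[label.casefold()] = concept.identifier
    return index


def normalize(term, index):
    """Return the identifier for a free-text term, or None if unknown."""
    return index.get(term.strip().casefold())


if __name__ == "__main__":
    index = build_index(VOCABULARY)
    for term in ("Heart Attack", "high blood pressure", "flu"):
        print(term, "->", normalize(term, index))
```

Once two systems exchange the identifier rather than the label, they no longer need to agree on spelling, language, or abbreviations, and a term that is missing from the vocabulary shows up as an explicit gap instead of a silent mismatch.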
Let empirical observation drive structure
Healthcare evolved through observation, experimentation, and replication. Claims about drugs must be backed by evidence, and dogmatists were (eventually) overruled by empirical results. In healthcare, the Clinical Data Interchange Standards Consortium (CDISC) standardizes how clinical trial observations (measurements, outcomes, and adverse events) are recorded and evaluated, enabling cumulative, reproducible evidence. Other industries are also embracing a standardized approach to recording observational data:
- The Climate and Forecast Metadata Conventions (CF Conventions) standardize how observed climate variables are described across sensors and models, enabling scientific data to be shared, compared, and reused. They are developed and maintained through an open, community-driven process.
- The Industry Foundation Classes (IFC) from buildingSMART International define a shared representation of real-world structures (buildings, components, and systems) across design, construction, and operations. This allows observations about buildings to accumulate over a structure’s full lifecycle.
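The value of an observation standard is easiest to see when two sources report the same quantity differently. The sketch below is a simplified illustration, not an implementation of the CF Conventions or CDISC: it assumes a single variable, `air_temperature`, with kelvin as its canonical unit (which matches the CF standard name table as far as I know), and the unit strings, source names, and `Observation` class are illustrative choices of mine.

```python
from dataclasses import dataclass

# Canonical unit per variable. Only one variable is modeled in this sketch.
CANONICAL_UNITS = {"air_temperature": "K"}

# Conversion rules into the canonical unit, keyed by (variable, source unit).
TO_CANONICAL = {
    ("air_temperature", "K"): lambda v: v,
    ("air_temperature", "degC"): lambda v: v + 273.15,
    ("air_temperature", "degF"): lambda v: (v - 32.0) * 5.0 / 9.0 + 273.15,
}


@dataclass(frozen=True)
class Observation:
    variable: str   # shared variable name
    value: float    # numeric reading
    units: str      # units the reading was reported in
    source: str     # hypothetical sensor or model name


def standardize(obs):
    """Return a copy of `obs` expressed in the variable's canonical unit."""
    key = (obs.variable, obs.units)
    if key not in TO_CANONICAL:
        # Refuse unknown variable/unit pairs instead of guessing.
        raise ValueError(f"no rule for {obs.units!r} on {obs.variable!r}")
    converted = TO_CANONICAL[key](obs.value)
    return Observation(obs.variable, converted, CANONICAL_UNITS[obs.variable], obs.source)


if __name__ == "__main__":
    readings = [
        Observation("air_temperature", 20.0, "degC", "station-a"),
        Observation("air_temperature", 68.0, "degF", "station-b"),
    ]
    for reading in readings:
        print(standardize(reading))
```

Rejecting an unknown unit rather than passing it through is a deliberate choice here: evidence only accumulates if every record that enters the shared dataset is known to be comparable.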
Standardize how data is shared, not just what it means
Healthcare didn’t stop at shared semantics and evidence standards; it also standardized interoperability. The Health Level Seven International (HL7) standards—most notably HL7 FHIR—define how clinical data such as patients, observations, medications, and encounters are exchanged between systems. Here are some examples from other industries:
- The eXtensible Business Reporting Language (XBRL) standardizes how financial statements and disclosures are reported to regulators and markets. The XBRL taxonomies used for this reporting are created by regulators and published through registries coordinated by XBRL International.
- The National Information Exchange Model (NIEM) is a framework for building information exchange schemas by aligning on a common vocabulary and design rules across domains. This allows information about people, events, and cases to move between agencies or organizations without losing meaning or legal integrity.
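To show what an exchange standard adds on top of shared meaning, here is a minimal sketch that builds a heart rate reading shaped like an HL7 FHIR Observation resource and serializes it as JSON. The field names, the LOINC code `8867-4`, and the UCUM unit code are written from memory of the published FHIR examples and should be checked against the specification before use. The helper names and the patient ID are hypothetical, and `missing_fields` is a toy check, not FHIR validation.

```python
import json

# Fields this sketch treats as mandatory; a real validator checks far more.
REQUIRED = ("resourceType", "status", "code")


def heart_rate_observation(patient_id, beats_per_minute):
    """Build a dict shaped like a FHIR Observation for a heart rate reading."""
    return {
        "resourceType": "Observation",
        "status": "final",
        # What was measured, identified by a code from a shared vocabulary.
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": "8867-4",
                "display": "Heart rate",
            }]
        },
        # Who it was measured on, as a reference to another resource.
        "subject": {"reference": f"Patient/{patient_id}"},
        # The measured value, with units from a shared unit system.
        "valueQuantity": {
            "value": beats_per_minute,
            "unit": "beats/minute",
            "system": "http://unitsofmeasure.org",
            "code": "/min",
        },
    }


def missing_fields(resource):
    """List the required top-level fields that are absent from `resource`."""
    return [name for name in REQUIRED if name not in resource]


if __name__ == "__main__":
    observation = heart_rate_observation("example", 72)
    print(json.dumps(observation, indent=2))
    print("missing:", missing_fields(observation))
```

The structure separates what was measured (a vocabulary code), who it was measured on (a reference), and the value with its units. That separation is what lets a receiving system interpret the record without knowing anything about the sender.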
Use regulation to force semantic alignment
Strong regulatory pressure forced healthcare to align on definitions of terms and standards for empirical studies. The FDA reinforces this alignment by requiring conformity to standards and controlled terminologies, such as CDISC for clinical trial data and MedDRA for adverse event reporting. Other industries, like finance and aviation, are also highly regulated and have standardized ways of reporting and tracking compliance.
Notably, in healthcare, organizations like the FDA and WHO actively require the use of shared vocabularies like MedDRA, ICD, and CDISC in regulatory processes. In finance, while regulators like the SEC and FINRA enforce reporting and compliance, there is not a comparably mature, shared ecosystem of regulatory vocabularies.
Separate pre-competitive semantics from competitive advantage
Healthcare companies compete on drugs, not the definition of drugs. Agreeing on the definition of terms and best practices for sharing data does not impede competition. The Pistoia Alliance exemplifies this approach in life sciences by bringing competitors together to develop shared semantic standards and interoperability practices as pre-competitive infrastructure. Here are some examples from other industries:
- EDM Council plays a role in finance similar to the Pistoia Alliance in life sciences, bringing competing institutions together to develop shared data semantics and standards (including FIBO) as pre-competitive infrastructure.
- buildingSMART International brings together software vendors, architects, engineers, and construction firms to maintain Industry Foundation Classes (IFC). Vendors compete on tools, but agree on building and component terms and the way they are represented.
- The MITRE Corporation, the R&D organization, publishes MITRE ATT&CK, a knowledge base of adversary tactics and techniques for decision support in cybersecurity operations. While security contractors compete on tools, they can agree on the language for describing threats and incidents.
Fund shared knowledge as a public good
Public funding has been essential for building and maintaining healthcare’s ontologies and controlled vocabularies. Funding from the National Institutes of Health (NIH), in particular, has been essential to building and sustaining core biomedical ontologies and controlled vocabularies, and it is unlikely that one organization would build them all by itself. Other industries could build consortia, foundations, and public-private partnerships to support a similar semantic infrastructure. Some have already benefited from public funding: the ELI Ontology and the UNBIS Thesaurus described above are both publicly funded.
Anchor meaning in open standards
Aligning with open standards ensures that knowledge outlives any single vendor, platform, or technology. Organizations like the World Wide Web Consortium (W3C) define foundational standards like RDF, OWL, and SHACL. By anchoring semantics in open standards rather than vendor-specific schemas, industries create knowledge that can be reused, integrated, and reasoned over for decades, even as tools and architectures evolve.
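As a small illustration of what vendor-neutral means in practice, the sketch below writes a few statements in N-Triples, a line-based RDF serialization standardized by the W3C, using only the Python standard library. The `http://example.org/` terms are placeholders, the `Literal` class and function names are my own, and the escaping covers only the most common characters. In real work an RDF library would handle this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A plain string value, as opposed to an IRI naming a resource."""
    text: str


def term(value):
    """Render one subject, predicate, or object in N-Triples syntax."""
    if isinstance(value, Literal):
        # Escape the backslash first so later escapes are not doubled.
        escaped = (
            value.text.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        return f'"{escaped}"'
    return f"<{value}>"  # any other value is treated as an IRI


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)


EX = "http://example.org/"  # placeholder namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

if __name__ == "__main__":
    statements = [
        (EX + "Aspirin", RDFS_SUBCLASS_OF, EX + "Drug"),
        (EX + "Aspirin", RDFS_LABEL, Literal("Aspirin")),
    ]
    print(to_ntriples(statements), end="")
```

Because each line is a complete statement made of globally unique identifiers, the output can be loaded by any standards-compliant tool, which is what allows the knowledge to outlive the system that produced it.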
Author note: I serve as an Advisory Committee member of the World Wide Web Consortium (W3C), an unpaid role held on behalf of my employer, TopQuadrant.
Build incrementally
Knowledge graphs in healthcare are the result of a long history of discovering new things, documenting the findings, cataloging the instances of classes, and conducting experiments. It is unlikely that an industry can build a domain knowledge graph top-down. Well-structured domain knowledge is also not something that can be built quickly, even with AI.
Conclusion
Long before modern data platforms or AI, medicine invested in shared definitions, controlled vocabularies, empirical standards, and interoperable ways of exchanging evidence. Those choices allowed knowledge to accumulate rather than fragment.
Other industries do not need to replicate healthcare’s path exactly, but they can adopt some of its principles. Agree on what exists. Treat reference data and vocabularies as shared infrastructure. Let observation and evidence drive structure. Use regulation and collaboration to enforce alignment. Fund semantics as a public good. Anchor meaning in open standards so it outlives any single vendor or system.
Healthcare didn’t succeed because it adopted AI early. It succeeded because it spent centuries externalizing meaning. Knowledge graphs don’t create that agreement—but they finally make it computable, reusable, and scalable.
About the author: Steve Hedden is the Head of Product Management at TopQuadrant, where he leads the strategy for EDG, a platform for knowledge graph and metadata management. His work focuses on bridging enterprise data governance and AI through ontologies, taxonomies, and semantic technologies. Steve writes and speaks regularly about knowledge graphs and the evolving role of semantics in AI systems.