Reliable material databases bridge AI- and experimental-led material discovery

The evolution of materials science paradigms. Credit: Li et al.

Materials databases lie at the heart of future data-driven discovery in energy-related fields, say researchers from Tohoku University. In an article published in the journal Precision Chemistry, they have examined how different types of databases, both computational and experimental, work together to support modern artificial intelligence (AI) tools used in materials science.

The study found that materials databases are no longer just places to store information. Instead, they play a central role in determining how well AI models perform. The way data is collected, organized, and shared—known as database architecture—can directly affect whether AI systems produce reliable and useful results.

“In a library, if books are poorly labeled, have missing pages, or are difficult to access, even the most skilled reader will struggle to find accurate information,” stresses Hao Li, lead author of the paper and Distinguished Professor at Tohoku University’s Advanced Institute for Materials Research (AIMR). “In the same way, AI models depend on well-structured and carefully curated data to make sound predictions.”

Computational and integrated platform. Credit: Li et al.

Li and his team categorized computational databases into two main groups: those focusing on bulk material properties and those focusing on surfaces and interfaces. They also reviewed experimental databases that cover areas such as crystal structures, catalysis, energy storage, and materials characterization.

Further analysis revealed the growing importance of integrated platforms. These systems connect computational predictions with detailed experimental data, allowing scientists to test ideas, refine models, and validate results in a continuous cycle. This approach supports more efficient and reliable materials discovery.

Moreover, the researchers introduced a roadmap for combining databases, AI models, and experimental workflows. This includes the use of graph neural networks, machine learning interatomic potentials, and large language model-based AI agents to accelerate the discovery process while maintaining scientific rigor.

However, the researchers identified several challenges that must be addressed. These include the need for standardized data practices aligned with FAIR principles (findable, accessible, interoperable, reusable), better tracking of data origins, and improved reporting of negative results, which are often missing but are important for reducing bias.

Database-to-model-to-experiment roadmap for domain models and AI agents. Credit: Li et al.

“Materials databases are the foundation of trustworthy AI in science,” adds Li. “If we want AI to guide discovery in a reliable way, we must first ensure that the data it learns from is complete, transparent, and well-structured. Without reliable data, AI-led discovery will itself become unreliable.”

Looking ahead, the team plans to improve database quality and connectivity across fragmented data sources. They also aim to develop new AI systems that can learn from multiple types of data simultaneously and work alongside experiments and human researchers. These efforts are expected to support more dependable and efficient discovery of materials for energy, sustainability, and everyday applications.

More information

Yutian Zhuang et al, Materials Databases: Foundations of Modern Digital Materials, Precision Chemistry (2026). DOI: 10.1021/prechem.5c00449

Provided by
Tohoku University

Citation: Bridging AI- and experimental-led materials discovery with better database architecture (2026, April 9), retrieved 10 April 2026 from https://phys.org/news/2026-04-bridging-ai-experimental-materials-discovery.html
