Bergerhoff, G., Hundt, R., Sievers, R. & Brown, I. D. The inorganic crystal structure data base. J. Chem. Inf. Comput. Sci. 23, 66–69 (1983).
Belsky, A., Hellenbrandt, M., Karen, V. L. & Luksch, P. New developments in the Inorganic Crystal Structure Database (ICSD): accessibility in support of materials research and design. Acta Crystallogr. B 58, 364–369 (2002).
Jain, A. et al. Commentary: the Materials Project: a materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
Saal, J. E., Kirklin, S., Aykol, M., Meredig, B. & Wolverton, C. Materials design and discovery with high-throughput density functional theory: the Open Quantum Materials Database (OQMD). JOM 65, 1501–1509 (2013).
Curtarolo, S. et al. AFLOW: an automatic framework for high-throughput materials discovery. Comput. Mater. Sci. 58, 218–226 (2012).
Draxl, C. & Scheffler, M. NOMAD: the FAIR concept for big data-driven materials science. MRS Bull. 43, 676–682 (2018).
Schmidt, J. et al. Improving machine-learning models in materials science through large datasets. Mater. Today Phys. 48, 101560 (2024).
Davies, D. W. et al. Computational screening of all stoichiometric inorganic materials. Chem 1, 617–627 (2016).
Riebesell, J. et al. Discovery of high-performance dielectric materials with machine-learning-guided search. Cell Rep. Phys. Sci. 5, 102241 (2024).
Borg, C. K. H. et al. Quantifying the performance of machine learning models in materials discovery. Digit. Discov. 2, 327–338 (2023).
Goodall, R. E. A., Parackal, A. S., Faber, F. A., Armiento, R. & Lee, A. A. Rapid discovery of stable materials by coordinate-free coarse graining. Sci. Adv. 8, eabn4117 (2022).
Zhu, A., Batzner, S., Musaelian, A. & Kozinsky, B. Fast uncertainty estimates in deep learning interatomic potentials. J. Chem. Phys. 158, 164111 (2023).
Depeweg, S., Hernández-Lobato, J. M., Doshi-Velez, F. & Udluft, S. Decomposition of uncertainty in Bayesian deep learning for efficient and risk-sensitive learning. In Proc. 35th International Conference on Machine Learning, 1184–1193 (PMLR, 2018).
Bartel, C. J. et al. A critical examination of compound stability predictions from machine-learned formation energies. NPJ Comput. Mater. 6, 1–11 (2020).
Montanari, B., Basak, S. & Elena, A. Goldilocks convergence tools and best practices for numerical approximations in density functional theory calculations (EDC, 2024); https://ukerc.rl.ac.uk/cgi-bin/ercri4.pl?GChoose=gdets&GRN=EP/Z530657/1
Griffin, S. M. Computational needs of quantum mechanical calculations of materials for high-energy physics. Preprint at https://arxiv.org/abs/2205.10699 (2022).
Austin, B. et al. NERSC 2018 Workload Analysis (Data from 2018) (2022); https://portal.nersc.gov/project/m888/nersc10/workload/N10_Workload_Analysis.latest.pdf
Behler, J. & Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98, 146401 (2007).
Bartók, A. P., Kermode, J., Bernstein, N. & Csányi, G. Machine learning a general-purpose interatomic potential for silicon. Phys. Rev. X 8, 041048 (2018).
Deringer, V. L., Caro, M. A. & Csányi, G. A general-purpose machine-learning force field for bulk and nanostructured phosphorus. Nat. Commun. 11, 5461 (2020).
Zuo, Y. et al. Accelerating materials discovery with Bayesian optimization and graph deep learning. Mater. Today 51, 126–135 (2021).
Chen, C. & Ong, S. P. A universal graph deep learning interatomic potential for the periodic table. Nat. Comput. Sci. 2, 718–728 (2022).
Deng, B. et al. CHGNet as a pretrained universal neural network potential for charge-informed atomistic modelling. Nat. Mach. Intell. 5, 1031–1041 (2023).
Batatia, I., Kovács, D. P., Simm, G. N. C., Ortner, C. & Csányi, G. MACE: higher order equivariant message passing neural networks for fast and accurate force fields. Preprint at https://arxiv.org/abs/2206.07697 (2023).
Riebesell, J. Towards Machine Learning Foundation Models for Materials Chemistry. PhD Thesis, Univ. of Cambridge (2024); https://www.repository.cam.ac.uk/handle/1810/375689
Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).
Kpanou, R., Osseni, M. A., Tossou, P., Laviolette, F. & Corbeil, J. On the robustness of generalization of drug-drug interaction models. BMC Bioinformatics 22, 477 (2021).
Meredig, B. et al. Can machine learning identify the next high-temperature superconductor? Examining extrapolation performance for materials discovery. Mol. Syst. Des. Eng. 3, 819–825 (2018).
Cubuk, E. D., Sendek, A. D. & Reed, E. J. Screening billions of candidates for solid lithium-ion conductors: a transfer learning approach for small data. J. Chem. Phys. 150, 214701 (2019).
Zahrt, A. F., Henle, J. J. & Denmark, S. E. Cautionary guidelines for machine learning studies with combinatorial datasets. ACS Comb. Sci. 22, 586–591 (2020).
Sun, W. et al. The thermodynamic scale of inorganic crystalline metastability. Sci. Adv. 2, e1600225 (2016).
Goodall, R. E. A. & Lee, A. A. Predicting materials properties without crystal structure: deep representation learning from stoichiometry. Nat. Commun. 11, 6280 (2020).
Dunn, A., Wang, Q., Ganose, A., Dopp, D. & Jain, A. Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm. NPJ Comput. Mater. 6, 1–10 (2020).
Chanussot, L. et al. Open Catalyst 2020 (OC20) dataset and community challenges. ACS Catal. 11, 6059–6072 (2021).
Lee, K. L. K. et al. MatSciML: a broad, multi-task benchmark for solid-state materials modeling. Preprint at https://arxiv.org/abs/2309.05934 (2023).
Choudhary, K. et al. JARVIS-Leaderboard: a large scale benchmark of materials design methods. NPJ Comput. Mater. 10, 93 (2024).
Tran, R. et al. The Open Catalyst 2022 (OC22) dataset and challenges for oxide electrocatalysts. ACS Catal. 13, 3066–3084 (2023).
Lan, J. et al. AdsorbML: a leap in efficiency for adsorption energy calculations using generalizable machine learning potentials. NPJ Comput. Mater. 9, 172 (2023).
Sriram, A. et al. The Open DAC 2023 dataset and challenges for sorbent discovery in direct air capture. ACS Cent. Sci. 10, 923–941 (2024).
Barroso-Luque, L. et al. Open materials 2024 (OMat24) inorganic materials dataset and models. Preprint at https://arxiv.org/abs/2410.12771 (2024).
von Lilienfeld, O. A. & Burke, K. Retrospective on a decade of machine learning for chemical discovery. Nat. Commun. 11, 4895 (2020).
Merchant, A. et al. Scaling deep learning for materials discovery. Nature 624, 80–85 (2023).
Yang, H. et al. MatterSim: a deep learning atomistic model across elements, temperatures and pressures. Preprint at https://arxiv.org/abs/2405.04967 (2024).
McDermott, M. J., Dwaraknath, S. S. & Persson, K. A. A graph-based network for predicting chemical reaction pathways in solid-state materials synthesis. Nat. Commun. 12, 3097 (2021).
Aykol, M., Montoya, J. H. & Hummelshøj, J. Rational solid-state synthesis routes for inorganic materials. J. Am. Chem. Soc. 143, 9244–9259 (2021).
Wen, M. et al. Chemical reaction networks and opportunities for machine learning. Nat. Comput. Sci. 3, 12–24 (2023).
Yuan, E. C.-Y. et al. Analytical ab initio Hessian from a deep learning potential for transition state optimization. Nat. Commun. 15, 8865 (2024).
Aykol, M., Dwaraknath, S. S., Sun, W. & Persson, K. A. Thermodynamic limit for synthesis of metastable inorganic materials. Sci. Adv. 4, eaaq0148 (2018).
Shoghi, N. et al. From molecules to materials: pre-training large generalizable models for atomic property prediction. Preprint at https://arxiv.org/abs/2310.16802 (2023).
Wang, H.-C., Botti, S. & Marques, M. A. L. Predicting stable crystalline compounds using chemical similarity. NPJ Comput. Mater. 7, 1–9 (2021).
Cheetham, A. K. & Seshadri, R. Artificial intelligence driving materials discovery? Perspective on the article: scaling deep learning for materials discovery. Chem. Mater. 36, 3490–3495 (2024).
Batatia, I. et al. A foundation model for atomistic materials chemistry. Preprint at https://arxiv.org/abs/2401.00096 (2023).
Deng, B. et al. Systematic softening in universal machine learning interatomic potentials. NPJ Comput. Mater. 11, 9 (2025).
Póta, B., Ahlawat, P., Csányi, G. & Simoncelli, M. Thermal conductivity predictions with foundation atomistic models. Preprint at https://arxiv.org/abs/2408.00755 (2024).
Fu, X. et al. Forces are not enough: benchmark and critical evaluation for machine learning force fields with molecular simulations. Transact. Mach. Learn. Res. https://openreview.net/forum?id=A8pqQipwkt (2023).
Chiang, Y. et al. MLIP Arena: advancing fairness and transparency in machine learning interatomic potentials through an open and accessible benchmark platform. In AI for Accelerated Materials Design (ICLR 2025 workshop) https://openreview.net/forum?id=ysKfIavYQE (2025).
Li, K., DeCost, B., Choudhary, K., Greenwood, M. & Hattrick-Simpers, J. A critical examination of robustness and generalizability of machine learning prediction of materials properties. NPJ Comput. Mater. 9, 55 (2023).
Li, K. et al. Exploiting redundancy in large materials datasets for efficient machine learning with less data. Nat. Commun. 14, 7283 (2023).
Bitzek, E., Koskinen, P., Gähler, F., Moseler, M. & Gumbsch, P. Structural relaxation made simple. Phys. Rev. Lett. 97, 170201 (2006).
Chen, C., Ye, W., Zuo, Y., Zheng, C. & Ong, S. P. Graph networks as a universal machine learning framework for molecules and crystals. Chem. Mater. 31, 3564–3572 (2019).
Glawe, H., Sanna, A., Gross, E. K. U. & Marques, M. A. L. The optimal one dimensional periodic table: a modified Pettifor chemical scale from data mining. New J. Phys. 18, 093011 (2016).
Parackal, A. S., Goodall, R. E., Faber, F. A. & Armiento, R. Identifying crystal structures beyond known prototypes from X-ray powder diffraction spectra. Phys. Rev. Mater. 8, 103801 (2024).
Liao, Y.-L., Wood, B., Das, A. & Smidt, T. EquiformerV2: improved equivariant transformer for scaling to higher-degree representations. In International Conference on Learning Representations https://openreview.net/forum?id=mCOBKZmrzD (2024).
Liao, Y.-L., Smidt, T., Shuaibi, M. & Das, A. Generalizing denoising to non-equilibrium structures improves equivariant force fields. Preprint at https://arxiv.org/abs/2403.09549 (2024).
Liao, Y.-L. & Smidt, T. Equiformer: equivariant graph attention transformer for 3D atomistic graphs. In International Conference on Learning Representations https://openreview.net/forum?id=KwmPfARgOTD (2023).
Passaro, S. & Zitnick, C. L. Reducing SO(3) convolutions to SO(2) for efficient equivariant GNNs. Preprint at https://arxiv.org/abs/2302.03655 (2023).
Neumann, M. et al. Orb: a fast, scalable neural network potential. Preprint at https://arxiv.org/abs/2410.22570 (2024).
Park, Y., Kim, J., Hwang, S. & Han, S. Scalable parallel algorithm for graph neural network interatomic potentials in molecular dynamics simulations. J. Chem. Theory Comput. 20, 4857–4868 (2024).
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 1263–1272 (PMLR, 2017).
Batzner, S. et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 13, 2453 (2022).
Thomas, N. et al. Tensor field networks: rotation- and translation-equivariant neural networks for 3D point clouds. Preprint at https://arxiv.org/abs/1802.08219 (2018).
Drautz, R. Atomic cluster expansion for accurate and transferable interatomic potentials. Phys. Rev. B 99, 014104 (2019).
Choudhary, K. & DeCost, B. Atomistic line graph neural network for improved materials property predictions. NPJ Comput. Mater. 7, 1–8 (2021).
Choudhary, K. et al. Unified graph neural network force-field for the periodic table: solid state applications. Digit. Discov. 2, 346–355 (2023).
Xie, T. & Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 120, 145301 (2018).
Gibson, J., Hire, A. & Hennig, R. G. Data-augmentation for graph neural network learning of the relaxed energies of unrelaxed structures. NPJ Comput. Mater. 8, 1–7 (2022).
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems, Vol. 30 (Curran Associates, Inc., 2017); https://proceedings.neurips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
Ward, L. et al. Including crystal structure attributes in machine learning models of formation energies via Voronoi tessellations. Phys. Rev. B 96, 024104 (2017).
Ward, L., Agrawal, A., Choudhary, A. & Wolverton, C. A general-purpose machine learning framework for predicting properties of inorganic materials. NPJ Comput. Mater. 2, 1–7 (2016).
Rupp, M., Tkatchenko, A., Müller, K.-R. & von Lilienfeld, O. A. Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett. 108, 058301 (2012).
Schütt, K. T. et al. How to represent crystal structures for machine learning: towards fast prediction of electronic properties. Phys. Rev. B 89, 205118 (2014).
Riebesell, J. & Goodall, R. Matbench Discovery: WBM dataset. Figshare https://figshare.com/articles/dataset/22715158 (2023).
Riebesell, J. & Goodall, R. MP ionic step snapshots for Matbench Discovery. Figshare https://figshare.com/articles/dataset/23713842 (2023).
Riebesell, J. et al. janosh/matbench-discovery: v1.3.1. Zenodo https://doi.org/10.5281/zenodo.13750664 (2024).