{"id":41603,"date":"2025-09-03T20:38:09","date_gmt":"2025-09-03T20:38:09","guid":{"rendered":"https:\/\/www.europesays.com\/ie\/41603\/"},"modified":"2025-09-03T20:38:09","modified_gmt":"2025-09-03T20:38:09","slug":"a-new-generative-ai-approach-to-predicting-chemical-reactions-mit-news","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/ie\/41603\/","title":{"rendered":"A new generative AI approach to predicting chemical reactions | MIT News"},"content":{"rendered":"<p>Many attempts have been made to harness the power of new artificial intelligence and large language models (LLMs) to predict the outcomes of new chemical reactions. These have had limited success, in part because until now they have not been grounded in an understanding of fundamental physical principles, such as the law of conservation of mass. Now, a team of researchers at MIT has come up with a way of incorporating these physical constraints into a reaction prediction model, thus greatly improving the accuracy and reliability of its outputs.<\/p>\n<p>The new work was <a href=\"https:\/\/www.nature.com\/articles\/s41586-025-09426-9\" rel=\"nofollow noopener\" target=\"_blank\">reported Aug. 20 in the journal Nature<\/a>, in a paper by recent postdoc Joonyoung Joung (now an assistant professor at Kookmin University, South Korea); former software engineer Mun Hong Fong (now at Duke University); chemical engineering graduate student Nicholas Casetti; postdoc Jordan Liles; physics undergraduate student Ne Dassanayake; and senior author Connor Coley, who is the Class of 1957 Career Development Professor in the MIT departments of Chemical Engineering and Electrical Engineering and Computer Science.<\/p>\n<p>\u201cThe prediction of reaction outcomes is a very important task,\u201d Joung explains. For example, if you want to make a new drug, \u201cyou need to know how to make it. 
So, this requires us to know what product is likely\u201d to result from a given set of chemical inputs to a reaction. But most previous efforts to carry out such predictions look only at a set of inputs and a set of outputs. They do not examine the intermediate steps or enforce the constraint that no mass be gained or lost along the way, even though such gains or losses are impossible in actual reactions.<\/p>\n<p>Joung points out that while large language models such as ChatGPT have been very successful in many areas of research, these models do not provide a way to limit their outputs to physically realistic possibilities, such as by requiring them to adhere to conservation of mass. These models use computational \u201ctokens,\u201d which in this case represent individual atoms, but \u201cif you don\u2019t conserve the tokens, the LLM model starts to make new atoms, or deletes atoms in the reaction.\u201d Instead of being grounded in real scientific understanding, \u201cthis is kind of like alchemy,\u201d he says. While many attempts at reaction prediction only look at the final products, \u201cwe want to track all the chemicals, and how the chemicals are transformed\u201d throughout the reaction process from start to end, he says.<\/p>\n<p>To address this problem, the team made use of a method developed in the 1970s by chemist Ivar Ugi, which uses a bond-electron matrix to represent the electrons in a reaction. They used this system as the basis for their new program, called FlowER (Flow matching for Electron Redistribution), which explicitly keeps track of all the electrons in the reaction to ensure that none are spuriously added or deleted in the process.<\/p>\n<p>In this matrix, nonzero values represent bonds or lone electron pairs, and zeros represent their absence. \u201cThat helps us to conserve both atoms and electrons at the same time,\u201d says Fong. 
This representation, he says, was one of the key elements in building mass conservation into their prediction system.<\/p>\n<p>The system they developed is still at an early stage, Coley says. \u201cThe system as it stands is a demonstration \u2014 a proof of concept that this generative approach of flow matching is very well suited to the task of chemical reaction prediction.\u201d While the team is excited about this promising approach, he says, \u201cwe\u2019re aware that it does have specific limitations as far as the breadth of different chemistries that it\u2019s seen.\u201d Although the model was trained on data from more than a million chemical reactions, obtained from a U.S. Patent Office database, those data do not include reactions involving certain metals or some kinds of catalytic reactions, he says.<\/p>\n<p>\u201cWe\u2019re incredibly excited about the fact that we can get such reliable predictions of chemical mechanisms\u201d from the existing system, he says. \u201cIt conserves mass, it conserves electrons, but we certainly acknowledge that there\u2019s a lot more expansion and robustness to work on in the coming years as well.\u201d<\/p>\n<p>But even in its present form, which is being made freely available through the online platform GitHub, \u201cwe think it will make accurate predictions and be helpful as a tool for assessing reactivity and mapping out reaction pathways,\u201d Coley says. \u201cIf we\u2019re looking toward the future of really advancing the state of the art of mechanistic understanding and helping to invent new reactions, we\u2019re not quite there. But we hope this will be a steppingstone toward that.\u201d<\/p>\n<p>\u201cIt\u2019s all open source,\u201d says Fong. \u201cThe models, the data, all of them are up there,\u201d including a previous dataset developed by Joung that exhaustively lists the mechanistic steps of known reactions. 
\u201cI think we are one of the pioneering groups making this dataset, and making it available open-source, and making this usable for everyone,\u201d he says.<\/p>\n<p>The FlowER model matches or outperforms existing approaches in finding standard mechanistic pathways, the team says, and makes it possible to generalize to previously unseen reaction types. They say the model could be useful for predicting reactions in medicinal chemistry, materials discovery, combustion, atmospheric chemistry, and electrochemical systems.<\/p>\n<p>In their comparisons with existing reaction prediction systems, Coley says, \u201cusing the architecture choices that we\u2019ve made, we get this massive increase in validity and conservation, and we get a matching or a little bit better accuracy in terms of performance.\u201d<\/p>\n<p>He adds that \u201cwhat\u2019s unique about our approach is that while we are using these textbook understandings of mechanisms to generate this dataset, we\u2019re anchoring the reactants and products of the overall reaction in experimentally validated data from the patent literature.\u201d They are inferring the underlying mechanisms, he says, rather than just making them up. \u201cWe\u2019re imputing them from experimental data, and that\u2019s not something that has been done and shared at this kind of scale before.\u201d<\/p>\n<p>As for next steps, he says, \u201cwe are quite interested in expanding the model\u2019s understanding of metals and catalytic cycles. We\u2019ve just scratched the surface in this first paper,\u201d and most of the reactions covered so far don\u2019t involve metals or catalysts, \u201cso that\u2019s a direction we\u2019re quite interested in.\u201d<\/p>\n<p>In the long term, he says, \u201ca lot of the excitement is in using this kind of system to help discover new complex reactions and help elucidate new mechanisms. 
I think that the long-term potential impact is big, but this is of course just a first step.\u201d<\/p>\n<p>The work was supported by the Machine Learning for Pharmaceutical Discovery and Synthesis consortium and the National Science Foundation.<\/p>\n","protected":false},"excerpt":{"rendered":"Many attempts have been made to harness the power of new artificial intelligence and large language models (LLMs)&hellip;\n","protected":false},"author":2,"featured_media":41604,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[77],"tags":[31537,31538,31546,31542,18,31539,31540,5568,19,17,31544,31535,31536,31545,31543,31541,133],"class_list":{"0":"post-41603","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-science","8":"tag-ai-in-chemistry","9":"tag-ai-in-drug-development","10":"tag-connor-coley","11":"tag-conservation-of-mass","12":"tag-eire","13":"tag-electron-flow-matching","14":"tag-flower-flow-matching-for-electron-redistribution","15":"tag-generative-ai","16":"tag-ie","17":"tag-ireland","18":"tag-joonyoung-joung","19":"tag-mit-cheme","20":"tag-mit-chemical-engineering","21":"tag-mun-hong-fong","22":"tag-open-source-reaction-prediction-system","23":"tag-reaction-prediction","24":"tag-science"},"share_on_mastodon":{"url":"","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/posts\/41603","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/comments?post=41603"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/posts\/41603\/revisio
ns"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/media\/41604"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/media?parent=41603"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/categories?post=41603"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/tags?post=41603"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}