Cooperation brings together machine learning and expert human-curated data with unprecedented results in synthesis reaction planning
Stuttgart, Zurich – The cooperation between IBM Research Europe and Thieme Chemistry, announced this summer, builds on the synergies between high-quality data (Science of Synthesis and Synfacts by Thieme) and state-of-the-art machine learning models for organic chemistry synthesis predictions (RXN for Chemistry by IBM) to create an unprecedented user experience. RXN for Chemistry, the cloud platform using artificial intelligence (AI), has recently been trained with the highest-quality, human-curated datasets from Thieme’s Science of Synthesis and Synfacts. IBM Research Europe and Thieme Chemistry now announce the first results of their cooperation, which were evaluated by seven eminent synthetic chemistry experts and their research groups from China, Germany, Switzerland, New Zealand, and the USA.
Organic compounds can react with each other in hundreds of thousands of different ways. Experiential knowledge is key for organic chemists who want to avoid spending countless hours in the laboratory on trial and error. To improve synthesis planning, IBM Research and Thieme Chemistry have combined the expert human-curated datasets from Science of Synthesis, Thieme’s full-text resource for methods in synthetic organic chemistry, and the reviewed content from the journal Synfacts with the artificial intelligence model called Molecular Transformer in RXN for Chemistry by IBM.
The Molecular Transformer, a neural machine translation model, was created to reliably predict the outcome of chemical reactions and was later enhanced to include retrosynthetic analysis – that is, to first determine the chemicals needed to create a given target molecule. The model has proven very successful at learning the chemical reactivity information present in datasets of chemical reactions. It is, however, limited by the content and correctness of these datasets.
Increased prediction accuracy
Science of Synthesis and Synfacts cover a wide area of reaction space. Typically, models trained on commercially available patent datasets perform poorly on many such reactions. Science of Synthesis and Synfacts have a higher quality of chemical records, reflected by a larger percentage of usable records. This consistency in Thieme’s dataset facilitates the learning process of the AI models, resulting in more consistent predictions: results show that Thieme-trained models on the RXN for Chemistry platform increase prediction accuracy by a factor of three for forward predictions and by a factor of nine for retrosynthesis.
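As an illustration only, the short Python sketch below shows one common way such an accuracy comparison could be computed: count a prediction as correct when it exactly matches the recorded product, then divide the retrained model's accuracy by the baseline's. This release does not specify the metric that was actually used, so the exact-match criterion, the function names, and the toy product strings (written in SMILES notation as an assumption) are all hypothetical and do not represent results from the evaluation.

```python
def top1_accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference string."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one record is required")
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)


def improvement_factor(baseline_accuracy, retrained_accuracy):
    """Ratio of retrained accuracy to baseline accuracy."""
    if baseline_accuracy <= 0:
        raise ValueError("baseline accuracy must be positive")
    return retrained_accuracy / baseline_accuracy


# Hypothetical toy data, not taken from the evaluation described above.
reference_products = ["CCO", "CC(=O)O", "c1ccccc1", "CCN"]
baseline_predictions = ["CCO", "CC=O", "C1CCCCC1", "CCC"]       # 1 of 4 correct
retrained_predictions = ["CCO", "CC(=O)O", "c1ccccc1", "CCC"]   # 3 of 4 correct

baseline_acc = top1_accuracy(baseline_predictions, reference_products)
retrained_acc = top1_accuracy(retrained_predictions, reference_products)
factor = improvement_factor(baseline_acc, retrained_acc)
print(f"baseline={baseline_acc:.2f} retrained={retrained_acc:.2f} factor={factor:.1f}")
```

In practice, a comparison of this kind would likely normalize the molecular strings before matching and might consider more than the top-ranked prediction; the sketch leaves both out for brevity.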
The collaborative work between Thieme and IBM Research Europe shows the impact high-quality chemical reaction data can have on future AI chemical synthesis tools. Integrating high-quality, curated data from Science of Synthesis and Synfacts provides a unique opportunity to boost the performance of RXN for Chemistry to unprecedented levels, as it unleashes the entire knowledge contained in hundreds of thousands of chemical reaction records.
Insightful feedback from synthetic chemistry experts
Seven highly renowned organic synthesis experts and their groups from around the world agreed to evaluate the retrained models. The experts will continue to provide insightful feedback to IBM Research Europe and Thieme during this collaboration, enabling improvements to the models and their usage, as well as creating a unique forum for exchange between machine learning experts and the synthetic organic chemistry community:
“This innovative IBM/Thieme Chemistry platform provides an efficient tool for synthetic chemistry researchers to provide validation for their own retrosynthetic plans whilst also being presented with alternative solutions. It enables a rigorous assessment for the retrosynthetic design phase of a given synthesis which no doubt will pay dividends when the selected synthetic plan is implemented.”
Prof. Dame Margaret Brimble (University of Auckland, New Zealand)
“A sustainable future for synthesis will include minimizing the number of unproductive strategies that are pursued by running only those reactions that lead to a productive end. This is only possible through the marrying of computer designed and human designed efforts, which makes this collaboration with IBM and Thieme Chemistry exciting.”
Prof. Richmond Sarpong (University of California, Berkeley, USA)
Also involved in testing the retrained models were Prof. Alois Fürstner (MPI Mülheim, Germany), Prof. Karl Gademann and Prof. Cristina Nevado (University of Zurich, Switzerland), Prof. Ang Li (Shanghai Institute of Organic Chemistry, China), Prof. Dirk Trauner (New York University, USA) and their research groups.
On December 1st, 2021, IBM Research Europe and Thieme Chemistry will be holding a free Web seminar, where the outcome of their collaboration will be outlined. The teams will compare the performance of language models trained on the highest-quality commercially available datasets (Science of Synthesis and Synfacts) to that of models trained on publicly available patent reaction records, with a specific focus on retrosynthetic and chemical prediction tasks. If you are interested in participating, please register here: Web seminar “Powering Molecular Transformers with High Quality Data”
On November 24th, 2021, IBM Research Europe will be hosting a press event in Zurich on how “Quantum and AI shape the Future”. If you would like to attend, please register here: Press Day 2021 | IBM Research Europe - Zurich
About IBM Research
For more than seven decades, IBM Research has defined the future of information technology with more than 3,000 researchers in 16 locations across five continents. Scientists from IBM Research have produced six Nobel Laureates, 10 U.S. National Medals of Technology, five U.S. National Medals of Science, six Turing Awards, 19 inductees in the National Academy of Sciences and 20 inductees into the U.S. National Inventors Hall of Fame. IBM Research has been developing data-driven chemistry solutions based on language models for over four years. In 2018, IBM launched RXN for Chemistry. The cloud platform uses an artificial intelligence model called Molecular Transformer, which applies neural machine translation to predict the outcome of a chemical reaction and thus improve synthesis planning in organic chemistry.
About Thieme
Thieme is a leading supplier of information and services contributing to the improvement of healthcare and health. Employing more than 1,000 staff, the family-owned company develops products and services in digital and other media for the medical and chemistry sectors. Operating internationally with offices in 11 cities worldwide, the Thieme Group works closely with a strong network of experts and partners. The products and services are based on the high-quality content of Thieme’s 200 journals and 4,400 books. With solutions for professionals, Thieme supports relevant information processes in research, education, and patient care. Medical students, physicians, nurses, allied health specialists, hospitals, health insurance companies and others interested in health and healthcare are the focus of Thieme’s activities. The mission of the Thieme Group is to provide these markets with precisely the information, services, and products they need in their specific work situation and career. Providing top-quality services that are highly relevant to specific audiences, Thieme contributes to better healthcare and healthier lives.
Georg Thieme Verlag KG
A Thieme Group company
Ruedigerstrasse 14, 70469 Stuttgart, Germany
Tel +49 711 8931-330161
Fax +49 (0)711 8931-167
www.thieme-chemistry.com | https://www.facebook.com/thiemechemistry | https://twitter.com/thiemechemistry
Domicile and Commercial Register: Stuttgart, HRA 3499