Scientists have made a major breakthrough in identifying unknown natural molecules by harnessing the power of artificial intelligence. A newly developed machine learning model called DreaMS dramatically speeds up the analysis of mysterious molecular substances, a study recently published in Nature Biotechnology reveals.
The research team includes Dr. Tomáš Pluskal from IOCB Prague, recipient of this year’s Neuron Award for promising young scientists, along with his student Roman Bushuiev and collaborators Dr. Josef Šivic and Anton Bushuiev from the Czech Institute of Informatics, Robotics and Cybernetics at the Czech Technical University (CIIRC CTU).
The natural world is teeming with countless chemical compounds, most of which remain undiscovered. Understanding these molecules holds the promise of breakthroughs in drug discovery, development of eco-friendly pesticides, a deeper insight into biological systems, and even the search for life beyond Earth.
Each molecule leaves behind a unique “fingerprint” known as a mass spectrum, which can be recorded using a technique called mass spectrometry. Despite the vast amounts of data generated by this method, decoding these complex spectra to reveal exact molecular structures is an exceptionally challenging task. Typically, the data appears as massive tables of numbers lacking immediate meaning.
To tackle this challenge, the team applied AI techniques inspired by large language models like ChatGPT. Just as ChatGPT learns language patterns without being told in advance what words mean, DreaMS learns to recognize molecular structures hidden within mass spectra without any prior chemical knowledge. Dr. Šivic explains, “DreaMS uses self-supervised machine learning to extract structural insights from millions of examples.”
The DreaMS model was trained on tens of millions of mass spectra collected from a wide range of sources—including plants, microbes, food, tissues, and soil samples. This extensive training enables the model to identify subtle chemical similarities between spectra that initially appear unrelated, says Dr. Pluskal.
The outcome is an interconnected network of mass spectra, dubbed the DreaMS Atlas, which functions like an “internet of spectra.” Users can navigate this vast chemical landscape, explore connections, and pose new questions—for example, investigating commonalities between pesticides, food, and human skin. Notably, DreaMS revealed unexpected chemical links between these areas, suggesting potential connections between certain pesticides and autoimmune conditions like psoriasis.
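To make the idea of a network of spectra concrete, here is a minimal Python sketch. It assumes each spectrum has already been turned into a numeric vector (an embedding) and links every spectrum to its most similar neighbor by cosine similarity. The vectors, sample names, and the `nearest_neighbor_graph` helper are invented for illustration; this is not the DreaMS code and does not reproduce how the Atlas is actually built.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest_neighbor_graph(embeddings, k=1):
    """Link each spectrum to its k most similar spectra (hypothetical helper)."""
    graph = {}
    for name, vec in embeddings.items():
        scored = [
            (cosine(vec, other_vec), other_name)
            for other_name, other_vec in embeddings.items()
            if other_name != name
        ]
        scored.sort(reverse=True)  # highest similarity first
        graph[name] = [other_name for _, other_name in scored[:k]]
    return graph

# Made-up 3-dimensional embeddings; real learned embeddings would be
# much higher-dimensional and produced by a trained model.
embeddings = {
    "pesticide_sample": [0.9, 0.1, 0.0],
    "food_sample": [0.8, 0.2, 0.1],
    "soil_sample": [0.0, 0.1, 0.9],
}

graph = nearest_neighbor_graph(embeddings, k=1)
for name, neighbors in graph.items():
    print(name, "->", neighbors)
```

In this toy data the pesticide and food vectors point in nearly the same direction, so they end up linked to each other. That is the kind of cross-domain connection the article describes, here produced purely by the invented numbers.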
Beyond mapping relationships, DreaMS can perform practical analyses such as estimating the number of specific molecular fragments or detecting the presence of particular elements. Roman Bushuiev notes the team's surprise when the model learned to reliably identify fluorine. This is a challenging task, and an important one, since fluorine is present in about one-third of drugs and agrochemicals. After pretraining on millions of spectra, fine-tuning with just a few thousand fluorine-containing examples enabled the model to detect fluorine consistently.
The team is now working towards the next milestone: enabling DreaMS to predict complete molecular structures from spectra. If successful, this advancement could transform our understanding of chemical diversity not only on Earth but potentially throughout the universe.
Source: https://phys.org/news/2025-05-unknown-molecules-ai.html
This is not financial or medical advice. It was made using AI and could be wrong.