Research
A task-specific transfer learning approach to enhancing small molecule retention time prediction with limited data
bioRxiv 2025.06.26.661631.
TSTL (Task-Specific Transfer Learning) is introduced as a training strategy for predicting retention times in various LC systems with limited training data. Evaluated across 6 benchmark datasets from different LC systems using 5 deep neural network architectures, TSTL achieved significant improvements in prediction accuracy, increasing average R² from 0.587 to 0.825 with superior data efficiency.
Confounder-free predictive models for microbiome-based host phenotype prediction
bioRxiv, 2025.01.29.635502.
Confounding factors such as medications can severely bias microbiome-based disease predictions, leading to spurious associations. This study developed confounder-free models that use adversarial optimization to remove these biases while preserving true phenotype-microbiome associations. Tested on type 2 diabetes data with metformin as the confounder, both FNN_CF and MicroKPNN_CF outperformed conventional approaches by identifying genuine disease markers.
FIDDLE: a deep learning method for chemical formulas prediction from tandem mass spectra
Nature Communications, 16(1), 11102.
FIDDLE (Formula IDentification by Deep LEarning) is introduced as a deep learning-based method for identifying chemical formulas from MS/MS data. It is trained on over 38,000 molecules and 1 million MS/MS spectra collected under varying conditions, including different collision energies and precursor types, using Quadrupole Time-of-Flight (QTOF) and Orbitrap instruments.
Machine learning in small-molecule mass spectrometry
Annual Review of Analytical Chemistry, 18.
Small-molecule mass spectrometry can only identify compounds already in reference libraries, leaving billions of molecules uncharacterized. Machine learning is changing this by: (1) predicting spectra and properties to expand virtual libraries, (2) automating spectral matching, and (3) enabling direct structure prediction from spectra. This review examines the deep learning methods driving this shift from library matching to de novo prediction, finally enabling identification of the metabolome's dark matter.
Method of predicting MS/MS spectra and properties of chemical compounds
US Patent, US20250356958A1.
This patent covers methods and systems for predicting molecular properties from 3D molecular conformers. The method generates a 3D molecular input point set from compound information, convolves it through stacked layers to encode the chemical compound, and produces a report of predicted properties such as MS/MS spectra.
Koina: democratizing machine learning for proteomics research
Nature Communications, 16(1), 9933.
Koina is a user-friendly platform that enables proteomics researchers to apply machine learning without coding expertise. It offers pre-configured workflows for common tasks such as predicting tandem mass spectra, retention time, and collisional cross section, along with customizable options for advanced users.
Enhanced structure-based prediction of chiral stationary phases for chromatographic enantioseparation from 3D molecular conformations
Analytical Chemistry, 96(6), 2351–2359.
3DMolCSP leverages a 3D molecular conformation representation algorithm, alongside a dataset of over 300k enantioseparation records. This approach significantly improves enantioselectivity predictions, enabling more efficient and informed decisions in chiral chromatography.
Multitask knowledge-primed neural network for predicting missing metadata and host phenotype based on human microbiome
Bioinformatics Advances, vbae203.
Metadata such as age and gender are often missing in microbiome studies but are crucial for accurate disease prediction. MicroKPNN-MT addresses this by either using available metadata as input or predicting it from microbiome profiles. Tested across 25 diseases, the model showed that incorporating real or predicted metadata improves both prediction accuracy and generalizability.
3DMolMS: prediction of tandem mass spectra from three dimensional molecular conformations
Bioinformatics, btad354.
3DMolMS is a deep neural network model that predicts MS/MS spectra from 3D conformations. The learned molecular representation also enhances predictions of chemical properties, such as elution time and collisional cross section, aiding compound identification.