Yuhui Hong

Ph.D. candidate in Computer Science
Indiana University Bloomington
Advised by Professor Haixu Tang

We "see" small molecules through analytical instruments (e.g., LC-MS and GC-MS) and analyze them using computational methods. Illuminating the "dark matter" of chemical space, the vast number of unknown compounds, remains a significant challenge in the field. During my PhD, I explored small molecule identification through two approaches: (1) predicting LC-MS/MS spectra and molecular properties from 3D conformations to build supplementary reference libraries for spectral searching, and (2) predicting chemical formulas directly from LC-MS/MS spectra, which goes beyond traditional database-dependent approaches by learning more complex patterns in the spectra. My ultimate aim is to design reliable computational methods that tackle real-world problems and advance scientific research.

Prior to joining Indiana University Bloomington, I received my B.S. in Computer Science from Xidian University, China, in 2019, and subsequently worked as a research assistant at Xi'an Jiaotong University from 2019 to 2020 under the guidance of Professor Yaochen Li.

Please feel free to get in touch!


News 📰

- [05/09/2025] Recipient of the Luddy Outstanding Research Award.

- [03/04/2025] Our work, A Task-Specific Transfer Learning Approach to Enhancing Small Molecule Retention Time Prediction with Limited Data, has been selected for an oral presentation at ASMS 2025.
See you in Baltimore, MD, on June 3, 2025!


Selected Publications ✨

Books, Patents, and Survey Papers

Machine Learning in Small-Molecule Mass Spectrometry
Yuhui Hong, Yuzhen Ye, & Haixu Tang (2025).
Annual Review of Analytical Chemistry, 18 (2025). [Paper]
This review highlights how machine learning is transforming small molecule mass spectrometry by: (a) predicting MS/MS spectra and properties to expand reference libraries, (b) enhancing spectral matching with automated pattern extraction, and (c) directly predicting molecular structures from MS/MS spectra when reference data is unavailable.
Neural Networks for Chemists
Qingyang Xiao, Kaiyuan Liu, Yuhui Hong & Haixu Tang (2024). American Chemical Society. [Primer]
This primer introduces the basics of neural networks, guiding students, researchers, and professionals to harness their potential. It covers foundational concepts, fully connected networks, advanced architectures, and case studies, illustrating their impact on fields like chemistry, healthcare, and beyond.
Method of Predicting MS/MS Spectra and Properties of Chemical Compounds
Haixu Tang, Yuhui Hong, & Sujun Li (2023). International Patent Publication No. WO2023239720A1.

Preprints

FIDDLE: a Deep Learning Method for Chemical Formulas Prediction from Tandem Mass Spectra
Yuhui Hong, Sujun Li, Yuzhen Ye, & Haixu Tang (2024).
bioRxiv 2024.11.25.625316. [Preprint] [Code] [PyPI package - msfiddle]
FIDDLE (Formula IDentification by Deep LEarning) is a deep learning method for identifying chemical formulas from MS/MS data. It is trained on over 38,000 molecules and 1 million MS/MS spectra acquired on Quadrupole Time-of-Flight (QTOF) and Orbitrap instruments under varied conditions, including different collision energies and precursor types.

Peer-reviewed Articles

Enhanced Structure-Based Prediction of Chiral Stationary Phases for Chromatographic Enantioseparation from 3D Molecular Conformations
Yuhui Hong, Christopher J Welch, Patrick Piras, & Haixu Tang (2024).
Analytical Chemistry, 96(6), 2351-2359. [Paper] [Code]
3DMolCSP leverages a 3D molecular conformation representation algorithm, alongside a dataset of over 300k enantioseparation records. This approach significantly improves enantioselectivity predictions, enabling more efficient and informed decisions in chiral chromatography.
3DMolMS: Prediction of Tandem Mass Spectra from Three Dimensional Molecular Conformations
Yuhui Hong, Sujun Li, Christopher J Welch, Shane Tichy, Yuzhen Ye, & Haixu Tang (2023).
Bioinformatics, btad354. [Paper] [Code] [Documentation] [PyPI package - molnetpack] [Workflow on Konia]
3DMolMS is a deep neural network model that predicts MS/MS spectra from 3D conformations. The learned molecular representation also enhances predictions of chemical properties, such as elution time and collisional cross section, aiding compound identification.

Presentations and Talks 💡

Teaching 👩🏽‍🏫

Professional Services 🙌