Line identification
Line assignment
Molecule identification
The most time-consuming and ill-defined step!
McCarthy & Lee, J. Phys. Chem. A 2020, 124, 3002
Experimentally determined spectroscopic parameters are, on their own, uninformative about molecular identity
Databases
Machine learning
Structures encoded as Coulomb matrix eigenspectra
$M_{ij} = \begin{cases} 0.5Z_i^{2.4} & \text{for}~i = j\\ \frac{Z_iZ_j}{\vert \mathbf{R}_i - \mathbf{R}_j \vert} & \text{for}~i \neq j \end{cases}$
Ten largest eigenvalues for structurally similar species
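A minimal NumPy sketch of the Coulomb matrix encoding and its eigenspectrum, assuming atomic numbers `Z` and Cartesian coordinates `R` as inputs. The function names are hypothetical; distances are in whatever units `R` uses (angstrom here, though Bohr is also common). Keeping the ten largest eigenvalues sorted by value and zero-padding molecules with fewer than ten atoms are assumptions about the exact convention. Because eigenvalues do not depend on atom ordering, the resulting vector is permutation invariant.

```python
import numpy as np

def coulomb_matrix(Z, R):
    """Coulomb matrix: 0.5 * Z_i**2.4 on the diagonal, Z_i Z_j / |R_i - R_j| elsewhere."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        M = np.outer(Z, Z) / dist  # diagonal divides by zero; overwritten below
    np.fill_diagonal(M, 0.5 * Z ** 2.4)
    return M

def eigenspectrum(M, n=10):
    """n largest eigenvalues, descending, zero-padded for molecules with fewer than n atoms."""
    vals = np.linalg.eigvalsh(M)[::-1][:n]  # eigvalsh returns real eigenvalues in ascending order
    out = np.zeros(n)
    out[: len(vals)] = vals
    return out

# Water (approximate geometry in angstrom): O at the origin, two H atoms
Z = [8, 1, 1]
R = [[0.000, 0.000, 0.000],
     [0.757, 0.586, 0.000],
     [-0.757, 0.586, 0.000]]
print(eigenspectrum(coulomb_matrix(Z, R)))
```

Structurally similar species should map to nearby vectors, so a simple distance between eigenspectra gives a first measure of similarity.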
Models trained on four subdatasets
Pure hydrocarbons
Oxygen-bearing species
Nitrogen-bearing species
Oxygen/nitrogen-bearing species
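The four subdatasets can be read as a partition by elemental composition. The sketch below assumes the subsets are mutually exclusive and restricted to C, H, N and O, so a species containing any other element falls outside all four. The function name and rules are illustrative, not taken from the original work.

```python
from typing import Dict, Optional

def subdataset(counts: Dict[str, int]) -> Optional[str]:
    """Assign a molecule to one of four composition-based subsets (hypothetical rules)."""
    elements = {el for el, n in counts.items() if n > 0}
    if "C" not in elements:
        return None
    hetero = elements - {"C", "H"}
    if not hetero:
        return "hydrocarbon"
    if hetero == {"O"}:
        return "oxygen-bearing"
    if hetero == {"N"}:
        return "nitrogen-bearing"
    if hetero == {"N", "O"}:
        return "oxygen/nitrogen-bearing"
    return None  # e.g. sulfur- or halogen-containing species

molecules = {
    "benzene": {"C": 6, "H": 6},
    "formaldehyde": {"C": 1, "H": 2, "O": 1},
    "pyridine": {"C": 5, "H": 5, "N": 1},
    "formamide": {"C": 1, "H": 3, "N": 1, "O": 1},
    "thiophene": {"C": 4, "H": 4, "S": 1},
}
for name, counts in molecules.items():
    print(f"{name:>12s}: {subdataset(counts)}")
```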
Benzene used as a quantitative test of model behavior
The most critical step: the eigenspectrum encodes approximate structure and atomic composition
Input gradients indicate that $\kappa$ and the dipole moments are the inputs most important for predicting structure
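To make the input-gradient argument concrete: for a network $f(\mathbf{x})$, the magnitude of $\partial f / \partial x_k$ measures how sensitive the prediction is to input $k$. The sketch assumes $\kappa$ is Ray's asymmetry parameter, $\kappa = (2B - A - C)/(A - C)$, and uses a hypothetical one-hidden-layer network with random weights and a hypothetical feature order. The printed ranking therefore only illustrates the mechanics, not the trained model's result. Inputs are assumed standardized, since raw gradient magnitudes depend on feature scale.

```python
import numpy as np

def ray_kappa(A, B, C):
    """Ray's asymmetry parameter: -1 for a prolate top, +1 for an oblate top (undefined if A == C)."""
    return (2.0 * B - A - C) / (A - C)

def forward(x, W1, b1, W2, b2):
    """Toy one-hidden-layer network with a scalar output (hypothetical stand-in)."""
    h = np.tanh(W1 @ x + b1)
    return float(W2 @ h + b2)

def input_gradient(x, W1, b1, W2, b2):
    """Analytic d(output)/d(input) for the toy network: W1^T (W2 * (1 - tanh^2))."""
    h = np.tanh(W1 @ x + b1)
    return W1.T @ (W2 * (1.0 - h ** 2))

# Hypothetical input layout, assumed already standardized
FEATURES = ["A", "B", "C", "kappa", "mu_a", "mu_b", "mu_c"]

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 7)), rng.normal(size=16)
W2, b2 = rng.normal(size=16), 0.0
x = rng.normal(size=7)

saliency = np.abs(input_gradient(x, W1, b1, W2, b2))
for name, s in sorted(zip(FEATURES, saliency), key=lambda t: -t[1]):
    print(f"{name:>6s}  {s:.3f}")
```

In practice the gradients would be averaged over many molecules in a test set before ranking features.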
Extracting identifying information from the Coulomb eigenspectrum encoding
Formulae converted from the eigenspectra are comparable to mass spectrometry!
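One way to read the comparison with mass spectrometry is through the nominal mass a recovered formula implies. The sketch assumes a model outputs continuous atom counts for C, H, N and O (a hypothetical output format), rounds them, writes the formula in Hill order (C, then H, then the rest alphabetically), and sums integer nominal masses.

```python
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16}

def round_counts(pred):
    """Round continuous predicted atom counts to non-negative integers."""
    return {el: max(0, int(round(n))) for el, n in pred.items()}

def hill_formula(counts):
    """Formula string in Hill order; elements with zero count are omitted."""
    counts = {el: n for el, n in counts.items() if n > 0}
    if "C" in counts:
        rest = sorted(el for el in counts if el not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)

def nominal_mass(counts):
    """Integer mass from the most abundant isotopes (12C, 1H, 14N, 16O)."""
    return sum(NOMINAL_MASS[el] * n for el, n in counts.items())

pred = {"C": 5.8, "H": 6.2, "O": 0.1}  # hypothetical model output
counts = round_counts(pred)
print(hill_formula(counts), nominal_mass(counts))  # C6H6 78
```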
Multiclass classification for functional group identification
Intuition from formula + functional group
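Combining a formula with predicted functional groups gives a compact hypothesis to reason about. Because one molecule can carry several groups, the sketch assumes independent per-group sigmoid outputs thresholded at 0.5 (a multi-label reading); the exact output layer of the original model is not stated here, and the group list is hypothetical.

```python
import numpy as np

# Hypothetical label set; the groups used by the original model are not listed here.
GROUPS = ["aromatic ring", "carbonyl", "hydroxyl", "nitrile", "amine"]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

def decode_groups(logits, threshold=0.5):
    """Return every group whose predicted probability meets the threshold."""
    probs = sigmoid(logits)
    return [g for g, p in zip(GROUPS, probs) if p >= threshold]

def describe(formula, logits, threshold=0.5):
    groups = decode_groups(logits, threshold)
    return f"{formula}: {', '.join(groups) if groups else 'no functional group above threshold'}"

print(describe("C6H6", [3.0, -2.0, -1.0, -4.0, -3.0]))  # C6H6: aromatic ring
print(describe("C3HN", [-3.0, -2.5, -4.0, 2.2, -1.5]))  # C3HN: nitrile
```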
Single model approach
Faster training/inference
Constants to molecular graph mapping
End-to-end pipeline for molecule identification
Uncertainty-aware deep learning model for molecule identification
Experimentally determinable parameters
Fast, functional interface in PySpecTools
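The uncertainty-aware model is not specified further here. One common way to attach uncertainty to a network's predictions is Monte Carlo dropout: keep dropout active at inference, run many stochastic forward passes, and report the spread. The NumPy sketch below uses a hypothetical one-hidden-layer network with random weights and shows only that mechanism; it is not necessarily the approach used in this work and does not use PySpecTools.

```python
import numpy as np

def mc_dropout_predict(x, W1, b1, W2, b2, p=0.2, n_samples=200, seed=0):
    """Mean and standard deviation of predictions with dropout left on at inference."""
    rng = np.random.default_rng(seed)
    h_clean = np.maximum(0.0, W1 @ x + b1)  # ReLU hidden layer
    preds = np.empty((n_samples, W2.shape[0]))
    for i in range(n_samples):
        keep = rng.random(h_clean.shape) >= p  # drop each hidden unit with probability p
        h = h_clean * keep / (1.0 - p)  # inverted dropout: rescale the kept units
        preds[i] = W2 @ h + b2
    return preds.mean(axis=0), preds.std(axis=0)

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(32, 7)), np.zeros(32)
W2, b2 = rng.normal(size=(3, 32)) / np.sqrt(32), np.zeros(3)
x = rng.normal(size=7)  # hypothetical standardized spectroscopic parameters
mean, std = mc_dropout_predict(x, W1, b1, W2, b2)
print(mean, std)
```

A large standard deviation flags a prediction that should be treated with caution before committing to an assignment.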
Rich complex mixtures provide a wealth of spectroscopic data.
The hard part is identifying completely unknown molecules!
Simple neural networks can identify aspects of the molecule from spectroscopic parameters