pyspectools.models package¶
Submodules¶
pyspectools.models.classes module¶
class pyspectools.models.classes.MoleculeDetective(weights_path=None, device='cpu', **kwargs)[source]¶
Bases: object
run_inference(specconst_obj: pyspectools.models.classes.SpecConstants, composition=None, N=1000)[source]¶
Use a pre-trained PyTorch model to perform inference, conditional on the experimental constants and the expected composition. This framework can be used to account for various forms of uncertainty, and the default behavior is to assume the minimum amount of information. For example, the composition argument can be provided as an int representing:
[0: hydrocarbon, 1: oxygen-bearing, 2: nitrogen-bearing, 3: ON-bearing]
By default, composition is None, corresponding to the case where the composition is unknown; all four compositions are then tried at random.
This implementation is set up to take advantage of performance in torch. Rather than repeatedly calling a function à la MCMC sampling, we simply pass an entire tensor of all the samples so that torch can run without Python interaction. You can think of each row as a “minibatch”.
The constants are expected in MHz and are converted to GHz internally.
- Parameters
specconst_obj (SpecConstants) – SpecConstants object, which will generate random samples based on the experimental uncertainties.
composition (int or None (default), optional) – The expected composition of the molecule. When composition is an integer, inference is performed conditional on that specific composition; the definitions are provided above. If None (default), we have no prior knowledge of the composition, and all four will be tested at random.
N (int, optional) – Number of samples to run, by default 1000
- Returns
NumPy arrays corresponding to the predicted eigenspectrum, molecular formula, and functional groups present. functional is output as log sigmoid by the model, and this function returns the exponential to get back sigmoid likelihoods.
- Return type
eigenspectrum, formula, functional
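A minimal usage sketch, assuming pre-trained weights on disk (the file name and the constant values are illustrative; see SpecConstants below for the input class)::

    from pyspectools.models.classes import MoleculeDetective, SpecConstants

    # Rotational constants in MHz (values are illustrative)
    constants = SpecConstants(A="9447.1", B="1565.8", C="1354.3")
    # Hypothetical path to the pre-trained weights
    model = MoleculeDetective(weights_path="model_weights.pt", device="cpu")
    # Condition on an oxygen-bearing composition (1) and draw 1000 samples
    eigenspectrum, formula, functional = model.run_inference(
        constants, composition=1, N=1000
    )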
class pyspectools.models.classes.MoleculeResult(eigenspectrum, formulas, functional_groups)[source]¶
Bases: object
analyze(q=(0.025, 0.5, 0.975))[source]¶
Convenience function to compute some summary statistics and make interactive plotly figures.
- Parameters
q (tuple, optional) – Quantiles used for the summary statistics, by default (0.025, 0.5, 0.975)
- Returns
fig – Plotly Figure object
results – dict containing summary statistics of the formula/functional predictions.
func_encoding = ['Aliphatic', 'Allene', 'Vinyl', 'Alkyne', 'Carbonyl (General)', 'Carbonyl (α-nitrogen)', 'Carbonyl (α-carbon)', 'Aldehyde', 'Amide', 'Ketone', 'Ether', 'Amine', 'Amino acid', 'Nitrate', 'Nitrile', 'Isonitrile', 'Nitro', 'Alcohol', 'Alcohol (Carboxylic acid)', 'Enol', 'Phenol', 'Peroxide', 'Aromatic sp2 carbon']¶
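A sketch of how the class might be used with the arrays returned by run_inference, and how func_encoding maps the multilabel predictions back to labels (variable names are illustrative)::

    import numpy as np
    from pyspectools.models.classes import MoleculeResult

    # Wrap the arrays returned by MoleculeDetective.run_inference
    result = MoleculeResult(eigenspectrum, formula, functional)
    fig, stats = result.analyze(q=(0.025, 0.5, 0.975))

    # Label the median functional group likelihoods
    for label, value in zip(MoleculeResult.func_encoding, np.median(functional, axis=0)):
        print(f"{label}: {value:.3f}")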
class pyspectools.models.classes.SpecConstants(A: str, B: str, C: str, u_a=0.0+/-3.0, u_b=0.0+/-3.0, u_c=0.0+/-3.0, **kwargs)[source]¶
Bases: object
Class representing experimental parameters to be fed into the MoleculeDetective model. The user provides a set of constants as input, and the main purpose of this class is to help manage experimental uncertainties.
generate_samples(N: int)[source]¶
Function to generate samples of spectroscopic parameters, based on “diagonal” Gaussians. The nominal value and standard deviation of each parameter are used to parameterize a Gaussian, and N random samples are drawn. In the case of the dipole moments, we take the absolute value of the samples, and delta and kappa are recalculated from the drawn A, B, C.
TODO - Make this code look cleaner; there must be a smarter way to sample
- Parameters
N (int) – Number of samples to generate.
- Returns
2D np.ndarray, where columns correspond to parameters and rows are samples
- Return type
samples
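A minimal numpy sketch of the “diagonal” Gaussian sampling described above (parameter names and values are illustrative, not the library's exact code)::

    import numpy as np

    rng = np.random.default_rng()
    # Nominal values and 1-sigma uncertainties, e.g. A, B, C in MHz
    nominal = np.array([9447.1, 1565.8, 1354.3])
    sigma = np.array([0.5, 0.2, 0.2])
    N = 1000
    # Independent ("diagonal") Gaussians: rows are samples, columns are parameters
    samples = rng.normal(loc=nominal, scale=sigma, size=(N, 3))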
pyspectools.models.torch_models module¶
class pyspectools.models.torch_models.GenericModel[source]¶
Bases: torch.nn.modules.module.Module
get_num_parameters() → int[source]¶
Calculate the number of parameters contained within the model.
- Returns
Number of trainable parameters
- Return type
int
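A common PyTorch idiom equivalent to this method (a sketch; the library's implementation may differ)::

    import torch.nn as nn

    model = nn.Linear(8, 30)  # any nn.Module subclass works the same way
    # Count trainable parameters by summing element counts
    num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)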
init_layers(weight_func=None, bias_func=None)[source]¶
Function that will initialize all the weights and biases of the model layers. This function uses the apply method of Module, and so will only work on layers that are contained as children.
- Parameters
weight_func (nn.init function, optional) – Function used to initialize the weights; if None (default), nn.init.xavier_normal is used.
bias_func (nn.init function, optional) – Function used to initialize the biases; if None (default), nn.init.xavier_uniform is used.
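A sketch of the apply-based initialization pattern described above, assuming linear layers (biases are zeroed here, since the xavier initializers require at least 2D tensors; this is not the library's exact code)::

    import torch.nn as nn

    def _init(module):
        # Only touch layers that actually carry weights and biases
        if isinstance(module, nn.Linear):
            nn.init.xavier_normal_(module.weight)
            if module.bias is not None:
                nn.init.zeros_(module.bias)

    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 30))
    # Module.apply recursively visits every child module
    model.apply(_init)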
classmethod load_weights(weights_path=None, device=None, **kwargs)[source]¶
Convenience method for loading in the weights of a model: it initializes the model and wraps torch.load with automatic cuda/cpu detection.
- Parameters
weights_path (str) – String path to the trained weights of a model; typically with extension .pt
device (str) – String reference to the target device, either “cpu”, “cuda”, or a specific CUDA device (e.g. “cuda:0”). If None (default) the model will be loaded onto a GPU if available, otherwise a CPU.
kwargs – Keyword arguments are passed into the creation of the model, allowing you to set different parameters.
- Returns
Instance of the PyTorch model with loaded weights
- Return type
model
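A usage sketch with a hypothetical weights file, mirroring the automatic device detection::

    import torch
    from pyspectools.models.torch_models import VarMolDetect

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = VarMolDetect.load_weights("weights.pt", device=device)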
class pyspectools.models.torch_models.VarMolDetect(eigen_length=30, latent_dim=14, nclasses=23, alpha=0.8, dropout=0.2, tracker=True)[source]¶
Bases: pyspectools.models.torch_models.GenericModel
Umbrella model that encapsulates the full set of variational models. The premise is to perform more or less end-to-end learning, meeting the user half-way in terms of usability. The forward method takes the spectroscopic constants and the molecular composition as separate inputs, and performs the concatenation prior to any calculation. The composition is reused by the VariationalDecoder model.
forward(constants: torch.Tensor, composition: torch.Tensor)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
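A call sketch, with shapes inferred from the defaults and from the VariationalSpecDecoder description below (N rows, 8 constants, one-hot composition over 4 classes; these shapes are assumptions)::

    import torch
    from pyspectools.models.torch_models import VarMolDetect

    N = 1000
    constants = torch.randn(N, 8)    # spectroscopic constants
    composition = torch.zeros(N, 4)
    composition[:, 1] = 1.0          # one-hot: oxygen-bearing
    model = VarMolDetect()
    # Call the instance rather than .forward() so registered hooks run
    outputs = model(constants, composition)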
class pyspectools.models.torch_models.VariationalDecoder(latent_dim=14, eigen_length=30, nclasses=23, alpha=0.8, dropout=0.2, loss_func=None, param_transform=None, tracker=True)[source]¶
Bases: pyspectools.models.torch_models.GenericModel
This model uses the intermediate eigenspectrum to calculate a latent embedding that is then used to predict the molecular formula and functional groups. You can think of the first action as “re-encoding”, but the driving principle is that an eigenspectrum could map onto various structures, even when conditioned on the composition.
compute_loss(X: torch.Tensor, formula: torch.Tensor, groups: torch.Tensor)[source]¶
Calculate the joint loss of this model. This corresponds to the sum of three components: a KL-divergence loss for the variational layer, the formula prediction error as an MSE loss, and a BCE loss for the multilabel functional group classification.
- Parameters
X (torch.Tensor) – Concatenated input of the eigenspectrum and the one-hot composition.
formula (torch.Tensor) – Target formula encoding, typically of length 4 [H,C,O,N].
groups (torch.Tensor) – Target functional group encoding, of length nclasses.
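A sketch of the three-component loss described above (not the library's exact code; it assumes mu and logvar come from the variational layer and that the model emits log-sigmoid group predictions, as noted elsewhere in this page)::

    import torch
    import torch.nn.functional as F

    def joint_loss(formula_pred, formula, log_groups_pred, groups, mu, logvar):
        # KL divergence between N(mu, sigma) and a standard normal prior
        kld = -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())
        # Formula prediction error
        mse = F.mse_loss(formula_pred, formula)
        # Multilabel BCE; exponentiate the log-sigmoid output first
        bce = F.binary_cross_entropy(log_groups_pred.exp(), groups)
        return kld + mse + bce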
forward(X: torch.Tensor)[source]¶
Perform a forward pass of the VariationalDecoder model. This takes the concatenated input of the eigenspectrum and the one-hot composition, and produces a latent embedding that is then used to predict the formula and functional group classification.
- Parameters
X (torch.Tensor) – Concatenated tensor of the eigenspectrum and the one-hot composition.
- Returns
formula_output – Nx4 tensor corresponding to the number of atoms in the [H,C,O,N] positions.
functional_output – Nx23 tensor corresponding to multilabel classification, provided as log sigmoid.
mu, logvar – Latent variables of the variational layer
class pyspectools.models.torch_models.VariationalSpecDecoder(latent_dim=14, output_dim=30, alpha=0.8, dropout=0.2, optimizer=None, loss_func=None, opt_settings=None, param_transform=None, tracker=True)[source]¶
Bases: pyspectools.models.torch_models.GenericModel
Uses variational inference to capture the uncertainty with respect to Coulomb matrix eigenvalues. Instead of using dropout, this model represents uncertainty via a probabilistic latent layer.
compute_loss(X: torch.Tensor, Y: torch.Tensor)[source]¶
Calculate the loss of this model as the combined prediction error and KL-divergence from the approximate posterior.
- Parameters
X (torch.Tensor) – Combined tensor of the spectroscopic constants and the one-hot encoded composition.
Y (torch.Tensor) – Target eigenspectrum
- Returns
Joint loss of MSE and KL divergence
- Return type
torch.Tensor
forward(X: torch.Tensor)[source]¶
The input to this model is a single Tensor, where each row is 12 elements long (8 constants plus the one-hot encoding of the composition). The idea is to predict the eigenspectrum conditional on the molecular composition.
- Parameters
X (torch.Tensor) – Tensor containing spectroscopic constants, and one-hot encoding of the composition.
- Returns
The predicted eigenspectrum, and the latent parameters mu and logvar
- Return type
output, mu, logvar
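A sketch of constructing the 12-element input rows (constant values are illustrative)::

    import torch
    from pyspectools.models.torch_models import VariationalSpecDecoder

    N = 1000
    constants = torch.randn(N, 8)     # 8 spectroscopic constants per row
    onehot = torch.zeros(N, 4)
    onehot[:, 2] = 1.0                # nitrogen-bearing composition
    X = torch.cat([constants, onehot], dim=1)   # N x 12
    model = VariationalSpecDecoder()
    output, mu, logvar = model(X)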