pyspectools.models package¶
Submodules¶
pyspectools.models.classes module¶
class pyspectools.models.classes.MoleculeDetective(weights_path=None, device='cpu', **kwargs)[source]¶
Bases: object
run_inference(specconst_obj: pyspectools.models.classes.SpecConstants, composition=None, N=1000)[source]¶
Use a pre-trained PyTorch model to perform inference, conditional on the experimental constants and the expected composition. This framework can be used to account for various forms of uncertainty, and the default behavior is to assume the minimum amount of information. For example, the composition argument can be provided as an int representing:
[0: hydrocarbon, 1: oxygen-bearing, 2: nitrogen-bearing, 3: ON-bearing]
By default, composition is None, corresponding to the case where the composition is unknown; all four compositions are then tried at random.
This implementation is set up to take advantage of performance in torch. Rather than repeatedly calling a function à la MCMC sampling, we simply pass an entire tensor of all the samples so that torch can run without Python interaction. You can think of each row as a “minibatch”.
The constants are expected in MHz and are converted to GHz internally.
- Parameters
specconst_obj (SpecConstants) – SpecConstants object, which will generate random samples based on the experimental uncertainties.
composition (int or None (default), optional) – The expected composition of the molecule. When composition is an integer, inference is performed conditional on that specific composition; the definitions are provided above. If None (default), we have no prior knowledge of the composition, and all four will be tested at random.
N (int, optional) – Number of samples to run, by default 1000
- Returns
NumPy arrays corresponding to the predicted eigenspectrum, molecular formula, and functional groups present. functional is output as log sigmoid by the model, and this function returns the exponential to get back sigmoid likelihoods.
- Return type
eigenspectrum, formula, functional
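A minimal usage sketch, assuming pre-trained weights on disk (the file name and the constant values are illustrative; see SpecConstants below for the input class)::

    from pyspectools.models.classes import MoleculeDetective, SpecConstants

    # Rotational constants in MHz (values are illustrative)
    constants = SpecConstants(A="9447.1", B="1565.8", C="1354.3")
    # Hypothetical path to the pre-trained weights
    model = MoleculeDetective(weights_path="model_weights.pt", device="cpu")
    # Condition on an oxygen-bearing composition (1) and draw 1000 samples
    eigenspectrum, formula, functional = model.run_inference(
        constants, composition=1, N=1000
    )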
class pyspectools.models.classes.MoleculeResult(eigenspectrum, formulas, functional_groups)[source]¶
Bases: object
analyze(q=(0.025, 0.5, 0.975))[source]¶
Convenience function to compute some summary statistics and make interactive plotly figures.
- Parameters
q (tuple, optional) – Quantiles used for the summary statistics, by default (0.025, 0.5, 0.975)
- Returns
fig – Plotly Figure object
results – dict containing summary statistics of the formula/functional predictions.
func_encoding = ['Aliphatic', 'Allene', 'Vinyl', 'Alkyne', 'Carbonyl (General)', 'Carbonyl (α-nitrogen)', 'Carbonyl (α-carbon)', 'Aldehyde', 'Amide', 'Ketone', 'Ether', 'Amine', 'Amino acid', 'Nitrate', 'Nitrile', 'Isonitrile', 'Nitro', 'Alcohol', 'Alcohol (Carboxylic acid)', 'Enol', 'Phenol', 'Peroxide', 'Aromatic sp2 carbon']¶
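A sketch of how the class might be used with the arrays returned by run_inference, and how func_encoding maps the multilabel predictions back to labels (variable names are illustrative)::

    import numpy as np
    from pyspectools.models.classes import MoleculeResult

    # Wrap the arrays returned by MoleculeDetective.run_inference
    result = MoleculeResult(eigenspectrum, formula, functional)
    fig, stats = result.analyze(q=(0.025, 0.5, 0.975))

    # Label the median functional group likelihoods
    for label, value in zip(MoleculeResult.func_encoding, np.median(functional, axis=0)):
        print(f"{label}: {value:.3f}")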
class pyspectools.models.classes.SpecConstants(A: str, B: str, C: str, u_a=0.0+/-3.0, u_b=0.0+/-3.0, u_c=0.0+/-3.0, **kwargs)[source]¶
Bases: object
Class representing experimental parameters to be fed into the MoleculeDetective model. The user provides a set of constants as input, and the main purpose of this class is to help manage experimental uncertainties.
generate_samples(N: int)[source]¶
Function to generate samples of spectroscopic parameters, based on “diagonal” Gaussians. The nominal value and standard deviation of each parameter are used to parameterize a Gaussian, and N random samples are drawn. In the case of the dipole moments, we take the absolute value of the samples, and delta and kappa are recalculated from the drawn A, B, C.
TODO - Make this code look cleaner; there must be a smarter way to sample
- Parameters
N (int) – Number of samples to generate.
- Returns
2D np.ndarray, where columns correspond to parameters and rows are samples
- Return type
samples
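A minimal numpy sketch of the “diagonal” Gaussian sampling described above (parameter names and values are illustrative, not the library's exact code)::

    import numpy as np

    rng = np.random.default_rng()
    # Nominal values and 1-sigma uncertainties, e.g. A, B, C in MHz
    nominal = np.array([9447.1, 1565.8, 1354.3])
    sigma = np.array([0.5, 0.2, 0.2])
    N = 1000
    # Independent ("diagonal") Gaussians: rows are samples, columns are parameters
    samples = rng.normal(loc=nominal, scale=sigma, size=(N, 3))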
pyspectools.models.torch_models module¶
class pyspectools.models.torch_models.GenericModel[source]¶
Bases: torch.nn.modules.module.Module
get_num_parameters() → int[source]¶
Calculate the number of parameters contained within the model.
- Returns
Number of trainable parameters
- Return type
int
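A common PyTorch idiom equivalent to this method (a sketch; the library's implementation may differ)::

    import torch.nn as nn

    model = nn.Linear(8, 30)  # any nn.Module subclass works the same way
    # Count trainable parameters by summing element counts
    num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)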
init_layers(weight_func=None, bias_func=None)[source]¶
Function that will initialize all the weights and biases of the model layers. This function uses the apply method of Module, and so will only work on layers that are contained as children.
- Parameters
weight_func (nn.init function, optional) – Function used to initialize the weights; if None (default), nn.init.xavier_normal is used.
bias_func (nn.init function, optional) – Function used to initialize the biases; if None (default), nn.init.xavier_uniform is used.
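A sketch of the apply-based initialization pattern described above, assuming linear layers (biases are zeroed here, since the xavier initializers require at least 2D tensors; this is not the library's exact code)::

    import torch.nn as nn

    def _init(module):
        # Only touch layers that actually carry weights and biases
        if isinstance(module, nn.Linear):
            nn.init.xavier_normal_(module.weight)
            if module.bias is not None:
                nn.init.zeros_(module.bias)

    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 30))
    # Module.apply recursively visits every child module
    model.apply(_init)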
classmethod load_weights(weights_path=None, device=None, **kwargs)[source]¶
Convenience method for loading in the weights of a model: it initializes the model and wraps torch.load with automatic cuda/cpu detection.
- Parameters
weights_path (str) – String path to the trained weights of a model; typically with extension .pt
device (str) – String reference to the target device, either “cpu”, “cuda”, or a specific CUDA device (e.g. “cuda:0”). If None (default) the model will be loaded onto a GPU if available, otherwise a CPU.
kwargs – Keyword arguments are passed into the creation of the model, allowing you to set different parameters.
- Returns
Instance of the PyTorch model with loaded weights
- Return type
model
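A usage sketch with a hypothetical weights file, mirroring the automatic device detection::

    import torch
    from pyspectools.models.torch_models import VarMolDetect

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = VarMolDetect.load_weights("weights.pt", device=device)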
class pyspectools.models.torch_models.VarMolDetect(eigen_length=30, latent_dim=14, nclasses=23, alpha=0.8, dropout=0.2, tracker=True)[source]¶
Bases: pyspectools.models.torch_models.GenericModel
Umbrella model that encapsulates the full set of variational models. The premise is to perform more or less end-to-end learning, meeting the user half-way in terms of usability. The forward method takes the spectroscopic constants and the molecular composition as separate inputs, and performs the concatenation prior to any calculation. The composition is reused by the VariationalDecoder model.
forward(constants: torch.Tensor, composition: torch.Tensor)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
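A call sketch, with shapes inferred from the defaults and from the VariationalSpecDecoder description below (N rows, 8 constants, one-hot composition over 4 classes; these shapes are assumptions)::

    import torch
    from pyspectools.models.torch_models import VarMolDetect

    N = 1000
    constants = torch.randn(N, 8)    # spectroscopic constants
    composition = torch.zeros(N, 4)
    composition[:, 1] = 1.0          # one-hot: oxygen-bearing
    model = VarMolDetect()
    # Call the instance rather than .forward() so registered hooks run
    outputs = model(constants, composition)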
class pyspectools.models.torch_models.VariationalDecoder(latent_dim=14, eigen_length=30, nclasses=23, alpha=0.8, dropout=0.2, loss_func=None, param_transform=None, tracker=True)[source]¶
Bases: pyspectools.models.torch_models.GenericModel
This model uses the intermediate eigenspectrum to calculate a latent embedding that is then used to predict the molecular formula and functional groups. You can think of the first action as “re-encoding”, but the driving principle is that an eigenspectrum could map onto various structures, even when conditioned on the composition.
compute_loss(X: torch.Tensor, formula: torch.Tensor, groups: torch.Tensor)[source]¶
Calculate the joint loss of this model. This corresponds to the sum of three components: a KL-divergence loss for the variational layer, the formula prediction error as an MSE loss, and a BCE loss for the multilabel functional group classification.
- Parameters
X (torch.Tensor) – Concatenated input of the eigenspectrum and the one-hot composition.
formula (torch.Tensor) – Target formula encoding, typically of length 4 [H,C,O,N].
groups (torch.Tensor) – Target functional group encoding, of length nclasses.
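A sketch of the three-component loss described above (not the library's exact code; it assumes mu and logvar come from the variational layer and that the model emits log-sigmoid group predictions, as noted elsewhere in this page)::

    import torch
    import torch.nn.functional as F

    def joint_loss(formula_pred, formula, log_groups_pred, groups, mu, logvar):
        # KL divergence between N(mu, sigma) and a standard normal prior
        kld = -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())
        # Formula prediction error
        mse = F.mse_loss(formula_pred, formula)
        # Multilabel BCE; exponentiate the log-sigmoid output first
        bce = F.binary_cross_entropy(log_groups_pred.exp(), groups)
        return kld + mse + bce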
forward(X: torch.Tensor)[source]¶
Perform a forward pass of the VariationalDecoder model. This takes the concatenated input of the eigenspectrum and the one-hot composition, and produces a latent embedding that is then used to predict the formula and functional group classification.
- Parameters
X (torch.Tensor) – Concatenated tensor of the eigenspectrum and the one-hot composition.
- Returns
formula_output – Nx4 tensor corresponding to the number of atoms in the [H,C,O,N] positions.
functional_output – Nx23 tensor corresponding to multilabel classification, provided as log sigmoid.
mu, logvar – Latent variables of the variational layer
class pyspectools.models.torch_models.VariationalSpecDecoder(latent_dim=14, output_dim=30, alpha=0.8, dropout=0.2, optimizer=None, loss_func=None, opt_settings=None, param_transform=None, tracker=True)[source]¶
Bases: pyspectools.models.torch_models.GenericModel
Uses variational inference to capture the uncertainty with respect to Coulomb matrix eigenvalues. Instead of using dropout, this model represents uncertainty via a probabilistic latent layer.
compute_loss(X: torch.Tensor, Y: torch.Tensor)[source]¶
Calculate the loss of this model as the combined prediction error and KL-divergence from the approximate posterior.
- Parameters
X (torch.Tensor) – Combined tensor of the spectroscopic constants and the one-hot encoded composition.
Y (torch.Tensor) – Target eigenspectrum
- Returns
Joint loss of MSE and KL divergence
- Return type
torch.Tensor
forward(X: torch.Tensor)[source]¶
The input to this model is a single Tensor, where each row is 12 elements long (8 constants plus the one-hot encoding of the composition). The idea is to predict the eigenspectrum conditional on the molecular composition.
- Parameters
X (torch.Tensor) – Tensor containing spectroscopic constants, and one-hot encoding of the composition.
- Returns
The predicted eigenspectrum, and the latent parameters mu and logvar
- Return type
output, mu, logvar
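A sketch of constructing the 12-element input rows (constant values are illustrative)::

    import torch
    from pyspectools.models.torch_models import VariationalSpecDecoder

    N = 1000
    constants = torch.randn(N, 8)     # 8 spectroscopic constants per row
    onehot = torch.zeros(N, 4)
    onehot[:, 2] = 1.0                # nitrogen-bearing composition
    X = torch.cat([constants, onehot], dim=1)   # N x 12
    model = VariationalSpecDecoder()
    output, mu, logvar = model(X)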