Spectra Module
The module pyspectools.spectra contains most of the high-level interface needed to analyze broadband spectra. There are two submodules, analysis and assignment: the former contains lower-level functionality for specific analysis routines (e.g. peak finding), while the latter contains the classes that implement most of the user interaction.
Submodules
pyspectools.spectra.analysis module
-
pyspectools.spectra.analysis.
average_spectra
(*arrays, **options) → numpy.ndarray[source]¶ Averages multiple spectra together, with a few options available to the user. The user provides an iterable of arrays that all share the same frequency axis and have the same length.
Options include performing a noise-weighted average, where each spectrum is weighted by the inverse of its average baseline, as determined by an ALS fit. The lower the average noise, the larger the weighting given to the spectrum.
- Parameters
- arrays (np.ndarray) – Iterable of NumPy arrays corresponding to the intensity
- bool – If True, the averaging is done in the time domain, provided the input spectra are frequency domain. Defaults to False.
- weighted (bool) – If True, weights the averaging by the average noise in each spectrum. Defaults to True.
- Returns
Averaged intensities.
- Return type
np.ndarray
- Raises
ValueError – Error is raised if fewer than two spectra are given.
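The noise-weighted option described above can be illustrated with a minimal sketch. It is not the library's implementation: the function name `noise_weighted_average` is hypothetical, the noise levels are supplied by the caller rather than estimated from an ALS baseline, and plain Python lists stand in for NumPy arrays.

```python
from typing import List, Sequence


def noise_weighted_average(
    spectra: Sequence[Sequence[float]], noise_levels: Sequence[float]
) -> List[float]:
    """Average equal-length spectra, weighting each one by 1 / noise.

    Hypothetical helper: noise_levels are assumed to be known already.
    """
    if len(spectra) < 2:
        raise ValueError("Need at least two spectra to average.")
    length = len(spectra[0])
    if any(len(s) != length for s in spectra):
        raise ValueError("All spectra must have the same length.")
    if len(noise_levels) != len(spectra):
        raise ValueError("One noise level is required per spectrum.")
    # Lower noise gives a larger weight
    weights = [1.0 / noise for noise in noise_levels]
    total = sum(weights)
    return [
        sum(w * s[i] for w, s in zip(weights, spectra)) / total
        for i in range(length)
    ]


quiet = [0.0, 1.0, 0.0]
noisy = [0.0, 4.0, 0.0]
averaged = noise_weighted_average([quiet, noisy], noise_levels=[1.0, 3.0])
```

With these inputs the quiet spectrum carries three times the weight of the noisy one, so the averaged peak sits closer to 1.0 than to 4.0.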
-
pyspectools.spectra.analysis.
blank_spectrum
(spectrum_df: pandas.core.frame.DataFrame, frequencies: numpy.ndarray, noise=0.0, noise_std=0.05, freq_col='Frequency', int_col='Intensity', window=1.0, df=True)[source]¶ Function to blank the peaks from a spectrum. Takes an iterable of frequencies, and generates an array of Gaussian noise corresponding to the average noise floor and standard deviation.
- Parameters
- spectrum_df (pandas DataFrame) – Pandas DataFrame containing the spectral data
- frequencies (iterable of floats) – An iterable containing the center frequencies to blank
- noise (float) – Average noise value for the spectrum. Typically measured by choosing a region void of spectral lines.
- noise_std (float) – Standard deviation for the spectrum noise.
- freq_col (str) – Name of the column in spectrum_df to use for the frequency axis
- int_col (str) – Name of the column in spectrum_df to use for the intensity axis
- window (float) – Value to use for the range to blank. The region blanked corresponds to frequency+/-window.
- df (bool) – If True, returns a copy of the Pandas DataFrame with the blanked intensity. If False, returns a numpy 1D array corresponding to the blanked intensity.
- Returns
If df is True, Pandas DataFrame with the intensity regions blanked. If df is False, numpy 1D array
- Return type
new_spec - pandas DataFrame or numpy 1D array
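As a rough illustration of the blanking idea, the sketch below replaces every point within frequency+/-window of a listed center with Gaussian noise. It uses plain lists instead of a DataFrame; the name `blank_regions` and the `seed` argument are assumptions added for reproducibility, not part of the library.

```python
import random
from typing import List, Optional, Sequence


def blank_regions(
    freqs: Sequence[float],
    intensities: Sequence[float],
    centers: Sequence[float],
    noise: float = 0.0,
    noise_std: float = 0.05,
    window: float = 1.0,
    seed: Optional[int] = None,
) -> List[float]:
    """Return a copy of intensities with the listed regions replaced by noise."""
    rng = random.Random(seed)
    blanked = list(intensities)
    for i, f in enumerate(freqs):
        # A point is blanked if it falls within +/- window of any center
        if any(abs(f - c) <= window for c in centers):
            blanked[i] = rng.gauss(noise, noise_std)
    return blanked


freqs = [float(f) for f in range(100, 111)]  # 100.0 ... 110.0
intens = [0.0] * len(freqs)
intens[5] = 10.0  # a strong line at 105.0
result = blank_regions(freqs, intens, centers=[105.0], window=1.0, seed=0)
```

Points at 104.0, 105.0 and 106.0 are replaced; everything else is left untouched.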
-
pyspectools.spectra.analysis.
bokeh_create_experiment_comparison
(experiments, thres_prox=0.2, index=0, filepath=None, **kwargs)[source]¶ Function to create a plot comparing multiple experiments. This is a high level function that wraps the correlate_experiments function, and provides a visual and interactive view of the spectra output from this function using Bokeh.
- Parameters
experiments –
thres_prox –
index –
filepath –
kwargs –
-
pyspectools.spectra.analysis.
brute_harmonic_search
(frequencies: numpy.ndarray, maxJ=10, dev_thres=5.0, prefilter=False)[source]¶ Function that will search for possible harmonic candidates in a list of frequencies. Wraps the lower level function.
Generates every possible four-membered combination of the frequencies, and makes a first pass filtering out unreasonable combinations.
- Parameters
frequencies – Iterable containing floats of frequencies (ulines)
maxJ – Maximum value of J considered for quantum numbers
dev_thres – Standard deviation threshold for filtering unlikely combinations of frequencies
prefilter (bool) – Dictates whether or not the frequency lists are prescreened by standard deviation. This potentially biases away from missing transitions!
- Returns
results_df – Pandas dataframe containing RMS information and fitted constants
fit_results – List containing all of the ModelResult objects
-
pyspectools.spectra.analysis.
calc_line_weighting
(frequency: float, catalog_df: pandas.core.frame.DataFrame, prox=5e-05, abs=True, freq_col='Frequency', int_col='Intensity')[source]¶ Function for calculating the weighting factor for determining the likelihood of an assignment. The weighting factor is determined by the proximity of the catalog frequency to the observed frequency, as well as the theoretical intensity if it is available.
- Parameters
frequency (float) – Observed frequency in MHz
catalog_df (dataframe) – Pandas dataframe containing the catalog data entries
prox (float, optional) – Frequency proximity threshold
abs (bool) – Specifies whether argument prox is taken as the absolute value
- Returns
None – If nothing matches the frequency, returns None.
dataframe – If matches are found, calculate the weights and return the candidates in a dataframe.
-
pyspectools.spectra.analysis.
cluster_AP_analysis
(progression_df: pandas.core.frame.DataFrame, sil_calc=False, refit=False, **kwargs)[source]¶ Wrapper for the AffinityPropagation cluster method from scikit-learn.
The dataframe provided will also receive new columns: Cluster index, and Silhouette. The latter corresponds to how likely a sample is sandwiched between clusters (0), how squarely it belongs in the assigned cluster (+1), or does not belong (-1). The cluster index corresponds to which cluster the sample belongs to.
- Parameters
progression_df – Pandas dataframe taken from the result of progression fits
sil_calc (bool) – Indicates whether silhouettes are calculated after the AP model is fit
- Returns
data – Dict containing clustered frequencies and associated fits
ap_obj – AffinityPropagation object containing all the information as attributes.
-
pyspectools.spectra.analysis.
copy_assignments
(A, B, corr_mat)[source]¶ Function to copy assignments from experiment B over to experiment A. The correlation matrix argument requires the output from the correlate_experiments function.
- Parameters
A, B – AssignmentSession objects, where the assignments from B are copied into A
corr_mat (2D array) – 2D array mask with shape A x B
-
pyspectools.spectra.analysis.
correlate_experiments
(experiments, thres_prox=0.2, index=0)[source]¶ Function to find correlations between experiments, looking for common peaks detected in every provided experiment. By default, this function uses the first experiment as the base for comparison. Coincidences are searched for between this base and each of the other provided experiments, and ultimately combined to determine the common peaks.
A copy of the base experiment is returned, along with a dictionary with frequencies of correlations between a given experiment and the base.
- Parameters
experiments (tuple-like) – Iterable list/tuple of AssignmentSession objects.
thres_prox (float, optional) – Proximity in frequency units for determining if peaks are the same. If thres-abs is False, this value is treated as a percentage of the center frequency.
index (int, optional) – Index for the experiment to use as a base for comparisons.
- Returns
base_exp (AssignmentSession object) – A deep copy of the first experiment, with the updated spectra.
return_dict (dict) – Dictionary where keys correspond to the experiment number and values are 1D arrays of frequencies that are coincident
-
pyspectools.spectra.analysis.
create_cluster_tests
(cluster_dict: Dict[Any, Any], shots=25, dipole=1.0, min_dist=500.0, **kwargs)[source]¶ Take the output of the cluster AP analysis, and generate the FTB batch files for a targeted DR search.
- Parameters
cluster_dict (dict) – Cluster dictionary with keys corresponding to the cluster number, and values are subdictionaries holding the frequencies associated with the cluster.
shots (int, optional) – Number of integration counts
dipole (float, optional) – Approximate dipole moment to target
min_dist (float, optional) – Minimum frequency difference between the cavity and DR frequencies.
kwargs – Additional kwargs are passed to the ftb line generation.
-
pyspectools.spectra.analysis.
cross_correlate
(a: numpy.ndarray, b: numpy.ndarray, lags=None)[source]¶ Cross-correlate two arrays a and b that are of equal length by lagging b with respect to a. Uses np.roll to shift b by values of lag, and appropriately zeros out “out of bounds” values.
- Parameters
a, b – Arrays containing the values to cross-correlate. Must be the same length.
lags (optional) – Lag values by which b is shifted with respect to a, by default None
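The shift-and-zero behaviour described above can be sketched without NumPy. This is an illustration rather than the library's code: the name `lagged_cross_correlation` is hypothetical, and the choice to scan every lag from -(n-1) to n-1 when lags is None is an assumption.

```python
from typing import List, Optional, Sequence, Tuple


def lagged_cross_correlation(
    a: Sequence[float], b: Sequence[float], lags: Optional[Sequence[int]] = None
) -> Tuple[List[int], List[float]]:
    """Correlate a with b shifted by each lag, zero-filling out-of-bounds values."""
    if len(a) != len(b):
        raise ValueError("a and b must be the same length.")
    n = len(a)
    if lags is None:
        # Assumed default: every shift that still leaves some overlap
        lags = range(-(n - 1), n)
    values = []
    for lag in lags:
        total = 0.0
        for i in range(n):
            j = i - lag  # element of b that lands on index i after the shift
            if 0 <= j < n:  # anything shifted in from outside counts as zero
                total += a[i] * b[j]
        values.append(total)
    return list(lags), values


a = [0.0, 0.0, 1.0, 0.0, 0.0]
b = [0.0, 1.0, 0.0, 0.0, 0.0]
lag_axis, corr = lagged_cross_correlation(a, b)
best_lag = lag_axis[corr.index(max(corr))]
```

Here the spike in b must move one sample to the right to line up with the spike in a, so the correlation peaks at a lag of 1.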
-
pyspectools.spectra.analysis.
detect_artifacts
(frequencies: numpy.ndarray, tol=0.002)[source]¶ Quick one-liner function to perform a very rudimentary test for RFI. This method relies on the assumption that any frequency that is suspiciously close to an exact number (e.g. 16250.0000) is very likely an artifact.
The function will calculate the difference between each frequency and its nearest whole number, and return frequencies that are within a specified tolerance.
- Parameters
frequencies (NumPy 1D array) – Array of frequencies to check for artifacts.
tol (float, optional) – Maximum tolerance to be used to check whether frequency is close enough to its rounded value, by default 2e-3
- Returns
Returns the frequencies that match the specified criteria.
- Return type
NumPy 1D array
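The near-integer test described above is short enough to sketch directly. The function name `near_integer_frequencies` is hypothetical, and a plain list is used in place of a NumPy array.

```python
from typing import List, Sequence


def near_integer_frequencies(
    frequencies: Sequence[float], tol: float = 2e-3
) -> List[float]:
    """Return frequencies lying within tol of their nearest whole number."""
    return [f for f in frequencies if abs(f - round(f)) <= tol]


candidates = near_integer_frequencies(
    [16250.0000, 16250.5012, 8421.0015, 9000.0100]
)
```

Only 16250.0000 and 8421.0015 fall within 2e-3 of a whole number, so those two are flagged as likely artifacts.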
-
pyspectools.spectra.analysis.
filter_spectrum
(intensity: float, window='hanning', sigma=0.5)[source]¶ Apply a specified window function to a signal. The window functions are taken from the signal.windows module of SciPy, so check what is available before throwing it into this function.
The window function is convolved with the signal by taking the time-domain product, and doing the inverse FFT to get the convolved spectrum back.
The one exception is the gaussian window - if a user specifies “gaussian” for the window function, the actual window function applied here is a half gaussian, i.e. a 1D gaussian blur.
- Parameters
dataframe (pandas DataFrame) – Pandas dataframe containing the spectral information
int_col (str, optional) – Column name to reference the signal
window (str, optional) – Name of the window function as implemented in SciPy.
sigma –
- Returns
new_y – Numpy 1D array containing the convolved signal
- Return type
array_like
-
pyspectools.spectra.analysis.
find_series
(combo: Tuple[float, float], frequencies: numpy.ndarray, search=0.005)[source]¶ Function that will exhaustively search for candidate progressions based on a pair of frequencies.
The difference of the pair is used to estimate B, which is then used to calculate J. These values of J are then used to predict the next set of lines, which are searched for in the soup of frequencies. The closest matches are added to a list which is returned.
This is done so that even if frequencies are missing a series of lines can still be considered.
- Parameters
combo – Pair of frequencies corresponding to the initial guess
frequencies – Array of frequencies to be searched
search – Optional threshold for determining the search range to look for candidates
- Returns
Array of candidate frequencies
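The search described above can be sketched under some stated assumptions: a rigid linear rotor with line positions at 2B(J+1), a guess pair made of adjacent transitions (so their spacing is 2B), and search treated as a fractional tolerance on the predicted frequency. The name `find_series_sketch` and the `n_lines` argument are hypothetical, and the library's own routine may differ in detail.

```python
from typing import List, Sequence, Tuple


def find_series_sketch(
    combo: Tuple[float, float],
    frequencies: Sequence[float],
    search: float = 0.005,
    n_lines: int = 10,
) -> List[float]:
    """Collect the observed lines closest to a predicted harmonic series."""
    low, high = sorted(combo)
    two_b = high - low  # spacing between adjacent rigid-rotor lines is 2B
    if two_b <= 0:
        return []
    n_low = max(1, round(low / two_b))  # J + 1 for the lower line of the pair
    series = []
    for n in range(n_low, n_low + n_lines):
        predicted = two_b * n
        nearest = min(frequencies, key=lambda f: abs(f - predicted))
        # A prediction with no nearby line is skipped, so gaps are tolerated
        if abs(nearest - predicted) <= search * predicted:
            series.append(nearest)
    return series


lines = [8000.1, 9123.4, 10000.0, 12000.2, 13377.7, 16000.1]
series = find_series_sketch((8000.1, 10000.0), lines)
```

The line expected near 14000 MHz is absent from the list, yet the series still picks up the line near 16000 MHz, which is the point of predicting from J rather than walking line to line.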
-
pyspectools.spectra.analysis.
fit_line_profile
(spec_df: pandas.core.frame.DataFrame, center: float, width=None, intensity=None, freq_col='Frequency', int_col='Intensity', fit_func=<class 'lmfit.models.GaussianModel'>, sigma=2, logger=None)[source]¶ Somewhat high level function that wraps lmfit for fitting Gaussian lineshapes to a spectrum.
For a given guess center and optional intensity, the
-
pyspectools.spectra.analysis.
harmonic_finder
(frequencies: numpy.ndarray, search=0.001, low_B=400.0, high_B=9000.0)[source]¶ Function that will generate candidates for progressions. Every possible pair of frequencies is looped over; each pair is checked for whether the implied B value is either too small (a molecule as large as C60) or too large (you won’t have enough lines to make a progression), and the frequencies are then searched to find the nearest candidates based on a prediction.
- Parameters
frequencies – Array or tuple-like containing the progressions we expect to find
search – Optional threshold for determining if something is close enough
- Returns
progressions – List of arrays corresponding to candidate progressions
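The pair-screening step can be sketched as follows. It assumes that each pair is treated as two adjacent transitions of a rigid linear rotor, so that half the frequency difference approximates B; the library may estimate B differently. The name `candidate_pairs` is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def candidate_pairs(
    frequencies: Sequence[float], low_B: float = 400.0, high_B: float = 9000.0
) -> List[Tuple[float, float]]:
    """Keep frequency pairs whose implied B lies within [low_B, high_B]."""
    pairs = []
    for f1, f2 in combinations(sorted(frequencies), 2):
        approx_b = (f2 - f1) / 2.0  # adjacent lines are separated by 2B
        if low_B <= approx_b <= high_B:
            pairs.append((f1, f2))
    return pairs


pairs = candidate_pairs([8000.0, 8100.0, 10000.0])
```

The pair (8000.0, 8100.0) implies B = 50 MHz and is rejected as too small; the other two pairs survive and would be passed on to the series search.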
-
pyspectools.spectra.analysis.
line_weighting
(frequency: float, catalog_frequency: float, intensity=None)[source]¶ Function for calculating the line weighting associated with each assignment candidate. The formula is based on intensity and frequency offset, such as to favor strong lines that are spot on over weak lines that are further away.
- Parameters
frequency (float) – Center frequency in MHz; typically the u-line frequency.
catalog_frequency (float) – Catalog frequency of the candidate
intensity (float, optional) – log Intensity of the transition; includes the line strength and the temperature factor.
- Returns
weighting – Associated weight value. Requires normalization
- Return type
float
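The exact formula is not given here, so the sketch below shows one plausible form that has the stated behaviour: the weight is the reciprocal of the frequency offset, scaled by ten to the power of the log intensity when one is supplied. This functional form, the name `line_weight`, and the `eps` guard against a zero offset are all assumptions.

```python
from typing import List, Optional, Sequence


def line_weight(
    frequency: float,
    catalog_frequency: float,
    intensity: Optional[float] = None,
    eps: float = 1e-6,
) -> float:
    """Assumed form: weight = 10**intensity / |frequency - catalog_frequency|."""
    offset = max(abs(frequency - catalog_frequency), eps)  # avoid dividing by zero
    weight = 1.0 / offset
    if intensity is not None:
        weight *= 10.0 ** intensity  # intensity is a log10 value
    return weight


def normalize(weights: Sequence[float]) -> List[float]:
    """Scale weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


observed = 12000.00
raw = [
    line_weight(observed, 12000.01, intensity=-3.0),  # strong and close
    line_weight(observed, 12000.20, intensity=-4.0),  # weaker and further away
]
normalized = normalize(raw)
```

After normalization the strong, close candidate dominates, which matches the stated intent of favoring strong lines that are spot on.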
-
pyspectools.spectra.analysis.
match_artifacts
(on_exp, off_exp, thres=0.05, freq_col='Frequency')[source]¶ Function to remove a set of artifacts found in a blank spectrum.
- Parameters
- on_exp (AssignmentSession object) – Experiment with the sample on; i.e. contains molecular features
- off_exp (AssignmentSession object) – Experiment with no sample; i.e. only artifacts
- thres (float, optional) – Threshold in absolute frequency units to match
- freq_col (str, optional) – Column specifying frequency in the pandas dataframes
- Returns
Dictionary with keys corresponding to the uline index, and values the frequency
- Return type
candidates - dict
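The matching step can be sketched with plain frequency lists standing in for the two AssignmentSession objects. The name `match_artifact_frequencies` is hypothetical; the returned dictionary follows the description above, with the uline index as key and the frequency as value.

```python
from typing import Dict, Sequence


def match_artifact_frequencies(
    on_freqs: Sequence[float], off_freqs: Sequence[float], thres: float = 0.05
) -> Dict[int, float]:
    """Map uline index to frequency for lines that also appear with no sample."""
    return {
        index: f
        for index, f in enumerate(on_freqs)
        if any(abs(f - g) <= thres for g in off_freqs)
    }


sample_on = [8000.00, 9000.02, 10000.00]
sample_off = [9000.00, 12000.00]
candidates = match_artifact_frequencies(sample_on, sample_off)
```

Only the line at 9000.02 MHz has a counterpart in the blank spectrum within 0.05 MHz, so it is the only artifact candidate.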
-
pyspectools.spectra.analysis.
peak_find
(spec_df: pandas.core.frame.DataFrame, freq_col='Frequency', int_col='Intensity', thres=0.015, min_dist=10)[source]¶ Wrapper for peakutils applied to pandas dataframes. First finds the peak indices, which are then used to fit Gaussians to determine the center frequency for each peak.
- Parameters
spec_df (dataframe) – Pandas dataframe containing the spectrum information, with columns corresponding to frequency and intensity.
freq_col (str, optional) – Name of the frequency column in spec_df
int_col (str, optional) – Name of the intensity column in spec_df
thres (float, optional) – Threshold for peak detection
- Returns
Pandas dataframe containing the peaks frequency/intensity
- Return type
peak_df
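The first stage, finding peak indices, can be sketched without peakutils. The sketch assumes that thres is a fraction of the intensity range measured from the minimum, and that min_dist is enforced by keeping the strongest peak in any crowded region; the Gaussian refinement of each center is left out. The name `find_peak_indices` is hypothetical.

```python
from typing import List, Sequence


def find_peak_indices(
    intensity: Sequence[float], thres: float = 0.015, min_dist: int = 10
) -> List[int]:
    """Indices of local maxima above a relative threshold, spaced by min_dist."""
    lo, hi = min(intensity), max(intensity)
    cutoff = lo + thres * (hi - lo)
    candidates = [
        i
        for i in range(1, len(intensity) - 1)
        if intensity[i] > cutoff
        and intensity[i - 1] < intensity[i] >= intensity[i + 1]
    ]
    accepted: List[int] = []
    # Visit the strongest candidates first so they win any distance conflict
    for i in sorted(candidates, key=lambda k: intensity[k], reverse=True):
        if all(abs(i - j) >= min_dist for j in accepted):
            accepted.append(i)
    return sorted(accepted)


spectrum = [0.0] * 50
spectrum[10], spectrum[12], spectrum[30] = 1.0, 0.5, 0.8
peaks = find_peak_indices(spectrum, thres=0.015, min_dist=5)
```

The shoulder at index 12 is dropped because it lies within min_dist channels of the stronger peak at index 10.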
-
pyspectools.spectra.analysis.
plotly_create_experiment_comparison
(experiments, thres_prox=0.2, index=0, filepath=None, **kwargs)[source]¶ Function to create a plot comparing multiple experiments. This is a high level function that wraps the correlate_experiments function, and provides a visual and interactive view of the spectra output from this function using Plotly.
This function is effectively equivalent to bokeh_create_experiment_comparison, but uses Plotly as the front end instead.
- Parameters
experiments –
thres_prox –
index –
filepath –
kwargs –
-
pyspectools.spectra.analysis.
search_center_frequency
(frequency: float, width=0.5)[source]¶ Function for wrapping the astroquery Splatalogue API for looking up a frequency and finding candidate molecules for assignment. The width parameter adjusts the +/- range to include in the search: for high frequency surveys, it’s probably preferable to use a percentage to accommodate for the typically larger uncertainties (sub-mm experiments).
- Parameters
frequency (float) – Frequency in MHz to search Splatalogue for.
width (float, optional) – Absolute frequency offset in MHz to include in the search.
- Returns
Pandas dataframe containing frequency matches, or None if no matches are found.
- Return type
dataframe or None
-
pyspectools.spectra.analysis.
search_molecule
(species: str, freq_range=[0.0, 40000.0], **kwargs)[source]¶ Function to search Splatalogue for a specific molecule. Technically I’d prefer to download entries from CDMS instead, but this is probably the most straightforward way.
The main use for this function is to verify line identifications - if a U-line is tentatively assigned to a molecule, then other transitions for the molecule that are stronger or comparatively strong should be visible.
- Parameters
species (str) – Chemical name of the molecule
freq_range (list) – The frequency range to perform the lookup
- Returns
Pandas dataframe containing transitions for the given molecule. If no matches are found, returns None.
- Return type
DataFrame or None
pyspectools.spectra.assignment module
assignment module
This module contains three main classes for performing analysis of broadband spectra. The AssignmentSession class will be what the user will mainly interact with, which will digest a spectrum, find peaks, make assignments and keep track of them, and generate the reports at the end.
To perform the assignments, the user can use the LineList class, which does the grunt work of homogenizing the different sources of frequency and molecular information: it is able to take SPCAT and .lin formats, as well as simply a list of frequencies. LineList then interacts with the AssignmentSession class, which handles the assignments.
The smallest building block in this procedure is the Transition class; every peak, every molecule transition, every artifact is considered as a Transition object. The LineList contains a list of Transition objects, and the peaks found by the AssignmentSession are also kept as a LineList.
-
class
pyspectools.spectra.assignment.
AssignmentSession
(exp_dataframe: pandas.core.frame.DataFrame, experiment: int, composition: List[str], temperature=4.0, velocity=0.0, freq_col='Frequency', int_col='Intensity', verbose=True, **kwargs)[source]¶ Bases:
object
Main class for bookkeeping and analyzing broadband spectra. This class revolves around operating on a single continuous spectrum, using the class functions to automatically assess the noise statistics, find peaks, and do the bulk of the bookkeeping on what molecules are assigned to what peak.
-
add_ulines
(data: List[Tuple[float, float]], **kwargs)[source]¶ Function to manually add multiple pairs of frequency/intensity to the current experiment’s Peaks list.
Kwargs are passed to the creation of the Transition object.
- Parameters
data (iterable of 2-tuple) –
List-like of 2-tuples corresponding to frequency and intensity. Data should look like this example: [(12345.213, 5.), (18623.125, 12.3)]
-
analyze_molecule
(Q=None, T=None, name=None, formula=None, smiles=None, chi_thres=10.0)[source]¶ Function for providing some astronomically relevant parameters by analyzing Gaussian line shapes.
- Parameters
- Q (float) – Partition function at temperature T
- T (float) – Temperature in Kelvin
- name (str, optional) – Name of the molecule to perform the analysis on. Can be used as a selector.
- formula (str, optional) – Chemical formula of the molecule to perform the analysis on. Can be used as a selector.
- smiles (str, optional) – SMILES code of the molecule to perform the analysis on. Can be used as a selector.
- chi_thres (float) – Threshold for the Chi Squared value to consider fits for statistics. Any instances of fits with Chi squared values above this value will not be used to calculate line profile statistics.
- Returns
First element is the profile dataframe, and second element is the fitted velocity. If a rotational temperature analysis is also performed, the third element will be the least-squares regression.
- Return type
return_data - list
-
apply_filter
(window: Union[str, List[str]], sigma=0.5, int_col=None)[source]¶ Applies a filter to the spectral signal. If multiple window functions are to be used, a list of windows can be provided, which will then perform the convolution stepwise. With the exception of the gaussian window function, the other functions use the SciPy signal functions.
A reference copy of the original signal is kept as the “Ref” column; when a new window function is applied, it operates on this reference rather than on the already convolved signal.
- Parameters
window (str, or iterable of str) – Name of the window function
sigma (float, optional) – Specifies the magnitude of the gaussian blur. Only used when the window function asked for is “gaussian”.
int_col (None or str, optional) – Specifies which column to apply the window function to. If None, defaults to the session-wide intensity column
-
blank_spectrum
(noise=0.0, noise_std=0.05, window=1.0)[source]¶ Blanks a spectrum based on the lines already previously assigned. The required arguments are the average and standard deviation of the noise, typically estimated by picking a region free of spectral features.
The spectra are sequentially blanked - online catalogs first, followed by literature species, finally the private assignments.
- Parameters
- noise (float) – Average noise value for the spectrum. Typically measured by choosing a region void of spectral lines.
- noise_std (float) – Standard deviation for the spectrum noise.
- window (float) – Value to use for the range to blank. The region blanked corresponds to frequency+/-window.
-
calculate_assignment_statistics
()[source]¶ Function for calculating some aggregate statistics of the assignments and u-lines. This breaks the assignment sources up to identify what the dominant source of information was. The two metrics for assignments are the number of transitions and the intensity contribution assigned by a particular source.
- Return type
dict
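The two metrics can be illustrated with a small sketch that groups assigned lines by source. The layout of the returned dictionary, the name `summarize_by_source`, and the source labels are all hypothetical; the session's actual output may be organized differently.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def summarize_by_source(
    assignments: Sequence[Tuple[str, float]]
) -> Dict[str, Dict[str, float]]:
    """Per source: number of lines, fraction of lines, fraction of intensity."""
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, float] = defaultdict(float)
    for source, intensity in assignments:
        counts[source] += 1
        totals[source] += intensity
    n_lines = len(assignments)
    total_intensity = sum(totals.values())
    return {
        source: {
            "count": counts[source],
            "fraction_lines": counts[source] / n_lines,
            "fraction_intensity": totals[source] / total_intensity,
        }
        for source in counts
    }


# Each entry is (source of the assignment, line intensity); labels are made up
assigned = [("CDMS", 10.0), ("CDMS", 5.0), ("Literature", 5.0)]
stats = summarize_by_source(assigned)
```

In this toy case one source accounts for two of the three lines and three quarters of the assigned intensity, making it the dominant source on both metrics.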
-
clean_folder
(action=False)[source]¶ Method for cleaning up all of the directories used by this routine. Use with caution!!!
Requires passing a True statement to actually clean up.
- Parameters
action (bool) – If True, folders will be deleted. If False (default) nothing is done.
-
clean_spectral_assignments
(window=1.0)[source]¶ Function to blank regions of the spectrum that have already been assigned. This function takes the frequencies of assignments, and uses the noise statistics to generate white noise to replace the peak. This is to let one focus on unidentified features, rather than be distracted by the assignments with large amplitudes.
- Parameters
window (float, optional) – Frequency value in MHz to blank. The region corresponds to the frequency +/- this value.
-
copy_assignments
(other: pyspectools.spectra.assignment.AssignmentSession, thres_prox=0.01)[source]¶ Function to copy assignments from another experiment. This class method wraps two analysis routines: first, correlations in detected peaks are found, and indexes of where correlations are found will be used to locate the corresponding Transition object, and copy its data over to the current experiment.
- Parameters
other (AssignmentSession object) – The reference AssignmentSession object to copy assignments from
thres_prox (float, optional) – Threshold for considering coincidences between spectra.
-
create_full_dr_batch
(cavity_freqs: List[float], filepath=None, shots=25, dipole=1.0, min_dist=500.0, atten=None, drpower=13)[source]¶ Create an FTB batch file for use in QtFTM to perform a DR experiment. A list of selected frequencies can be used as the cavity frequencies, which will subsequently be exhaustively DR’d against by ALL frequencies in the experiment.
The file is then saved to “ftb/XXX-full-dr.ftb”.
The atten parameter provides a more direct way to control RF power; if this value is used, it will overwrite the dipole moment setting.
- Parameters
cavity_freqs (iterable of floats) – Iterable of frequencies to tune to, in MHz.
filepath (str, optional) – Path to save the ftb file to. Defaults to ftb/{}-dr.ftb
shots (int, optional) – Number of integration shots
dipole (float, optional) – Dipole moment used for attenuation setting
min_dist (float, optional) – Minimum frequency difference between cavity and DR frequency to actually perform the experiment
atten (None or int, optional) – Value to set the rf attenuation. By default, this is None, which will use the dipole moment instead to set the rf power. If a value is provided, it will overwrite whatever the dipole moment setting is.
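The pairing logic behind an exhaustive DR batch can be sketched as below. Only the selection of cavity and DR frequency pairs is shown; writing the .ftb file is left out because its line format is not described here. The name `dr_pairs` is hypothetical.

```python
from typing import List, Sequence, Tuple


def dr_pairs(
    cavity_freqs: Sequence[float],
    dr_freqs: Sequence[float],
    min_dist: float = 500.0,
) -> List[Tuple[float, float]]:
    """Pair each cavity frequency with every DR frequency at least min_dist away."""
    return [
        (cavity, dr)
        for cavity in cavity_freqs
        for dr in dr_freqs
        if abs(cavity - dr) >= min_dist
    ]


all_lines = [10000.0, 10200.0, 10600.0, 9400.0]
pairs = dr_pairs([10000.0], all_lines)
```

The cavity frequency is never paired with itself or with the line only 200 MHz away, since both fall inside the min_dist exclusion.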
-
create_latex_table
(filepath=None, header=None, cols=None, **kwargs)[source]¶ Method to create a LaTeX table summarizing the measurements in this experiment.
Without any additional inputs, the table will be printed into a .tex file in the reports folder. The table will be created with the minimum amount of information required for a paper, including the frequency and intensity information, assignments, and the source of the information.
The user can override the default settings by supplying header and cols arguments, and any other kwargs are passed into the to_latex pandas DataFrame method. The header and cols lengths must match.
- Parameters
filepath (str, optional) – Filepath to save the LaTeX table to; by default None
header (iterable of str, optional) – An iterable of strings specifying the header to be printed. By default None
cols (iterable of str, optional) – An iterable of strings specifying which columns to include. If this is changed, the header must also be changed to reflect the new columns.
-
create_uline_dr_batch
(filepath=None, select=None, shots=25, dipole=1.0, min_dist=500.0, thres=None, atten=None, drpower=13)[source]¶ Create an FTB batch file for use in QtFTM to perform a DR experiment. A list of selected frequencies can be used as the cavity frequencies, which will subsequently be exhaustively DR’d against by all of the U-line frequencies remaining in this experiment.
The file is then saved to “ftb/XXX-dr.ftb”.
- Parameters
filepath (str, optional) – Path to save the ftb file to. Defaults to ftb/{}-dr.ftb
select (list of floats, optional) – List of frequencies to use as cavity frequencies. Defaults to None, which will just DR every frequency against each other.
shots (int, optional) – Number of integration shots
dipole (float, optional) – Dipole moment used for attenuation setting
min_dist (float, optional) – Minimum frequency difference between cavity and DR frequency to actually perform the experiment
thres (None or float, optional) – Minimum value in absolute intensity units to consider in the DR batch. If None, this is ignored (default).
atten (None or int, optional) – Value to use for the attenuation, overwriting the dipole argument. This is useful for forcing cavity power in the high band.
-
create_uline_ftb_batch
(filepath=None, shots=500, dipole=1.0, threshold=0.0, sort_int=False, atten=None)[source]¶ Create an FTB file for use in QtFTM based on the remaining ulines. This is used to provide cavity frequencies.
If a filepath is not specified, a -uline.ftb file will be created in the ftb folder.
The user has the ability to control parameters of the batch by setting a global shot count, dipole moment, and minimum intensity value for creation.
- Parameters
filepath (str or None, optional) – Path to save the .ftb file to. If None, defaults to the session ID.
shots (int, optional) – Number of shots to integrate on each frequency
dipole (float, optional) – Dipole moment in Debye attenuation target for each frequency
threshold (float, optional) – Minimum value for the line intensity to be considered. For example, if the spectrum is analyzed in units of SNR, this would be the minimum value of SNR to consider in the FTB file.
sort_int (bool, optional) – If True, sorts the FTB entries in descending intensity order.
atten (None or int, optional) – Value to use for the attenuation, overwriting the dipole argument. This is useful for forcing cavity power in the high band.
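The effect of threshold and sort_int on which U-lines reach the batch can be sketched as follows. The name `select_ulines` is hypothetical, and whether the library treats the threshold as inclusive is an assumption; the .ftb line format itself is not reproduced.

```python
from typing import List, Sequence, Tuple


def select_ulines(
    ulines: Sequence[Tuple[float, float]],
    threshold: float = 0.0,
    sort_int: bool = False,
) -> List[Tuple[float, float]]:
    """Filter (frequency, intensity) pairs by intensity and optionally sort them."""
    # Assumed inclusive comparison against the threshold
    selected = [(f, i) for f, i in ulines if i >= threshold]
    if sort_int:
        selected.sort(key=lambda pair: pair[1], reverse=True)
    return selected


ulines = [(8000.0, 2.0), (9000.0, 15.0), (10000.0, 7.5)]
batch = select_ulines(ulines, threshold=5.0, sort_int=True)
```

If the intensities are in units of SNR, this keeps only lines with SNR of at least 5 and puts the strongest first.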
-
create_ulinelist
(filepath: str, silly=True)[source]¶ Create a LineList object for an unidentified molecule. This uses the class method umol_gen to automatically generate names for U-molecules which can then be renamed once it has been identified.
The session attribute umol_names also keeps track of filepaths to catalog names. If the filepath has been used previously, then it will raise an Exception noting that the filepath is already associated with another catalog.
- Parameters
filepath (str) – File path to the catalog or .lin file to use as a reference
silly (bool, optional) – Flag whether to use boring numbered identifiers, or randomly generated AdjectiveAdjectiveAnimal.
- Returns
- Return type
LineList object
-
detect_noise_floor
(region=None, als=True, **kwargs)[source]¶ Set the noise parameters for the current spectrum. Control over what “defines” the noise floor is specified with the parameter region. By default, if region is None then the function will perform an initial peak find using 1% of the maximum intensity as the threshold. The noise region will be established based on the largest gap between peaks, i.e. hopefully capturing as few features in the statistics as possible.
The alternative method is invoked when the als argument is set to True, which will use the asymmetric least-squares method to determine the baseline. Afterwards, the baseline is decimated by an extremely heavy Gaussian blur, and one ends up with a smoothly varying baseline. In this case, there is no noise_rms attribute to be returned as it is not required to determine the minimum peak threshold.
- Parameters
region (2-tuple or None, optional) – If None, use the automatic algorithm. Otherwise, a 2-tuple specifies the region of the spectrum in frequency to use for noise statistics.
als (bool, optional) – If True, will use the asymmetric least squares method to determine the baseline.
kwargs – Additional kwargs are passed into the ALS function.
- Returns
baseline - float – Value of the noise floor
rms - float – Noise RMS/standard deviation
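The automatic region selection (the largest gap between peaks) can be sketched with the standard library. It assumes the peak frequencies are already known and takes the mean and population standard deviation of the points strictly inside the gap; the name `noise_from_largest_gap` is hypothetical, and the ALS route is not covered.

```python
import statistics
from typing import Sequence, Tuple


def noise_from_largest_gap(
    freqs: Sequence[float],
    intensities: Sequence[float],
    peak_freqs: Sequence[float],
) -> Tuple[float, float]:
    """Estimate (baseline, rms) from the widest peak-free stretch."""
    peaks = sorted(peak_freqs)
    if len(peaks) < 2:
        raise ValueError("Need at least two peaks to define a gap.")
    # Pick the neighboring pair of peaks that are furthest apart
    start, end = max(zip(peaks, peaks[1:]), key=lambda gap: gap[1] - gap[0])
    region = [y for f, y in zip(freqs, intensities) if start < f < end]
    if not region:
        raise ValueError("No data points fall inside the largest gap.")
    return statistics.mean(region), statistics.pstdev(region)


freqs = [float(f) for f in range(20)]
# Baseline alternates between 1.1 and 0.9, with three strong lines on top
intens = [1.1 if i % 2 == 0 else 0.9 for i in range(20)]
for peak_index in (2, 4, 15):
    intens[peak_index] = 10.0
baseline, rms = noise_from_largest_gap(freqs, intens, [2.0, 4.0, 15.0])
```

The widest gap runs from 4 to 15, and the ten points inside it give a baseline close to 1.0 with an RMS close to 0.1.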
-
df2ulines
(dataframe: pandas.core.frame.DataFrame, freq_col=None, int_col=None)[source]¶ Add a dataframe of frequency and intensities to the session U-line dictionary. This function provides more manual control over what can be processed in the assignment pipeline, as not everything can be picked up by the peak finding algorithm.
- Parameters
dataframe (pandas dataframe) – Dataframe containing a frequency and intensity column to add to the uline list.
freq_col (None or str) – Specify column to use for frequencies. If None, uses the session value freq_col.
int_col (None or str) – Specify column to use for intensities. If None, uses the session value int_col.
-
finalize_assignments
()[source]¶ Function that will complete the assignment process by serializing DataClass objects and formatting a report.
Creates summary pandas dataframes as self.table and self.profiles, which correspond to the assignments and fitted line profiles respectively.
-
find_peaks
(threshold=None, region=None, sigma=6, min_dist=10, als=True, **kwargs)[source]¶ Find peaks in the experiment spectrum, with a specified threshold value or automatic threshold. The method calls the peak_find function from the analysis module, which itself wraps peakutils.
The function works by finding regions of the intensity where the first derivative goes to zero and changes sign. This gives peak frequency/intensities from the digitized spectrum, which is then “refined” by interpolating over each peak and fitting a Gaussian to determine the peak.
The peaks are then returned as a pandas DataFrame, which can also be accessed in the peaks_df attribute of AssignmentSession.
When a value of threshold is not provided, the function will turn to use automated methods for noise detection, either by taking a single value as the baseline (not ALS), or by using the asymmetric least-squares method for fitting the baseline. In both instances, the primary intensity column to be used for analysis will be changed to “SNR”, which is the recommended approach.
To use the ALS algorithm there may be some tweaking involved for the parameters. These are typically found empirically, but for reference here are some “optimal” values that have been tested.
For millimeter-wave spectra, larger values of lambda are favored:
lambda = 1e5, p = 0.1
This should get rid of periodic (fringe) baselines, and leave the “real” signal behind.
- Parameters
threshold (float or None, optional) – Peak detection threshold. If None, will take 1.5 times the noise RMS.
region (2-tuple or None, optional) – If None, use the automatic algorithm. Otherwise, a 2-tuple specifies the region of the spectrum in frequency to use for noise statistics.
sigma (float, optional) – Defines the number of sigma (noise RMS) above the baseline to use as the peak detection threshold.
min_dist (int, optional) – Number of channels between peaks to be detected.
als (bool, optional) – If True, uses ALS fitting to determine a baseline.
kwargs – Additional keyword arguments are passed to the ALS fitting routine.
- Returns
peaks_df – Pandas dataframe with Frequency/Intensity columns, corresponding to peaks
- Return type
dataframe
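The derivative-based detection described above can be illustrated with a minimal, pure-Python sketch. It is not the peakutils or pyspectools implementation: the function name `find_peaks_sketch` and its arguments are hypothetical, the threshold is assumed to be `baseline + sigma * noise_rms`, and the Gaussian refinement step is omitted.

```python
from typing import List


def find_peaks_sketch(intensity: List[float], noise_rms: float,
                      baseline: float = 0.0, sigma: float = 6.0,
                      min_dist: int = 1) -> List[int]:
    """Return channel indices of local maxima above the detection threshold."""
    # Assumed threshold definition: sigma times the noise RMS above the baseline
    threshold = baseline + sigma * noise_rms
    candidates = []
    for i in range(1, len(intensity) - 1):
        # The first difference changes sign from positive to non-positive at a maximum
        rising = intensity[i] - intensity[i - 1] > 0
        falling = intensity[i + 1] - intensity[i] <= 0
        if rising and falling and intensity[i] >= threshold:
            candidates.append(i)
    # Enforce a minimum channel separation, keeping the strongest peaks first
    kept: List[int] = []
    for i in sorted(candidates, key=lambda k: intensity[k], reverse=True):
        if all(abs(i - j) >= min_dist for j in kept):
            kept.append(i)
    return sorted(kept)


spectrum = [0.1, 0.0, 0.2, 3.0, 0.3, 0.1, 0.9, 0.1, 0.0, 5.0, 0.2]
peaks = find_peaks_sketch(spectrum, noise_rms=0.1, sigma=6.0, min_dist=2)
strong_peaks = find_peaks_sketch(spectrum, noise_rms=0.1, sigma=10.0, min_dist=2)
```

Raising `sigma` from 6 to 10 drops the weak feature at channel 6, which is the trade-off the sigma parameter controls.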
-
find_progressions
(search=0.001, low_B=400.0, high_B=9000.0, refit=False, plot=True, preferences=None, **kwargs)[source]¶ Performs a search for possible harmonically related U-lines. The first step loops over every possible U-line pair, and uses the difference to estimate an effective B value for predicting the next transition. If the search is successful, the U-line is added to the list. The search is repeated until there are no more U-lines within frequency range of the next predicted line.
Once the possible series are identified, the frequencies are fit to an effective linear molecule model (B and D terms). An affinity propagation cluster model is then used to group similar progressions together, with either a systematic test of preference values or a user specified value.
- Parameters
search (float, optional) – Percentage value of the target frequency cutoff for excluding possible candidates in the harmonic search
low_B, high_B (float, optional) – Minimum and maximum value of B in MHz to be considered. This constrains the size of the molecule you are looking for.
refit (bool, optional) – If True, B and D are refit to the cluster frequencies.
plot (bool, optional) – If True, creates a Plotly scatter plot of the clusters, as a function of the preference values.
preferences (float or array_like of floats, optional) – A single value or an array of preference values for the AP cluster model. If None, the clustering will be performed on a default grid, where all of the results are returned.
kwargs (optional) – Additional kwargs are passed to the AP model initialization.
- Returns
- Return type
fig
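The pairwise search step can be sketched as follows, assuming a rigid linear rotor where adjacent transitions are separated by 2B. This is only an illustration: `harmonic_series` is a hypothetical name, `search` is treated here as a fractional tolerance on the predicted frequency, and the B/D fit and affinity propagation clustering are left out.

```python
from itertools import combinations
from typing import List


def harmonic_series(ulines: List[float], search: float = 0.001,
                    low_B: float = 400.0, high_B: float = 9000.0) -> List[List[float]]:
    """Find series of lines that are evenly spaced by 2 * B_eff."""
    freqs = sorted(ulines)
    progressions = []
    for f1, f2 in combinations(freqs, 2):
        # For a rigid linear rotor, adjacent lines differ by 2B
        B_eff = (f2 - f1) / 2.0
        if not low_B <= B_eff <= high_B:
            continue
        series = [f1, f2]
        while True:
            predicted = series[-1] + 2.0 * B_eff
            tol = search * predicted  # assumed: fractional tolerance
            # Only consider lines above the last member so the loop terminates
            matches = [f for f in freqs
                       if f > series[-1] and abs(f - predicted) <= tol]
            if not matches:
                break
            series.append(min(matches, key=lambda f: abs(f - predicted)))
        # A bare pair is not evidence of a progression
        if len(series) > 2:
            progressions.append(series)
    return progressions


# Synthetic U-lines: a B = 1500 MHz progression plus one unrelated line
ulines = [6000.0, 9000.0, 12000.0, 15000.0, 7321.5]
progressions = harmonic_series(ulines)
```

Note that sub-series of the same progression are also returned, which is one reason a clustering step is useful afterwards.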
-
classmethod
from_ascii
(filepath: str, experiment: int, composition=['C', 'H'], delimiter='\t', temperature=4.0, velocity=0.0, col_names=None, freq_col='Frequency', int_col='Intensity', skiprows=0, verbose=False, **kwargs)[source]¶ Class method for AssignmentSession to generate a session using an ASCII file. This is the preferred method for starting an AssignmentSession. The ASCII files are parsed using the pandas method read_csv, with the arguments for reading simply passed to that function.
Example based on blackchirp spectrum: The first row in an ASCII output from blackchirp contains the headers, which typically should be renamed to “Frequency, Intensity”. This can be done with this call:
```
from pyspectools.spectra.assignment import AssignmentSession

session = AssignmentSession.from_ascii(
    filepath="ft1020.txt",
    experiment=0,
    col_names=["Frequency", "Intensity"],
    skiprows=1,
)
```
Example based on astronomical spectra: File formats are not homogenized, and delimiters may change. This example reads in a comma-separated spectrum, with a radial velocity of +26.2 km/s.
```
session = AssignmentSession.from_ascii(
    filepath="spectrum.mid.dat",
    experiment=0,
    col_names=["Frequency", "Intensity"],
    velocity=26.2,
    delimiter=",",
)
```
- Parameters
filepath (str) – Filepath to the ASCII spectrum
experiment (int) – Integer identifier for the experiment
composition (list of str, optional) – List of atomic symbols, representing the atomic composition of the experiment
delimiter (str, optional) – Delimiter character used in the ASCII file. For example, "\t", "\s", ","
velocity (float, optional) – Radial velocity to offset the frequency in km/s.
temperature (float, optional) – Rotational temperature in Kelvin used for the experiment.
col_names (None or list of str, optional) – Names to rename the columns. If None, this is ignored.
freq_col (str, optional) – Name of the column to be used for the frequency axis
int_col (str, optional) – Name of the column to be used for the intensity axis
skiprows (int, optional) – Number of rows to skip reading.
verbose (bool, optional) – If True, the logging module will also print statements and display any interaction that happens.
kwargs – Additional kwargs are passed onto initializing the Session class
- Returns
- Return type
-
classmethod
load_session
(filepath: str)[source]¶ Load an AssignmentSession from disk, once it has been saved with the save_session method which creates a pickle file.
- Parameters
filepath (str) – Path to the AssignmentSession pickle file; typically sessions/{experiment_id}.pkl
- Returns
Instance of the AssignmentSession loaded from disk
- Return type
-
match_artifacts
(artifact_exp: pyspectools.spectra.assignment.AssignmentSession, threshold=0.05)[source]¶ TODO: Need to update this method; process_artifacts is no longer a method.
Remove artifacts based on another experiment which has the blank sample - i.e. only artifacts.
The routine simply matches peaks found in the artifact experiment, and assigns all coincidences in the present experiment as artifacts.
- Parameters
artifact_exp (AssignmentSession) – Experiment with no sample present
threshold (float, optional) – Threshold in absolute frequency units for matching
-
overlay_molecule
(species: str, freq_range=None, threshold=-7.0)[source]¶ Function to query splatalogue for a specific molecule. By default, the frequency range that will be requested corresponds to the spectral range available in the experiment.
- Parameters
species (str) – Identifier for a specific molecule, typically name
- Returns
FigureWidget – Plotly FigureWidget that shows the experimental spectrum along with the detected peaks, and the molecule spectrum.
DataFrame – Pandas DataFrame from the Splatalogue query.
- Raises
Exception – If no species are found in the query, raises Exception.
-
plot_assigned
()[source]¶ Generates a Plotly figure with the assignments overlaid on the experimental spectrum.
Does not require any parameters, but requires that the assignments and peak finding functions have been run previously.
-
plot_breakdown
()[source]¶ Generate two charts to summarize the breakdown of spectral features. The left column plot shows the number of ulines being assigned by the various sources of frequency data.
Artifacts - instrumental interference, from the function process_artifacts
Splatalogue - uses the astroquery API, from the function splat_assign_spectrum
Published - local catalogs, but with the public kwarg flagged as True
Unpublished - local catalogs, but with the public kwarg flagged as False
- Returns
Plotly FigureWidget object
-
plot_spectrum
(simulate=False)[source]¶ Generates a Plotly figure of the spectrum. If U-lines are present, it will plot the simulated spectrum also.
-
process_clock_spurs
(**kwargs)[source]¶ Method that will generate a LineList corresponding to possible harmonics, sum, and difference frequencies based on a given clock frequency (default: 65,000 MHz).
It is advised to run this function at the end of assignments, owing to the sheer number of possible combinations of lines, which may interfere with real molecular features.
- Parameters
kwargs – Optional kwargs are passed into the creation of the LineList with LineList.from_clock.
-
process_db
(auto=True, dbpath=None)[source]¶ Function for assigning peaks based on a local database file. The database is controlled with the SpectralCatalog class, which will handle all of the searching.
- Parameters
auto (bool, optional) – If True, the assignments are made automatically.
dbpath (str or None, optional) – Filepath to the local database. If none is supplied, uses the default value from the user’s home directory.
-
process_linelist
(name=None, formula=None, filepath=None, linelist=None, auto=True, thres=-10.0, progressbar=True, tol=None, **kwargs)[source]¶ General purpose function for performing line assignments using local catalog and line data. The two main ways of running this function are to provide either a linelist or a filepath argument. The type of linelist is checked to determine how the catalog data will be processed: if it’s a string, it is used to look up an existing LineList object in the experiment line_list attribute; if it’s a LineList object, it is used directly.
- Parameters
name (str, optional) – Name of the molecule being assigned. This should be specified when providing a new line list, which then gets added to the experiment.
formula (str, optional) – Chemical formula for the molecule being assigned. Should be added in conjunction with name.
filepath (str, optional) – If a linelist is not given, a filepath can be specified corresponding to a .cat or .lin file, which will be used to create a LineList object.
linelist (str or LineList, optional) – Can be the name of a molecule or LineList object; the former is specified as a string which looks up the experiment line_list attribute for an existing LineList object. If a LineList object is provided, the function will use this directly.
auto (bool, optional) – Specifies whether the assignment procedure works without intervention. If False, the user will be prompted to provide a candidate index.
thres (float, optional) – log Intensity cut off used to screen candidates.
progressbar (bool, optional) – If True, a tqdm progressbar will indicate assignment progress.
tol (float, optional) – Tolerance for making assignments. If None, the function will default to the session-wide values of freq_abs and freq_prox to determine the tolerance.
kwargs – Kwargs are passed to the Transition object update when assignments are made.
-
process_linelist_batch
(param_dict=None, yml_path=None, **kwargs)[source]¶ Function for processing a whole folder of catalog files. This takes a user-specified mapping scheme that will associate catalog files with molecule names, formulas, and any other LineList/Transition attributes. This can be in the form of a dictionary or a YAML file; one has to be provided.
An example scheme is given here:
    {
        "cyclopentadiene": {
            "formula": "c5h6",
            "filepath": "../data/catalogs/cyclopentadiene.cat"
        }
    }
The top dictionary has keys corresponding to the name of the molecule, and the value as a sub dictionary containing the formula and filepath to the catalog file as minimum input.
You can also provide additional details that are Transition attributes:
    {
        "benzene": {
            "formula": "c6h6",
            "filepath": "../data/catalogs/benzene.cat",
            "smiles": "c1ccccc1",
            "public": False
        }
    }
- Parameters
param_dict (dict or None, optional) – If not None, a dictionary containing the mapping scheme will be used to process the catalogs. Defaults to None.
yml_path (str or None, optional) – If not None, corresponds to a str filepath to the YAML file to be read.
kwargs – Additional keyword arguments will be passed into the assignment process, which are the args for process_linelist.
- Raises
ValueError – If yml_path and param_dict args are the same value.
-
process_splatalogue
(auto=True, progressbar=True)[source]¶ Function that will provide an “interface” for interactive line assignment in a notebook environment.
Basic functionality is looping over a series of peaks, querying Splatalogue for known transitions in the vicinity of each. If the line is known in Splatalogue, it is stored in a Transition object and flagged as known. Conversely, if it’s not known in Splatalogue, assignment is deferred: the line is flagged as unassigned and placed in the uline attribute.
- Parameters
auto (bool) – If True the assignment process does not require user input, otherwise will prompt user.
-
rename_umolecule
(name: str, new_name: str, formula='')[source]¶ Function to update the name of a LineList. This function should be used to update a LineList, particularly when the identity of an unidentified molecule is discovered.
- Parameters
name (str) – Old name of the LineList.
new_name (str) – New name of the LineList - preferably, a real molecule name.
formula (str, optional) – New formula of the LineList.
-
save_session
(filepath=None)[source]¶ Method to save an AssignmentSession to disk.
The underlying mechanics are based on the joblib library, and so there can be cross-compatibility issues particularly when loading from different versions of Python.
- Parameters
filepath (str, optional) – Path to save the file to. By default it will go into the sessions folder.
-
search_frequency
(frequency: float)[source]¶ Function for searching the experiment for a particular frequency. The search range is defined by the Session attribute freq_prox, and will first look for the frequency in the assigned features if any have been made. The routine will then look for it in the U-lines.
- Parameters
frequency (float) – Center frequency in MHz
- Returns
Pandas dataframe with the matches
- Return type
dataframe
-
search_species
(formula=None, name=None, smiles=None)[source]¶ Method for finding species in the assigned dataframe, with the intention of showing where the observed frequencies are.
- Parameters
formula (str) – Chemical formula to look up
name (str) – Common name to look up
smiles (str) – Unique SMILES string to look up
- Returns
Slice of the pandas dataframe with the corresponding lookup
- Return type
dataframe
-
set_velocity
(value: float)[source]¶ Set the radial velocity offset for the spectrum. The velocity is specified in km/s, and the sign convention is that a positive velocity yields a redshifted spectrum (i.e. a source moving away from us).
This method should be used to change the velocity, as it will automatically re-calculate the dataframe frequency column to the new velocity.
- Parameters
value (float) – Velocity in km/s
-
simulate_spectrum
(x: numpy.ndarray, centers: List[float], widths: List[float], amplitudes: List[float], fake=False)[source]¶ Generate a synthetic spectrum with Gaussians with the specified parameters, on a given x axis.
GaussianModel is used here to remain internally consistent with the rest of the code.
- Parameters
x (numpy.ndarray) – x values to evaluate the Gaussians on
centers (list of float) – Gaussian centers
widths (list of float) – Gaussian widths
amplitudes (list of float) – Gaussian amplitudes
fake (bool, optional) – Indicates whether false intensities are used for the simulation
- Returns
y – array of y values
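A synthetic spectrum of this kind is a sum of Gaussians evaluated on a common axis. The sketch below uses only the standard library and assumes that each amplitude is the peak height and each width is the standard deviation; the GaussianModel referred to above may use a different parameterization (for example an area-normalized amplitude), so the numbers are illustrative only.

```python
import math
from typing import List


def simulate_gaussians(x: List[float], centers: List[float],
                       widths: List[float], amplitudes: List[float]) -> List[float]:
    """Evaluate a sum of Gaussians on the x axis."""
    y = [0.0] * len(x)
    for center, width, amplitude in zip(centers, widths, amplitudes):
        for i, xi in enumerate(x):
            # Assumed convention: amplitude is the peak height, width is sigma
            y[i] += amplitude * math.exp(-0.5 * ((xi - center) / width) ** 2)
    return y


x = [9.0, 10.0, 11.0, 20.0]
y = simulate_gaussians(x, centers=[10.0, 20.0], widths=[1.0, 0.5],
                       amplitudes=[2.0, 1.0])
```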
-
simulate_sticks
(catalogpath: str, N: float, Q: float, T: float, doppler=None, gaussian=False)[source]¶ Simulates a stick spectrum with intensities in flux units (Jy) for a given catalog file, the column density, and the rotational partition function at temperature T.
- Parameters
catalogpath (str) – path to SPCAT catalog file
N (float) – column density in cm^-2
Q (float) – partition function at temperature T
T (float) – temperature in Kelvin
doppler (float, optional) – doppler width in km/s; defaults to session wide value
gaussian (bool, optional) – if True, simulates Gaussian profiles instead of sticks
- Returns
if gaussian is False, returns a dataframe with sticks; if True, returns a simulated Gaussian line profile spectrum
-
splat_assign_spectrum
(auto=False)[source]¶ Alias for process_splatalogue. Function will be removed in a later version.
- Parameters
auto (bool) – Specifies whether the assignment procedure is automatic.
-
stacked_plot
(frequencies: List[float], int_col=None, freq_range=0.05)[source]¶ Special implementation of the stacked_plot from the figurefactory module, adapted for AssignmentSession. In this version, the assigned/u-lines are also indicated.
This function will generate a Plotly figure that stacks up the spectra as subplots, with increasing frequencies going up the plot. This function was written primarily to identify harmonically related lines, which in the absence of centrifugal distortion should line up perfectly in the center of the plot.
Due to limitations with Plotly, a maximum of ~8 plots can be stacked; an Exception is raised if more than 8 frequencies are provided.
- Parameters
frequencies (list of floats) – Center frequencies
freq_range (float) – Percentage value of each center frequency to use as cutoffs
- Returns
Plotly Figure object
-
umol_gen
(silly=True)[source]¶ Generator for unidentified molecule names. Wraps
- Yields
str – Formatted as “UMol_XXX”
-
update_database
(dbpath=None)[source]¶ Adds all of the entries to a specified SpectralCatalog database. The database defaults to the global database stored in the home directory. This method will remove everything in the database associated with this experiment’s ID, and re-add the entries.
- Parameters
dbpath (str, optional) – path to a SpectralCatalog database. Defaults to the system-wide catalog.
-
-
class
pyspectools.spectra.assignment.
LineList
(name: str = '', formula: str = '', smi: str = '', filecontents: str = '', filepath: str = '', transitions: List = <factory>, frequencies: List[float] = <factory>, catalog_frequencies: List[float] = <factory>, source: str = '')[source]¶ Bases:
object
Class for handling and homogenizing all of the possible line lists: from peaks to assignments to catalog files.
-
name
¶ Name of the line list. Can be used to identify the molecule, or to simply state the purpose of the list.
- Type
str, optional
-
formula
¶ Chemical formula for the molecule, if applicable.
- Type
str, optional
-
smi
¶ SMILES representation of the molecule, if applicable.
- Type
str, optional
-
filecontents
¶ String representation of the file contents used to make the line list.
- Type
str, optional
-
filepath
¶ Path to the file used to make the list.
- Type
str, optional
-
transitions
¶ A designated list for holding Transition objects. This is the bulk of the information for a given line list.
- Type
list, optional
-
add_uline
(frequency: float, intensity: float, **kwargs)[source]¶ Function to manually add a U-line to the LineList. The function creates a Transition object with the frequency and intensity values provided by a user, which is then compared with the other transition entries within the LineList. If it doesn’t already exist, it will then add the new Transition to the LineList.
Kwargs are passed to the creation of the Transition object.
- Parameters
frequency, intensity (float) – Floats corresponding to the frequency and intensity of the line in a given unit.
-
add_ulines
(data: List[Tuple[float, float]], **kwargs)[source]¶ Function to add multiple pairs of frequency/intensity to the current LineList.
Kwargs are passed to the creation of the Transition object.
- Parameters
data (iterable of 2-tuple) –
List-like of 2-tuples corresponding to frequency and intensity. Data should look like this example:
    [(12345.213, 5.), (18623.125, 12.3)]
-
catalog_frequencies
: List[float]¶
-
filecontents
: str = ''¶
-
filepath
: str = ''¶
-
find_candidates
(frequency: float, lstate_threshold=4.0, freq_tol=0.1, int_tol=-10.0, max_uncertainty=0.2)[source]¶ Function for searching the LineList for candidates. The first step uses pure Python to isolate transitions that meet three criteria: the lower state energy, the catalog intensity, and the frequency distance.
If no candidates are found, the function will return None. Otherwise, it will return the list of transitions and a list of associated normalized weights.
- Parameters
frequency (float) – Frequency in MHz to try and match.
lstate_threshold (float, optional) – Lower state energy threshold in Kelvin
freq_tol (float, optional) – Frequency tolerance in MHz for matching two frequencies
int_tol (float, optional) – log Intensity threshold
- Returns
If candidates are found, lists of the transitions and the associated weights are returned. Otherwise, returns None
- Return type
transitions, weighting or None
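The three-criteria filter can be sketched with a small stand-in data class. `CatalogLine` and `find_candidates_sketch` are hypothetical names, and the weighting used here (inverse frequency deviation, normalized to sum to one) is an assumption made for illustration; the documentation does not state how the library computes its weights.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CatalogLine:  # hypothetical stand-in for a Transition
    catalog_frequency: float  # MHz
    catalog_intensity: float  # log10 intensity
    lstate_energy: float      # Kelvin


def find_candidates_sketch(
    lines: List[CatalogLine], frequency: float, lstate_threshold: float = 4.0,
    freq_tol: float = 0.1, int_tol: float = -10.0,
) -> Optional[Tuple[List[CatalogLine], List[float]]]:
    """Filter on energy, intensity and frequency distance, then weight."""
    candidates = [
        line for line in lines
        if line.lstate_energy <= lstate_threshold
        and line.catalog_intensity >= int_tol
        and abs(line.catalog_frequency - frequency) <= freq_tol
    ]
    if not candidates:
        return None
    # Assumed weighting: closer catalog frequencies get larger weights
    raw = [1.0 / (abs(c.catalog_frequency - frequency) + 1e-6) for c in candidates]
    total = sum(raw)
    return candidates, [w / total for w in raw]


lines = [
    CatalogLine(10000.05, -4.0, 2.0),
    CatalogLine(10000.02, -5.0, 1.0),
    CatalogLine(10000.03, -12.0, 1.0),  # too weak
    CatalogLine(10000.01, -3.0, 50.0),  # lower state energy too high
    CatalogLine(10005.00, -3.0, 1.0),   # too far in frequency
]
result = find_candidates_sketch(lines, 10000.0)
no_match = find_candidates_sketch(lines, 20000.0)
```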
-
find_nearest
(frequency: float, tol=0.001)[source]¶ Look up transitions to find the nearest in frequency to the query. If the matched frequency is within a tolerance, then the function will return the corresponding Transition. Otherwise, it returns None.
- Parameters
frequency (float) – Frequency in MHz to search for.
tol (float, optional) – Maximum tolerance for the deviation from the LineList frequency and query frequency
- Returns
- Return type
Transition object or None
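A nearest-match lookup with a tolerance can be written in a few lines. This sketch works on bare frequencies rather than Transition objects, and `find_nearest_sketch` is a hypothetical name, not the library method.

```python
from typing import List, Optional


def find_nearest_sketch(frequencies: List[float], frequency: float,
                        tol: float = 1e-3) -> Optional[float]:
    """Return the closest frequency if it lies within tol, otherwise None."""
    if not frequencies:
        return None
    nearest = min(frequencies, key=lambda f: abs(f - frequency))
    return nearest if abs(nearest - frequency) <= tol else None


catalog = [8000.0, 12000.5, 16000.25]
hit = find_nearest_sketch(catalog, 12000.5004)
miss = find_nearest_sketch(catalog, 12001.0)
```

Returning None instead of the nearest value outside the tolerance lets the caller distinguish "no match" from "poor match" without a second comparison.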
-
formula
: str = ''¶
-
frequencies
: List[float]¶
-
classmethod
from_artifacts
(frequencies: List[float], **kwargs)[source]¶ Specialized class method for creating a LineList object specifically for artifacts/RFI. These Transitions are specially flagged as Artifacts.
- Parameters
frequencies (iterable of floats) – List or array of floats corresponding to known artifact frequencies.
kwargs – Kwargs are passed into the Transition object creation.
- Returns
- Return type
-
classmethod
from_catalog
(name: str, formula: str, filepath: str, min_freq=0.0, max_freq=1000000000000.0, max_lstate=9000.0, **kwargs)[source]¶ Create a Line List object from an SPCAT catalog.
- Parameters
name (str) – Name of the molecule the catalog belongs to
formula (str) – Chemical formula of the molecule
filepath (str) – Path to the catalog file.
min_freq (float, optional) – Minimum frequency in MHz for the frequency cutoff
max_freq (float, optional) – Maximum frequency in MHz for the frequency cutoff
max_lstate (float, optional) – Maximum lower state energy to filter out absurd lines
kwargs (optional) – Additional attributes that are passed into the Transition objects.
- Returns
Instance of LineList with the digested catalog.
- Return type
linelist_obj
-
classmethod
from_clock
(max_multi=64, clock=65000.0, **kwargs)[source]¶ Method of generating a LineList object by calculating all possible combinations of the
- Parameters
max_multi (int, optional) – [description], by default 64
clock (float, optional) – Clock frequency to calculate sub-harmonics of, in units of MHz. Defaults to 65,000 MHz, which corresponds to the Keysight AWG
- Returns
LineList object with the full list of possible clock spurs, as harmonics, sum, and difference frequencies.
- Return type
LineList object
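One plausible reading of "harmonics, sum, and difference frequencies" is sketched below: take the sub-harmonics clock / n for n up to max_multi, then add every pairwise sum and difference. This is an assumption for illustration only; the actual combinations generated by from_clock are not described here, and `clock_spurs` is a hypothetical name.

```python
from itertools import combinations
from typing import List


def clock_spurs(clock: float = 65000.0, max_multi: int = 64) -> List[float]:
    """Enumerate sub-harmonics of the clock plus pairwise sums and differences."""
    # Sub-harmonics in decreasing order: clock, clock / 2, clock / 3, ...
    subharmonics = [clock / n for n in range(1, max_multi + 1)]
    spurs = set(subharmonics)
    for high, low in combinations(subharmonics, 2):
        spurs.add(high + low)
        spurs.add(high - low)  # positive because high > low
    return sorted(spurs)


# Small case that can be checked by hand
spurs = clock_spurs(clock=60000.0, max_multi=3)
n_default = len(clock_spurs())
```

Even this simple scheme produces thousands of candidate frequencies with the default settings, which is consistent with the advice to run the clock spur assignment last.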
-
classmethod
from_dataframe
(dataframe: pandas.core.frame.DataFrame, name='Peaks', freq_col='Frequency', int_col='Intensity', **kwargs)[source]¶ Specialized class method for creating a LineList object from a Pandas Dataframe. This method is called by the AssignmentSession.df2ulines function to generate a Peaks LineList during peak detection.
- Parameters
dataframe (pandas DataFrame) – DataFrame containing frequency and intensity information
freq_col (str, optional) – Name of the frequency column
int_col (str, optional) – Name of the intensity column
kwargs – Optional settings are passed into the creation of Transition objects.
- Returns
- Return type
-
classmethod
from_lin
(name: str, filepath: str, formula='', **kwargs)[source]¶ Generate a LineList object from a .lin file. This method should be used for intermediate assignments, when one does not know what the identity of a molecule is but has measured some frequency data.
- Parameters
name (str) – Name of the molecule
filepath (str) – File path to the .lin file.
formula (str, optional) – Chemical formula of the molecule if known.
kwargs – Additional kwargs are passed into the Transition objects.
- Returns
- Return type
-
classmethod
from_list
(name: str, frequencies: List[float], formula='', **kwargs)[source]¶ Generic, low level method for creating a LineList object from a list of frequencies. This method can be used when neither lin, catalog, nor splatalogue is appropriate and you would like to manually create it by handpicked frequencies.
obj.uline == True,
- Parameters
name (str) – Name of the species - doesn’t have to be its real name, just an identifier.
frequencies (list) – A list of floats corresponding to the “catalog” frequencies.
formula (str, optional) – Formula of the species, if known.
kwargs – Optional settings are passed into the creation of Transition objects.
- Returns
- Return type
-
classmethod
from_pgopher
(name: str, filepath: str, formula='', **kwargs)[source]¶ Method to take the output of a PGopher file and create a LineList object. The PGopher output must be in the comma delimited specification.
This is actually the ideal way to generate LineList objects: it fills in all of the relevant fields, such as linestrength and state energies.
- Parameters
name (str) – Name of the molecule
filepath (str) – Path to the PGopher CSV output
formula (str, optional) – Chemical formula of the molecule, defaults to an empty string.
- Returns
- Return type
-
classmethod
from_splatalogue_query
(dataframe: pandas.core.frame.DataFrame, **kwargs)[source]¶ Method for converting a Splatalogue query dataframe into a LineList object. This is designed with the intention of pre-querying a set of molecules ahead of time, so that the user can have direct control over which molecules are specifically targeted without having to generate specific catalog files.
- Parameters
dataframe (pandas DataFrame) – DataFrame generated by the function analysis.search_molecule
- Returns
- Return type
-
get_assignments
()[source]¶ Function for retrieving assigned lines in a Line List.
- Returns
assign_objs – List of all of the transition objects where the uline flag is set to False.
- Return type
list
-
get_frequencies
(numpy=False)[source]¶ Method to extract all the frequencies out of a LineList
- Parameters
numpy (bool, optional) – If True, returns a NumPy ndarray with the frequencies.
- Returns
List of transition frequencies
- Return type
List or np.ndarray
-
get_multiple
()[source]¶ Convenience function to extract all the transitions within a LineList that have multiple possible assignments.
- Returns
List of Transition objects that have multiple assignments remaining.
- Return type
List
-
get_ulines
()[source]¶ Function for retrieving unidentified lines in a Line List.
- Returns
uline_objs – List of all of the transition objects where the uline flag is set to True.
- Return type
list
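Both get_assignments and get_ulines amount to partitioning the transitions on the uline flag. A minimal sketch with a hypothetical stand-in class (`Line` is not part of pyspectools):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:  # hypothetical stand-in for a Transition
    frequency: float
    uline: bool = True  # True means the line is still unidentified


lines: List[Line] = [Line(8000.1), Line(9000.2, uline=False), Line(10000.3)]

# Unidentified lines keep uline == True; assignments have it set to False
uline_objs = [line for line in lines if line.uline]
assign_objs = [line for line in lines if not line.uline]
```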
-
name
: str = ''¶
-
smi
: str = ''¶
-
source
: str = ''¶
-
to_dataframe
()[source]¶ Convert the transition data into a Pandas DataFrame.
- Returns
Pandas Dataframe with all of the transitions in the line list.
- Return type
dataframe
-
to_ftb
(filepath=None, thres=-10.0, shots=500, dipole=1.0, **kwargs)[source]¶ Function to create an FTB file from a LineList object. This will create entries for every transition entry above a certain intensity threshold, in whatever units the intensities are in; i.e. SPCAT will be in log units, while experimental peaks will be in whatever arbitrary voltage scale.
- Parameters
filepath (None or str, optional) – Path to write the ftb file to. If None (default), uses the name of the LineList and writes to the ftb folder.
thres (float, optional) – Threshold to cutoff transitions in the ftb file. Transitions with less intensity than this value will not be included. Units are the same as whatever the LineList units are.
shots (int, optional) – Number of shots to integrate.
dipole (float, optional) – Target dipole moment for the species
kwargs – Additional kwargs are passed into the ftb creation, e.g. magnet, discharge, etc.
-
to_pickle
(filepath=None)[source]¶ Function to serialize the LineList to a Pickle file. If no filepath is provided, the function will default to using the name attribute of the LineList to name the file.
- Parameters
filepath (str or None, optional) – If None, uses name attribute for the filename, and saves to the linelists folder.
-
transitions
: List¶
-
update_linelist
(transition_objs: List[pyspectools.spectra.assignment.Transition])[source]¶ Adds transitions to a LineList if they do not exist in the list already.
- Parameters
transition_objs (list) – List of Transition objects
-
-
class
pyspectools.spectra.assignment.
Molecule
(name: str = '', formula: str = '', smi: str = '', filecontents: str = '', filepath: str = '', transitions: List = <factory>, frequencies: List[float] = <factory>, catalog_frequencies: List[float] = <factory>, source: str = '', A: float = 20000.0, B: float = 6000.0, C: float = 3500.0, var_file: str = '')[source]¶ Bases:
pyspectools.spectra.assignment.LineList
Special instance of the LineList class. The idea is to eventually use the high speed fitting/cataloguing routines by Brandon to provide quick simulations overlaid on chirp spectra.
Attributes
-
A
: float = 20000.0¶
-
B
: float = 6000.0¶
-
C
: float = 3500.0¶
-
var_file
: str = ''¶
-
-
class
pyspectools.spectra.assignment.
Session
(experiment: int, composition: List[str] = <factory>, temperature: float = 4.0, doppler: float = 0.01, velocity: float = 0.0, freq_prox: float = 0.1, freq_abs: bool = True, baseline: float = 0.0, noise_rms: float = 0.0, noise_region: List[float] = <factory>, max_uncertainty: float = 0.2)[source]¶ Bases:
object
Data class for handling parameters used for an AssignmentSession. The user generally shouldn’t need to directly interact with this class, but can give some level of dynamic control and bookkeeping to how and what molecules can be assigned, particularly with the composition, the frequency thresholds for matching, and the noise statistics.
-
experiment
¶ ID for experiment
- Type
int
-
composition
¶ List of atomic symbols. Used for filtering out species in the Splatalogue assignment procedure.
- Type
list of str
-
temperature
¶ Temperature in K. Used for filtering transitions in the automated assignment, with the cutoff set to 3 times this value.
- Type
float
-
doppler
¶ Doppler width in km/s; the default value of 0.01 km/s is about 0.5 kHz at 15 GHz. Used for simulating lineshapes and for lineshape analysis.
- Type
float
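Taking the stated unit of km/s at face value, a Doppler width converts to a frequency width through the ratio v/c. The helper below is a hypothetical illustration, not a pyspectools function; for 0.01 km/s at 15 GHz it evaluates to roughly 0.5 kHz.

```python
C_KMS = 299792.458  # speed of light in km/s


def doppler_width_mhz(width_kms: float, frequency_mhz: float) -> float:
    """Convert a velocity width (km/s) into a frequency width (MHz)."""
    return width_kms / C_KMS * frequency_mhz


# Default Session.doppler value evaluated at 15 GHz
width_mhz = doppler_width_mhz(0.01, 15000.0)
width_khz = width_mhz * 1000.0
```

Because the width scales linearly with frequency, the same velocity width is ten times broader in frequency units at 150 GHz than at 15 GHz.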
-
velocity
¶ Radial velocity of the source in km/s; used to offset the frequency spectrum
- Type
float
-
freq_prox
¶ frequency cutoff for line assignments. If freq_abs attribute is True, this value is taken as the absolute value. Otherwise, it is a percentage of the frequency being compared.
- Type
float
-
freq_abs
¶ If True, freq_prox attribute is taken as the absolute value of frequency, otherwise as a decimal percentage of the frequency being compared.
- Type
bool
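The interplay between freq_prox and freq_abs can be summarized in a short sketch. The function names are hypothetical, and the "decimal percentage" is interpreted here as a fraction of the frequency being compared.

```python
def frequency_tolerance(frequency: float, freq_prox: float = 0.1,
                        freq_abs: bool = True) -> float:
    """Return the matching tolerance in MHz for a given frequency."""
    # Absolute mode: freq_prox is already in MHz
    # Relative mode: freq_prox is a fraction of the frequency (assumed)
    return freq_prox if freq_abs else freq_prox * frequency


def is_match(observed: float, catalog: float, freq_prox: float = 0.1,
             freq_abs: bool = True) -> bool:
    """Check whether an observed line falls within tolerance of a catalog line."""
    return abs(observed - catalog) <= frequency_tolerance(catalog, freq_prox, freq_abs)


close = is_match(10000.05, 10000.0)                                   # 0.05 MHz off
far = is_match(10000.5, 10000.0)                                      # 0.5 MHz off
relative = is_match(10000.5, 10000.0, freq_prox=1e-4, freq_abs=False)  # 1 MHz window
```

A relative tolerance widens with frequency, which can be useful when the uncertainty grows with frequency, while an absolute tolerance keeps the matching window fixed across the band.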
-
baseline
¶ Baseline level of signal used for intensity calculations and peak detection
- Type
float
-
noise_rms
¶ RMS of the noise used for intensity calculations and peak detection
- Type
float
-
noise_region
¶ The frequency region used to define the noise floor.
- Type
2-tuple of floats
-
max_uncertainty
¶ Value to use as the maximum uncertainty for considering a transition for assignments.
- Type
float
-
baseline
: float = 0.0¶
-
composition
: List[str]¶
-
doppler
: float = 0.01¶
-
experiment
: int¶
-
freq_abs
: bool = True¶
-
freq_prox
: float = 0.1¶
-
max_uncertainty
: float = 0.2¶
-
noise_region
: List[float]¶
-
noise_rms
: float = 0.0¶
-
temperature
: float = 4.0¶
-
velocity
: float = 0.0¶
-
-
class
pyspectools.spectra.assignment.
Transition
(name: str = '', smiles: str = '', formula: str = '', frequency: float = 0.0, catalog_frequency: float = 0.0, catalog_intensity: float = 0.0, deviation: float = 0.0, intensity: float = 0.0, uncertainty: float = 0.0, S: float = 0.0, peak_id: int = 0, experiment: int = 0, uline: bool = True, composition: List[str] = <factory>, v_qnos: List[int] = <factory>, r_qnos: str = '', fit: Dict = <factory>, ustate_energy: float = 0.0, lstate_energy: float = 0.0, interference: bool = False, weighting: float = 0.0, source: str = 'Catalog', public: bool = True, velocity: float = 0.0, discharge: bool = False, magnet: bool = False, multiple: List[str] = <factory>, final: bool = False)[source]¶ Bases:
object
DataClass for handling assignments. Attributes are chosen to be sufficiently informative that a line assignment is unambiguous and can be reproduced later in a form that is both machine and human readable.
-
name
¶ IUPAC/common name; the former is preferred to be unambiguous
- Type
str
-
formula
¶ Chemical formula, or usually the stoichiometry
- Type
str
-
smiles
¶ SMILES code that provides a machine and human readable chemical specification
- Type
str
-
frequency
¶ Observed frequency in MHz
- Type
float
-
intensity
¶ Observed intensity, in whatever units the experiments are in. Examples are Jy/beam, or micro volts.
- Type
float
-
catalog_frequency
¶ Catalog frequency in MHz
- Type
float
-
catalog_intensity
¶ Catalog line intensity, typically in SPCAT units
- Type
float
-
S
¶ Theoretical line strength; differs from the catalog line strength as it may be used for the intrinsic line strength Sμ²
- Type
float
-
peak_id
¶ Peak id from specific experiment
- Type
int
-
uline
¶ Flag to indicate whether line is identified or not
- Type
bool
-
composition
¶ A list of atomic symbols specifying what the experimental elemental composition is. Influences which molecules are considered possible in the Splatalogue assignment procedure.
- Type
list of str
-
v_qnos
¶ Quantum numbers for vibrational modes. Index corresponds to mode, and int value to number of quanta. Length should be equal to 3N-6.
- Type
list of int
-
r_qnos
¶ Rotational quantum numbers. TODO - better way of managing rotational quantum numbers
- Type
str
-
experiment
¶ Experiment ID to use as a prefix/suffix for record keeping
- Type
int
-
weighting
¶ Value for weighting factor used in the automated assignment
- Type
float
-
fit
¶ Contains the fitted parameters and model
- Type
dict
-
ustate_energy
¶ Energy of the upper state in Kelvin
- Type
float
-
lstate_energy
¶ Energy of the lower state in Kelvin
- Type
float
-
interference
¶ Flag to indicate if this assignment is not molecular in nature
- Type
bool
-
source
¶ Indicates what the source used for this assignment is
- Type
str
-
public
¶ Flag to indicate if the information for this assignment is public/published
- Type
bool
-
velocity
¶ Velocity of the source used to make the assignment in km/s
- Type
float
-
discharge
¶ Whether or not the line is discharge dependent
- Type
bool
-
magnet
¶ Whether or not the line is magnet dependent (i.e. open shell)
- Type
bool
-
S
: float = 0.0¶
-
calc_intensity
(Q: float, T=300.0)[source]¶ Convert linestrength into intensity.
- Parameters
- Q (float) – Partition function for the molecule at temperature T
- T (float) – Temperature to calculate the intensity at in Kelvin
- Returns
log10 of the intensity in SPCAT format
- Return type
float
-
calc_linestrength
(Q: float, T=300.0)[source]¶ Convert intensity into linestrength.
- Parameters
- Q (float) – Partition function for the molecule at temperature T
- T (float) – Temperature in Kelvin used for the conversion
- Returns
Intrinsic linestrength of the transition
- Return type
float
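The two conversions above can be sketched with the intensity relation given in the JPL/SPCAT catalog documentation, I = 4.16231e-5 · ν · Sμ² · [exp(-E_lower/kT) - exp(-E_upper/kT)] / Q, with ν in MHz. This is a minimal sketch, not necessarily how the library implements calc_intensity and calc_linestrength. The function names are hypothetical, and the state energies are assumed to be supplied in Kelvin, as the ustate_energy and lstate_energy attributes are documented.

```python
import math

# Prefactor of the JPL/SPCAT catalog intensity relation for frequency in MHz
# and S mu^2 in Debye^2; the resulting intensity is in nm^2 MHz.
SPCAT_PREFACTOR = 4.16231e-5


def boltzmann_difference(lower_k: float, upper_k: float, T: float) -> float:
    """Population difference factor for state energies expressed in Kelvin."""
    return math.exp(-lower_k / T) - math.exp(-upper_k / T)


def linestrength_to_log_intensity(S, frequency, lower_k, upper_k, Q, T=300.0):
    """Return log10 of the catalog intensity for an intrinsic linestrength S."""
    factor = boltzmann_difference(lower_k, upper_k, T)
    intensity = SPCAT_PREFACTOR * frequency * S * factor / Q
    return math.log10(intensity)


def log_intensity_to_linestrength(log_I, frequency, lower_k, upper_k, Q, T=300.0):
    """Invert the relation: recover S from log10 of the catalog intensity."""
    factor = boltzmann_difference(lower_k, upper_k, T)
    return (10.0 ** log_I) * Q / (SPCAT_PREFACTOR * frequency * factor)


# Illustrative numbers only: a 10 GHz line with made-up energies and Q.
log_I = linestrength_to_log_intensity(2.0, 10000.0, 1.0, 1.48, Q=100.0)
S = log_intensity_to_linestrength(log_I, 10000.0, 1.0, 1.48, Q=100.0)
```

Both directions need the same Q and T, since the partition function is itself temperature dependent.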
-
catalog_frequency
: float = 0.0¶
-
catalog_intensity
: float = 0.0¶
-
check_molecule
(other)[source]¶ Check equivalency based on a common carrier. Compares the name, formula, and smiles of this Transition object with another.
- Returns
True if the two Transitions belong to the same carrier.
- Return type
bool
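A carrier comparison of this kind can be sketched as below. The Carrier class and same_carrier method are hypothetical stand-ins. The description does not say whether one matching field is enough or all three must agree, so this sketch assumes the strict case where name, formula, and smiles must all match.

```python
from dataclasses import dataclass


@dataclass
class Carrier:
    """Hypothetical stand-in holding only the three identifying fields."""

    name: str = ""
    formula: str = ""
    smiles: str = ""

    def same_carrier(self, other: "Carrier") -> bool:
        # Assumption: every identifier must match for the carriers to be equal.
        mine = (self.name, self.formula, self.smiles)
        theirs = (other.name, other.formula, other.smiles)
        return mine == theirs


a = Carrier("benzonitrile", "C7H5N", "N#Cc1ccccc1")
b = Carrier("benzonitrile", "C7H5N", "N#Cc1ccccc1")
c = Carrier("benzene", "C6H6", "c1ccccc1")
matches = a.same_carrier(b)      # identical identifiers
differs = a.same_carrier(c)      # different molecule
```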
-
choose_assignment
(index: int)[source]¶ Function to manually pick an assignment from a list of multiple possible assignments found during process_linelist. After the new assignment is copied over, the final attribute is set to True and will no longer throw a warning during finalize_assignments.
- Parameters
index (int) – Index of the candidate to use for the assignment.
-
composition
: List[str]¶
-
deviation
: float = 0.0¶
-
discharge
: bool = False¶
-
experiment
: int = 0¶
-
final
: bool = False¶
-
fit
: Dict¶
-
formula
: str = ''¶
-
frequency
: float = 0.0¶
-
classmethod
from_dict
(data_dict: Dict)[source]¶ Method for generating a Transition object from a dictionary. All this method does is unpack a dictionary into the __init__ method.
- Parameters
data_dict (dict) – Dictionary containing all of the Transition DataClass fields that are to be populated.
- Returns
Transition object converted from the input dictionary
- Return type
Transition
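Unpacking a dictionary into __init__ can be illustrated with a small stand-in dataclass. MiniTransition is hypothetical and carries only a handful of the fields listed in the class signature; the real class is not imported here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MiniTransition:
    """Hypothetical stand-in with a few of the documented fields."""

    name: str = ""
    formula: str = ""
    frequency: float = 0.0
    uline: bool = True
    composition: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data_dict: Dict) -> "MiniTransition":
        # Keyword-unpack the dictionary into __init__. Missing keys fall back
        # to the field defaults; unknown keys raise TypeError.
        return cls(**data_dict)


line = MiniTransition.from_dict(
    {"name": "benzonitrile", "frequency": 12345.678, "uline": False}
)
```

With this pattern the dictionary keys have to match the field names exactly, so a misspelled key fails loudly instead of being silently dropped.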
-
classmethod
from_json
(json_path: str)[source]¶ Method for initializing a Transition object from a JSON file.
- Parameters
json_path (str) – Path to JSON file
- Returns
Transition object loaded from a JSON file.
- Return type
Transition
-
classmethod
from_yml
(yaml_path: str)[source]¶ Method for initializing a Transition object from a YAML file.
- Parameters
yaml_path (str) – Path to YAML file
- Returns
Transition object loaded from a YAML file.
- Return type
Transition
-
get_spectrum
(x: numpy.ndarray)[source]¶ Generate a synthetic peak by supplying the x axis for a particular spectrum. This method assumes that some fit parameters have been determined previously.
- Parameters
x (Numpy 1D array) – Frequency bins from an experiment to simulate the line features.
- Returns
Values of the model function spectrum at each particular value of x
- Return type
Numpy 1D array
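As a rough illustration of evaluating a fitted lineshape on a frequency axis, the sketch below assumes a Gaussian model with hypothetical parameter names (amplitude, center, sigma). The contents of the fit dictionary and the model actually used by get_spectrum are not specified here. Plain Python lists keep the sketch dependency-free; in practice x would be a NumPy array.

```python
import math
from typing import Dict, List, Sequence


def gaussian_peak(x: Sequence[float], params: Dict[str, float]) -> List[float]:
    """Evaluate a Gaussian at every frequency bin in x.

    Here "amplitude" is the peak height, "center" the line frequency, and
    "sigma" the standard deviation, all in the units of x.
    """
    amplitude = params["amplitude"]
    center = params["center"]
    sigma = params["sigma"]
    return [
        amplitude * math.exp(-((xi - center) ** 2) / (2.0 * sigma ** 2))
        for xi in x
    ]


# Frequency bins every 0.1 MHz around a hypothetical line at 10000 MHz.
x = [9999.0 + 0.1 * i for i in range(21)]
y = gaussian_peak(x, {"amplitude": 1.5, "center": 10000.0, "sigma": 0.2})
```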
-
intensity
: float = 0.0¶
-
interference
: bool = False¶
-
lstate_energy
: float = 0.0¶
-
magnet
: bool = False¶
-
multiple
: List[str]¶
-
name
: str = ''¶
-
peak_id
: int = 0¶
-
public
: bool = True¶
-
r_qnos
: str = ''¶
-
reset_assignment
()[source]¶ Function to reset an assigned line to its original state. The only information kept is the frequency, the intensity, and details of the experiment.
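One way to reset a dataclass instance while preserving a few fields is to walk over dataclasses.fields and restore each default. This is only a sketch: MiniTransition is a hypothetical stand-in, and the set of preserved fields in KEEP is an assumption about which attributes count as experimental details.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import List

# Assumption: these are the fields that survive a reset.
KEEP = {"frequency", "intensity", "experiment", "peak_id"}


@dataclass
class MiniTransition:
    """Hypothetical stand-in with a few of the documented fields."""

    name: str = ""
    formula: str = ""
    frequency: float = 0.0
    intensity: float = 0.0
    experiment: int = 0
    peak_id: int = 0
    uline: bool = True
    multiple: List[str] = field(default_factory=list)

    def reset(self) -> None:
        for f in fields(self):
            if f.name in KEEP:
                continue
            if f.default is not MISSING:
                # Plain default value declared on the field.
                setattr(self, f.name, f.default)
            elif f.default_factory is not MISSING:
                # Mutable defaults are rebuilt through their factory.
                setattr(self, f.name, f.default_factory())


line = MiniTransition(
    name="benzonitrile", formula="C7H5N", frequency=12345.678,
    intensity=0.5, experiment=3, peak_id=7, uline=False, multiple=["a"],
)
line.reset()
```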
-
smiles
: str = ''¶
-
source
: str = 'Catalog'¶
-
to_file
(filepath: str, format='yaml')[source]¶ Save a Transition object to disk with a specified file format. Defaults to YAML.
- Parameters
- filepath (str) – Path to the file to write
- format (str, optional) – Denotes the syntax used for dumping. Defaults to YAML.
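A save and load round trip can be sketched with the standard library alone. The stand-in class, its to_json method, and the round_trip helper are hypothetical. JSON is used instead of the default YAML so the example needs no third-party package, and the real to_file may serialize differently.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class MiniTransition:
    """Hypothetical stand-in with a few of the documented fields."""

    name: str = ""
    frequency: float = 0.0
    uline: bool = True

    def to_json(self, filepath: str) -> None:
        # Dump every field as a plain dictionary.
        with open(filepath, "w") as handle:
            json.dump(asdict(self), handle, indent=2)

    @classmethod
    def from_json(cls, json_path: str) -> "MiniTransition":
        # Read the dictionary back and unpack it into __init__.
        with open(json_path) as handle:
            return cls(**json.load(handle))


def round_trip(line: MiniTransition) -> MiniTransition:
    """Write a line to a temporary file and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "line.json")
        line.to_json(path)
        return MiniTransition.from_json(path)


original = MiniTransition("benzonitrile", 12345.678, False)
restored = round_trip(original)
```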
-
uline
: bool = True¶
-
uncertainty
: float = 0.0¶
-
ustate_energy
: float = 0.0¶
-
v_qnos
: List[int]¶
-
velocity
: float = 0.0¶
-
weighting
: float = 0.0¶
-