Analyzing Broadband Spectra with the `assignment` Module

Introduction

In this notebook, we work through how the core functionality of PySpecTools can be used to streamline and automate your spectral analysis. PySpecTools and Python are flexible enough to adapt to your needs: whatever PySpecTools cannot do natively can usually be automated with plain Python (e.g. for loops) and, to a large extent, pandas. The latter is particularly useful when you are analyzing the assignments and want to, for example, filter out certain molecules. That may be left for a subsequent notebook, as the focus here is to demonstrate how automated assignment is performed.

The core functionality for assigning spectra lives in the pyspectools.spectra.assignment module, which provides three main abstractions:

AssignmentSession
- This is your main interface: holds the spectral data, and allows you to interact (plot, assign, etc) with the data.
Transition
- Represents every type of spectral feature: every peak in an experiment, and every catalog entry.
LineList
- A collection of spectral features: the peaks in an experiment (which in themselves are Transition objects), and catalogs.

We will demonstrate how these pieces come together by looking at some of our published data: this notebook was used to analyze the benzene discharge experiments reported in these two papers:

McCarthy, M. C.; Lee, K. L. K.; Carroll, P. B.; Porterfield, J. P.; Changala, P. B.; Thorpe, J. H.; Stanton, J. F. Exhaustive Product Analysis of Three Benzene Discharges by Microwave Spectroscopy. J. Phys. Chem. A 2020, 124 (25), 5170–5181. https://doi.org/10.1021/acs.jpca.0c02919.

Lee, K. L. K.; McCarthy, M. Study of Benzene Fragmentation, Isomerization, and Growth Using Microwave Spectroscopy. J. Phys. Chem. Lett. 2019, 10 (10), 2408–2413. https://doi.org/10.1021/acs.jpclett.9b00586.

The full dataset can also be found on our Zenodo repository; notebook “4000” most closely resembles this (this is a much more heavily marked up version).

We should stress that, while this is mostly automated, it does not change the fact that spectral analysis is very much an iterative process. You will make modifications to the way you do your analysis, and many things you won’t know until you’ve run it at least once. The point of having this notebook is so that it is reproducible and transparent: you can always modify the code and re-run the whole notebook with the latest analysis.

To begin the analysis, we will construct an AssignmentSession object using the class method AssignmentSession.from_ascii(...). This method takes your ASCII spectrum containing frequency and intensity information, parses it using pandas, and stores it as a DataFrame. As with all Python routines in a Jupyter/IPython session, you can append a question mark to the function/method name to pull up its documentation (e.g. AssignmentSession.from_ascii?). First, we import the classes we need:

[2]:

from pyspectools.spectra.assignment import AssignmentSession, LineList

In this case, we’re setting up the session based on the benzene data, which is a tab-delimited text file with a header. We skip the header with the skiprows argument, and provide our own column names with the col_names argument. Additionally, we specify the composition we expect for the experiment with the composition kwarg: ideally we would only include ["C", "H"], but we know there are atmospheric impurities like nitrogen and oxygen that get incorporated into the discharge products. This keyword affects Splatalogue assignments, and excludes catalogs that contain irrelevant compositions such as metal-bearing molecules.

[2]:

session = AssignmentSession.from_ascii(
    "chirp_data/ft2632_hanning_620.txt",
    experiment=4000,
    col_names=["Frequency", "Intensity"],
    skiprows=1,
    composition=["C", "H", "N", "O"],
    verbose=False
)

You can also adjust many of these settings after the fact; they are stored as attributes of the Session object within an AssignmentSession. For example, the temperature attribute sets an upper limit on the lower state energies of assignable transitions: catalog entries with a lower state energy more than double the specified value are ignored. The factor of two is there because the attribute nominally corresponds to your experimental temperature and, depending on how prominent a molecule is, you may see transitions from higher-energy states. Another useful setting is the maximum tolerated uncertainty in catalog entries, set by the max_uncertainty attribute: we would like to reject assignments based on poorly predicted lines.

[ ]:

# temperature in K
session.session.temperature = 10.

# uncertainty in MHz
session.session.max_uncertainty = 0.2

Note that frequency units are in MHz, and temperature in kelvin.
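As a rough illustration of how these two thresholds act together, the sketch below filters a few catalog entries in plain Python. It is not the PySpecTools implementation: the function name is_assignable and the example entries are hypothetical, and the lower state energies are assumed to already be expressed in the same units as the temperature threshold.

```python
def is_assignable(lstate_energy, uncertainty, temperature=10.0, max_uncertainty=0.2):
    """Return True if a catalog entry survives both session thresholds.

    Assumption: `lstate_energy` is already in the same units as `temperature`.
    """
    # Entries whose lower state energy exceeds twice the nominal value are ignored
    energy_ok = lstate_energy <= 2.0 * temperature
    # Entries with poorly predicted frequencies (uncertainty in MHz) are rejected
    uncertainty_ok = uncertainty <= max_uncertainty
    return energy_ok and uncertainty_ok


# Hypothetical entries: (label, lower state energy, catalog uncertainty in MHz)
entries = [
    ("well predicted, low energy", 6.2, 0.14),
    ("poorly predicted", 6.2, 0.50),
    ("high energy", 35.0, 0.01),
]

kept = [label for label, energy, unc in entries if is_assignable(energy, unc)]
print(kept)
```

Only the first entry passes both checks; the other two each fail exactly one of the thresholds.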

The next step is to pre-process the spectrum. Our chirped-pulse data are collected using Kyle Crabtree’s blackchirp program, and often we apply a window function to the data. If you are looking at raw FFT data, PySpecTools provides access to window functions defined in scipy.signal, which you can access in a syntax like this:

session.apply_filter("hanning")

The full list of filters can be found in the SciPy documentation.

After pre-processing, we perform peak detection and baseline correction. This is done with session.find_peaks, which automates several steps based on the keyword arguments. All of the analysis in PySpecTools is preferably done in units of signal-to-noise ratio (SNR), which is established by fitting a baseline (a vector, not a scalar) and dividing the entire spectrum by it element-wise. SNR is generally more meaningful than the raw voltage scale that is typically reported.

By default, peak finding uses the asymmetric least-squares (ALS) method to fit a baseline (als=True). Essentially this can be thought of as a penalized least-squares method, with additional parameters that define how quickly the baseline can respond (you don’t want to over-subtract signal). These parameters can be set by providing find_peaks with keyword arguments (see documentation). The sigma keyword then specifies the minimum SNR value to use for peak finding; note that if als=False, threshold and sigma are equivalent, with threshold specifying the absolute intensity to use for peak finding.
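To make the SNR conversion and the sigma cut concrete, here is a minimal sketch in plain Python. It assumes a baseline has already been obtained (the ALS fit itself is not shown), uses synthetic numbers, and the helper names to_snr and find_peaks_above are hypothetical rather than part of PySpecTools.

```python
def to_snr(intensity, baseline):
    """Divide the spectrum by the baseline, element by element."""
    return [y / b for y, b in zip(intensity, baseline)]


def find_peaks_above(frequency, snr, sigma):
    """Return (frequency, SNR) for local maxima at or above the sigma cut."""
    peaks = []
    for i in range(1, len(snr) - 1):
        is_local_max = snr[i] > snr[i - 1] and snr[i] >= snr[i + 1]
        if is_local_max and snr[i] >= sigma:
            peaks.append((frequency[i], snr[i]))
    return peaks


# Synthetic spectrum: one strong feature and one weak feature on a flat baseline
frequency = [9000.0, 9000.1, 9000.2, 9000.3, 9000.4, 9000.5, 9000.6]
intensity = [0.010, 0.012, 0.090, 0.011, 0.009, 0.030, 0.010]
baseline = [0.010] * len(intensity)

snr = to_snr(intensity, baseline)
peaks = find_peaks_above(frequency, snr, sigma=6)
print(peaks)
```

With sigma=6 only the strong feature survives; the weaker one is a local maximum but sits at an SNR of about 3.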

[3]:

# Returns a pandas DataFrame containing frequency/intensity of
# every peak detected. This is also stored as an attribute;
# `AssignmentSession.peaks`
peaks = session.find_peaks(sigma=6, als=True)

[4]:

# Use the `describe` method of a `DataFrame` to summarize the
# peaks information
peaks.describe()

[4]:

	Frequency	Peak Frequencies	Intensity
count	447.000000	447.000000	447.000000
mean	12300.347573	12300.342431	28.373735
std	3365.905814	3365.901388	41.156217
min	6385.075006	6385.066667	6.002450
25%	9542.620545	9542.666667	8.248862
50%	12215.902539	12215.911111	14.821400
75%	14753.087387	14753.155555	32.619844
max	19845.175891	19845.155556	499.347482

In the cell below, we manually add some lines. Automated peak detection is never perfect, especially with blended features. You can add frequency/intensity information by providing a list of 2-tuples as an argument to the add_ulines method:

[6]:

session.add_ulines(
    [
        (7483.911, 9.390),
        (8773.866, 12.523),
        (9200.000, 9.116),
        (9200.888, 9.442),
        (10258.311, 6.850),
        (10259.111, 6.948),
        (10262.044, 15.061),
        (10843.111, 9.215),
        (10928.266, 12.748),
        (10959.38, 14.302),
        (10978.93, 8.527),
        (10979.73, 7.273),
        (11454.844, 7.216),
        (11547.555, 7.485),
        (11548.000, 8.370),
        (11550.49, 7.134),
        (11561.51, 7.720),
        (11940.00, 6.039),
        (12476.444, 14.628),
        (12475.911, 13.628),
        (13558.40, 7.472),
        (13609.07, 6.087),
        (13751.378, 6.745),
        (13792.80, 9.937),
        (14839.64, 6.485),
        (14919.555, 17.971),
        (15248.177, 13.216),
        (15249.067, 15.414),
        (15557.60, 6.572),
        (16581.07, 7.550),
        (16706.76, 70.758),
        (16707.47, 49.851),
        (16710.67, 70.43661),
        (16711.47, 48.40109),
        (17115.02, 9.315)
    ]
)

Running assignments

With all the peaks found, we can start doing some assignments of the features! The main way this is done is by creating LineList objects, which are then fed to the session.process_linelist method as we shall see later.

There are different types of LineList objects, depending on the source of data:

from_artifacts
from_clock
from_catalog
from_pgopher
from_dataframe
from_lin
from_splatalogue_query
from_list

from_artifacts creates a specialized LineList that flags Transitions as non-molecular for book-keeping. from_clock is a special variant of this: we have found that radio interference from arbitrary waveform generators often bleeds into the resulting chirped-pulse spectrum, so from_clock exhaustively generates combinations/harmonics of the clock frequency as artifacts.

[7]:

artifacts = LineList.from_artifacts(
    [8000., 16000., 8125.,16250., 7065.7778, 7147.3778, 8574.9022]
)
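The idea behind enumerating clock-related artifacts can be sketched as follows. This is only an illustration of generating rational multiples of a clock frequency inside a band; it is not how from_clock is implemented, and the function name, the 65 GHz clock value, the band limits, and max_order are all assumptions chosen for the example.

```python
def clock_artifacts(clock_freq, min_freq, max_freq, max_order=8):
    """Enumerate n/m multiples of a clock frequency that fall inside a band (MHz)."""
    found = set()
    for n in range(1, max_order + 1):
        for m in range(1, max_order + 1):
            freq = clock_freq * n / m
            if min_freq <= freq <= max_freq:
                # Round so that equivalent ratios (e.g. 1/4 and 2/8) collapse
                found.add(round(freq, 4))
    return sorted(found)


# Hypothetical 65 GHz clock and a 6-20 GHz band, both in MHz
artifacts_mhz = clock_artifacts(65000.0, 6000.0, 20000.0)
print(artifacts_mhz)
```

A list produced this way could then be handed to LineList.from_artifacts in the same way as the manually curated frequencies above.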

You can then pass the artifacts object to the process_linelist method of our AssignmentSession, which will automatically cross-correlate every unassigned line (U-line) with the entries contained in your LineList:

[8]:

session.process_linelist(linelist=artifacts)
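Conceptually, the cross-correlation amounts to pairing each U-line with a nearby catalog frequency. The sketch below shows one simple way to do that with a nearest-neighbor search and a frequency tolerance. It is not the PySpecTools algorithm, which also weighs information such as intensity and uncertainty; match_ulines and the 0.25 MHz tolerance are assumptions. The first two frequency pairs are taken from the assignment table shown later in this notebook, and the third U-line is made up.

```python
def match_ulines(ulines, catalog, tolerance):
    """Pair each unassigned frequency with the nearest catalog frequency.

    Returns {observed: (catalog_frequency, deviation)} for matches whose
    absolute deviation (catalog minus observed, in MHz) is within tolerance.
    """
    assignments = {}
    for observed in ulines:
        nearest = min(catalog, key=lambda predicted: abs(predicted - observed))
        deviation = nearest - observed
        if abs(deviation) <= tolerance:
            assignments[observed] = (nearest, deviation)
    return assignments


ulines = [7946.962728, 10379.895776, 12000.0]  # last entry is made up
catalog = [7947.1695, 10379.8652]

matches = match_ulines(ulines, catalog, tolerance=0.25)
for observed, (predicted, deviation) in matches.items():
    print(f"{observed:.4f} -> {predicted:.4f} (deviation {deviation:+.4f} MHz)")
```

The made-up line at 12000 MHz has no catalog entry within tolerance, so it would remain a U-line.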

For molecular assignments, you could of course repeat this process and manually create individual LineLists; in this example, we’ll take an SPCAT catalog and generate the LineList:

# Keyword arguments must not be followed by a positional argument
formaldehyde = LineList.from_catalog(name="formaldehyde", formula="H2CO", filepath="catalogs/h2co.cat")

However, this is incredibly time consuming, and not pretty to look at (not to mention a nightmare to update). Instead, we recommend you set up a directory containing all of your catalogs, create an input file that stores all of their metadata, and “batch” process them in one go. In the cell below, we automate the analysis of hydrocarbon molecules (oxygen- and nitrogen-bearing species are handled separately) with a YAML file called hydrocarbons_cat.yml. YAML is a simple markup syntax that is readable and writable by both machines and humans. Below is a small excerpt of our file:

ethynylbenzene,v23:
  formula: c8h6
  filepath: h_catalogs/phenylacetylene_v23.cat

ethynylbenzene,2v23:
  formula: c8h6
  filepath: h_catalogs/phenylacetylene_2v23.cat

ethynylbenzene,v16:
  formula: c8h6
  filepath: h_catalogs/phenylacetylene_v16.cat

buta-1,3-diynylbenzene:
  formula: c10h6
  filepath: h_catalogs/phenyldiacetylene.cat

hexa-1,3,5-triynylbenzene:
  formula: c12h6
  filepath: h_catalogs/phenyltriacetylene.cat

You can actually provide the source keyword as well, and include a BibTeX citekey. When it comes to automatic report generation, the citation will be automatically used to streamline LaTeX table generation.

molecule_name:
  formula: C12H6                    # formula
  source: mccarthy_benzene_2020     # citekey
  filepath: catalog/molecule.cat    # filepath to the SPCAT catalog
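If you have many catalogs, the entries can be generated rather than typed by hand. The sketch below builds text in the layout shown above using only string formatting; catalog_entry is a hypothetical helper, not part of PySpecTools, and it does no YAML quoting, so names containing characters such as ": " would need extra care.

```python
def catalog_entry(name, formula, filepath, source=None):
    """Format one catalog entry in the YAML layout used by the batch file."""
    lines = [f"{name}:", f"  formula: {formula}"]
    if source is not None:
        # Optional BibTeX citekey
        lines.append(f"  source: {source}")
    lines.append(f"  filepath: {filepath}")
    return "\n".join(lines)


# Names, formulae and paths taken from the excerpt above
catalogs = [
    ("ethynylbenzene,v23", "c8h6", "h_catalogs/phenylacetylene_v23.cat"),
    ("buta-1,3-diynylbenzene", "c10h6", "h_catalogs/phenyldiacetylene.cat"),
]

yaml_text = "\n\n".join(catalog_entry(*entry) for entry in catalogs) + "\n"
print(yaml_text)
```

The resulting string can be written to a .yml file and passed to the batch-processing method in the next cell.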

[9]:

session.process_linelist_batch(yml_path="hydrocarbons_cat.yml")

Line list for: cyclopropa-1,2-diene,gs Formula: c3h2, Number of entries: 80


Line list for: ethynylbenzene Formula: c8h6, Number of entries: 144


Line list for: umol-1850 Formula: cxhy, Number of entries: 150


Line list for: ethynylbenzene,v23 Formula: c8h6, Number of entries: 374


Line list for: ethynylbenzene,2v23 Formula: c8h6, Number of entries: 374


Line list for: ethynylbenzene,v16 Formula: c8h6, Number of entries: 374


Line list for: buta-1,3-diynylbenzene Formula: c10h6, Number of entries: 745


Line list for: hexa-1,3,5-triynylbenzene Formula: c12h6, Number of entries: 229


Line list for: 5-ethylenecyclopenta-1,3-diene Formula: c6h6, Number of entries: 82


Line list for: 1-ethynylcyclopenta-1,3-diene Formula: c7h6, Number of entries: 355


Line list for: 2-ethynylcyclopenta-1,3-diene Formula: c7h6, Number of entries: 462


Line list for: cyclohexa-1,3-dien-5-yne Formula: c6h4, Number of entries: 187


Line list for: cyclohexa-1,3-dien-5-yne,2v16 Formula: c6h4, Number of entries: 77


Line list for: cyclohexa-1,3-dien-5-yne,v16 Formula: c6h4, Number of entries: 77


Line list for: cyclohexa-1,3-dien-5-yne,v15 Formula: c6h4, Number of entries: 77


Line list for: prop-1-yne Formula: c3h4, Number of entries: 4


Line list for: prop-1-yne,1v9 Formula: c3h4, Number of entries: 4


Line list for: prop-1-yne,1v10 Formula: c3h4, Number of entries: 4


Line list for: prop-1-yne,2v10 Formula: c3h4, Number of entries: 4


Line list for: penta-1,3-diyne Formula: c5h4, Number of entries: 28


Line list for: penta-1,3-diyne,1v11 Formula: c5h4, Number of entries: 16


Line list for: penta-1,3-diyne,1v12 Formula: c5h4, Number of entries: 16


Line list for: penta-1,3-diyne,1v13 Formula: c5h4, Number of entries: 16


Line list for: penta-1,3-diyne,ve1 Formula: c5h4, Number of entries: 28


Line list for: hepta-1,3,5-triyne Formula: c7h4, Number of entries: 79


Line list for: (2Z)-hexa-1,3-dien-5-yne (anti) Formula: c6h6, Number of entries: 308


Line list for: (2E)-hexa-1,3-dien-5-yne (anti) Formula: c6h6, Number of entries: 155


Line list for: but-1-en-3-yne Formula: c4h4, Number of entries: 23


Line list for: hex-1-ene-3,5-diyne Formula: c6h4, Number of entries: 97


Line list for: vinyl_triacetylene Formula: c8h4, Number of entries: 93


Line list for: 5-ethenylidenecyclopenta-1,3-diene Formula: c7h6, Number of entries: 226


Line list for: 5-ethenylidenecyclopenta-1,3-diene,v22 Formula: c7h6, Number of entries: 202


Line list for: cyclopenta-1,3-diene Formula: c5h6, Number of entries: 88


Line list for: (Z)-3-penten-1-yne, A state Formula: c5h6, Number of entries: 172


Line list for: penta-1,2-dien-4-yne Formula: c5h4, Number of entries: 71


Line list for: hepta-1,2,3,4,5-pentaene-6-yne Formula: c7h4, Number of entries: 209


Line list for: cis-hex-ene-diyene Formula: c6h4, Number of entries: 273


Line list for: hexa-1,2,3-trien-5-yne Formula: c6h4, Number of entries: 232


Line list for: hepta-1,2-dien-4,6-diyne Formula: c7h4, Number of entries: 271


Line list for: cyclopropa_1_yne_3_yl_radical Formula: c3h, Number of entries: 1119


Line list for: cyclopropa-1-yne-3-yl_radical,ve1 Formula: c3h, Number of entries: 621


Line list for: cyclopropa-1-yne-3-yl_radical,ve2 Formula: c3h, Number of entries: 299


Line list for: cyclopropa-1-yne-3-yl_radical,ve3 Formula: c3h, Number of entries: 301


Line list for: buta-1,3-diynyl radical Formula: c4h, Number of entries: 102


Line list for: 1,2,3,4-pentatetraene-1,1,5-trienyl radical Formula: c5h, Number of entries: 93


Line list for: hexa-1,3,5-triynyl radical Formula: c6h, Number of entries: 176


Line list for: 1,2,3,4,5,6-heptahexaene-1,1,7-trienyl radical Formula: c7h, Number of entries: 154


Line list for: propadienylidene Formula: c3h2, Number of entries: 10


Line list for: butatrienylidene Formula: c4h2, Number of entries: 14


Line list for: pentatetraenylidene Formula: c5h2, Number of entries: 28


Line list for: 1-ethynyl-cycloprop-1-en-2-ylidene Formula: c5h2, Number of entries: 70


Line list for: penta-1,2-dien-4-yne-1-ylidene Formula: c5h2, Number of entries: 92


Line list for: cylcohexadiene Formula: c6h8, Number of entries: 69


Line list for: (4Z)-hepta-1,2,4-trien-6-yne (anti) Formula: c7h6, Number of entries: 492


Line list for: (E)-3-penten-1-yne, A state Formula: c5h6, Number of entries: 87

We repeat the same procedure for .lin files, which also follow SPFIT formatting. The from_XXX parser is chosen based on the extension of the referenced file.
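Choosing a parser from the file extension can be pictured as a simple lookup. The following is a sketch under stated assumptions, not the library’s internal code: the PARSERS mapping covers only the two extensions discussed here, and parser_for is a hypothetical helper that returns the name of the LineList constructor one would expect to be used.

```python
from pathlib import Path

# Assumed mapping from file extension to the LineList constructor name
PARSERS = {".cat": "from_catalog", ".lin": "from_lin"}


def parser_for(filepath):
    """Return the constructor name that matches the file extension."""
    suffix = Path(filepath).suffix.lower()
    try:
        return PARSERS[suffix]
    except KeyError:
        raise ValueError(f"No parser registered for '{suffix}' files") from None


print(parser_for("h_catalogs/phenylacetylene_v23.cat"))
```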

[10]:

session.process_linelist_batch(yml_path="hydrocarbons_lin.yml")

Line list for: cyclopropa-1,2-diene,gs Formula: c3h2, Number of entries: 406


Line list for: umol-1850 Formula: cxhy, Number of entries: 24


Line list for: cyclopropa-1,2-diene (HC13CCH) Formula: c3h2, Number of entries: 6


Line list for: cyclopropa-1,2-diene (H13CCCH) Formula: c3h2, Number of entries: 12


Line list for: cyclopropa-1,2-diene,1v2 Formula: c3h2, Number of entries: 37


Line list for: cyclopropa-1,2-diene,1v3 Formula: c3h2, Number of entries: 35


Line list for: cyclopropa-1,2-diene,1v5 Formula: c3h2, Number of entries: 28


Line list for: cyclopropa-1,2-diene,1v6 Formula: c3h2, Number of entries: 38


Line list for: cyclopropa-1,2-diene,2v6 Formula: c3h2, Number of entries: 17


Line list for: cyclopropa-1,2-diene,3v6 Formula: c3h2, Number of entries: 5


Line list for: cyclopropa-1,2-diene,4v6 Formula: c3h2, Number of entries: 2


Line list for: cyclopropa-1,2-diene,1v5+1v6 Formula: c3h2, Number of entries: 2


Line list for: cyclopropa-1-yne-3-yl_radical,ve1 Formula: c3h, Number of entries: 22


Line list for: cyclopropa-1-yne-3-yl_radical,ve2 Formula: c3h, Number of entries: 5


Line list for: cyclopropa-1-yne-3-yl_radical,ve3 Formula: c3h, Number of entries: 11


Line list for: penta-1,3-diyne,ve2 Formula: c5h4, Number of entries: 9


Line list for: penta-1,3-diyne,ve3 Formula: c5h4, Number of entries: 9


Line list for: ethynylbenzene Formula: c8h6, Number of entries: 58


Line list for: ethynylbenzene,v23 Formula: c8h6, Number of entries: 35


Line list for: ethynylbenzene,2v23 Formula: c8h6, Number of entries: 11


Line list for: ethynylbenzene,v16 Formula: c8h6, Number of entries: 16


Line list for: buta_1,3_diynylbenzene Formula: c10h6, Number of entries: 86


Line list for: hexa_1,3,5_triynylbenzene Formula: c12h6, Number of entries: 25


Line list for: 5-ethylenecyclopenta-1,3-diene Formula: c6h6, Number of entries: 28


Line list for: 1-ethynylcyclopenta-1,3-diene Formula: c7h6, Number of entries: 30


Line list for: 2-ethynylcyclopenta-1,3-diene Formula: c7h6, Number of entries: 39


Line list for: hepta-1,3,5-triyne Formula: c7h4, Number of entries: 16


Line list for: (2Z)-hexa-1,3-dien-5-yne (anti) Formula: c6h6, Number of entries: 29


Line list for: (2E)-hexa-1,3-dien-5-yne (anti) Formula: c6h6, Number of entries: 32


Line list for: hex-1-ene-3,5-diyne Formula: c6h4, Number of entries: 22


Line list for: 5-ethenylidenecyclopenta-1,3-diene Formula: c7h6, Number of entries: 26


Line list for: 5-ethenylidenecyclopenta-1,3-diene,v22 Formula: c7h6, Number of entries: 18


Line list for: cyclopenta-1,3-diene Formula: c5h6, Number of entries: 19


Line list for: penta-1,2-dien-4-yne Formula: c5h4, Number of entries: 14


Line list for: hepta-1,2,3,4,5-pentaene-6-yne Formula: c7h4, Number of entries: 16


Line list for: hexa-1,2,3-trien-5-yne Formula: c6h4, Number of entries: 23


Line list for: hepta-1,2-dien-4,6-diyne Formula: c7h4, Number of entries: 45


Line list for: 1-ethynyl-cycloprop-1-en-2-ylidene Formula: c5h2, Number of entries: 13


Line list for: penta-1,2-dien-4-yne-1-ylidene Formula: c5h2, Number of entries: 13


Line list for: (4Z)-hepta-1,2,4-trien-6-yne (anti) Formula: c7h6, Number of entries: 34


Line list for: l_ccch,ve Formula: c3h, Number of entries: 32


Line list for: prop-1-yne,ve1 Formula: c3h4, Number of entries: 9


Line list for: prop-1-yne,ve2 Formula: c3h4, Number of entries: 9


Line list for: (E)-3-penten-1-yne, A state Formula: c5h6, Number of entries: 20


Line list for: (E)-3-penten-1-yne, E state Formula: c5h6, Number of entries: 18


Line list for: (Z)-3-penten-1-yne, A state Formula: c5h6, Number of entries: 13


Line list for: (Z)-3-penten-1-yne, E state Formula: c5h6, Number of entries: 9

Finishing the analysis

This basically completes the assignment process! We have just a few more steps to save the analysis: a pickle file is saved to disk, which is then used for all subsequent analysis (e.g. line profiles, statistics). session.finalize_assignments() is currently not as final as it sounds: it just triggers all of the report and table generation, and exports the identified and unidentified data into their respective folders.

[15]:

session.finalize_assignments()

The save_session method below then dumps the entire analysis to the file sessions/{experiment_ID}.pkl, where {experiment_ID} is the number assigned to the experiment all the way at the beginning (experiment=4000).

[25]:

session.save_session()
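For readers unfamiliar with pickle, the round trip looks roughly like the sketch below, which uses only the standard library and a plain dictionary as a stand-in for the session. The helpers save_object and load_object are hypothetical and are not the PySpecTools methods; a temporary directory is used so nothing is left on disk. As with any pickle file, only load files that you trust.

```python
import pickle
import tempfile
from pathlib import Path


def save_object(obj, experiment_id, root):
    """Pickle an object to <root>/sessions/<experiment_id>.pkl."""
    folder = Path(root) / "sessions"
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{experiment_id}.pkl"
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)
    return path


def load_object(path):
    """Read a pickled object back from disk."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for the session state
state = {"experiment": 4000, "composition": ["C", "H", "N", "O"]}

with tempfile.TemporaryDirectory() as root:
    saved_path = save_object(state, 4000, root)
    restored = load_object(saved_path)

print(saved_path.name, restored == state)
```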

You can then load this session back in a separate notebook with AssignmentSession.load_session("sessions/{experiment_ID}.pkl"):

[26]:

session = AssignmentSession.load_session("sessions/4000.pkl")

This loads in all of the information from before, including the results generated with finalize_assignments(). For example, the identifications attribute stores a dict which tracks each distinct species as keys, with the number of assigned lines as values:

[27]:

session.identifications

[27]:

{'buta-1,3-diynylbenzene': 51,
 'umol-1999': 1,
 '2-ethynylcyclopenta-1,3-diene': 17,
 '5-ethenylidenecyclopenta-1,3-diene': 16,
 '1-ethynyl-cycloprop-1-en-2-ylidene': 6,
 '1-ethynylcyclopenta-1,3-diene': 18,
 '5-ethenylidenecyclopenta-1,3-diene,v22': 8,
 '(2Z)-hexa-1,3-dien-5-yne (anti)': 10,
 'umol-1850': 5,
 '1,2,3,4-pentatetraene-1,1,5-trienyl radical': 9,
 'hepta-1,2,3,4,5-pentaene-6-yne': 10,
 'hepta-1,2-dien-4,6-diyne': 13,
 'hexa-1,3,5-triynylbenzene': 3,
 'ethynylbenzene': 19,
 'ethynylbenzene,v23': 18,
 'benzonitrile, v21': 7,
 'hepta-1,3,5-triyne': 5,
 '1,2,3,4,5,6-heptahexaene-1,1,7-trienyl radical': 5,
 '2-phenylacetonitrile': 2,
 'vinyl_triacetylene': 1,
 'hex-1-ene-3,5-diyne': 10,
 'cyclohexa-1,3-dien-5-yne': 4,
 'cyclohexa-2,4-dien-1-one': 9,
 'Artifact': 2,
 'ethynylbenzene,v16': 6,
 'ethynylbenzene,2v23': 5,
 'penta-1,3-diyne': 3,
 'penta-1,3-diyne,1v12': 6,
 'penta-1,3-diyne,ve2': 3,
 '(2E)-hexa-1,3-dien-5-yne (anti)': 13,
 '(4Z)-hepta-1,2,4-trien-6-yne (anti)': 9,
 '(E)-pent-2-en-4-ynenitrile': 1,
 'benzonitrile': 7,
 'hexa-1,2,3-trien-5-yne': 3,
 'butatrienylidene': 4,
 'but-1-en-3-yne': 2,
 'prop-2-ynenitrile': 2,
 'hexa-4,5-dien-2-ynenitrile': 2,
 'pentatetraenylidene': 6,
 'prop-2-ynal': 2,
 'prop-2-enenitrile': 2,
 'buta-1,3-diynyl radical': 3,
 '3-phenylprop-2-ynenitrile': 1,
 '3-oxo-1,2-propadienylidene': 2,
 'cyclohexa-2,5-dien-1-one': 11,
 'penta-1,2-dien-4-yne': 6,
 'cyclohexa-1,3-dien-5-yne,2v16': 2,
 'cyanoprop-1,2-dien-1,3-diyl': 1,
 'cyanoacetyl-cycloprop-1-ene-2,2-diyl': 1,
 'penta-2,4-diynal': 3,
 'penta-2,4-diynenitrile': 3,
 '5-ethylenecyclopenta-1,3-diene': 3,
 'cyclopenta-2,4-dien-1-one': 2,
 'penta-1,3-diyne,1v11': 2,
 'penta-1,3-diyne,ve1': 1,
 'penta-1,3-diyne,ve3': 1,
 'penta-1,3-diyne,1v13': 1,
 '(Z)-3-penten-1-yne, A state': 1,
 '(Z)-3-penten-1-yne, E state': 1,
 'cyclopenta-2,4-dien-1-one, ve1': 1,
 'cis-hex-ene-diyene': 1,
 'cylcohexadiene': 1,
 '(E)-3-penten-1-yne, A state': 1,
 'buta-2,3-dien-1-imine (syn)': 1,
 'cyclopenta-1,3-diene-1-carbonitrile': 1,
 'cyclopenta-2,4-diene-1-carbonitrile': 1,
 'benzonitrile, v12': 1,
 'cyclopropa-1-yne-3-yl_radical,ve2': 2,
 'cyclopropa_1_yne_3_yl_radical': 12,
 'cyclopropa-1-yne-3-yl_radical,ve1': 4,
 'cyclohexa-1,3-dien-5-yne,v15': 1,
 'cyclohexa-1,3-dien-5-yne,v16': 1,
 'cyclopropa-1-yne-3-yl_radical,ve3': 2,
 'buta-2,3-dienenitrile': 1,
 'cyclopropa-1,2-diene,4v6': 1,
 'cyclopropa-1,2-diene,3v6': 1,
 'but-3-enenitrile (cis)': 1,
 'penta-1,2-dien-4-yne-1-ylidene': 2,
 'cyclopropa-1,2-diene,2v6': 1,
 'cyclopropa-1,2-diene (HC13CCH)': 1,
 'prop-1-yne,ve1': 1,
 '(E)-but-2-enal (anti)': 1,
 'prop-1-yne': 1,
 'prop-1-yne,1v9': 1,
 'cyclopropa-1,2-diene,1v5+1v6': 1,
 'cyclopropa-1,2-diene,1v6': 1,
 'penta-1,2-dien-1-one-3-yl radical': 1,
 'cyclopropa-1,2-diene,1v3': 1,
 'cyclopropa-1,2-diene,1v2': 1,
 'cyclopropa-1,2-diene,gs': 1,
 'cyclopropa-1,2-diene (H13CCCH)': 1,
 'benzonitrile, v15': 1,
 '(2E)-2,4-pentadienal (syn)': 1,
 'cyanopenta-2,4-diyne-2,2-diyl': 1,
 'c3s': 1,
 'hexa-1,3,5-triynyl radical': 4,
 'prop-1-yne,ve2': 1}
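Because identifications is an ordinary dict, standard Python is enough to summarize it. The sketch below uses a handful of entries copied from the output above (not the full dictionary), drops the artifact bookkeeping entry, and ranks the remaining species by number of assigned lines; the variable names are arbitrary.

```python
# A subset of the `identifications` output shown above
identifications = {
    "buta-1,3-diynylbenzene": 51,
    "ethynylbenzene": 19,
    "ethynylbenzene,v23": 18,
    "1-ethynylcyclopenta-1,3-diene": 18,
    "Artifact": 2,
    "c3s": 1,
}

# Keep only molecular carriers
molecular = {
    name: count for name, count in identifications.items() if name != "Artifact"
}

# Rank species by the number of assigned lines, largest first
ranked = sorted(molecular.items(), key=lambda item: item[1], reverse=True)
total_lines = sum(molecular.values())

for name, count in ranked[:3]:
    print(f"{name}: {count} lines ({100 * count / total_lines:.1f}% of this subset)")
```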

You can also view all of the assignment information by accessing the DataFrame stored as the table attribute. Below, we also demonstrate how we can sort columns based on their values, for example looking at the transitions with the highest catalog uncertainty first.

[28]:

session.table.sort_values(["uncertainty"], ascending=False)

[28]:

	name	smiles	formula	frequency	catalog_frequency	catalog_intensity	deviation	intensity	uncertainty	S	...	lstate_energy	interference	weighting	source	public	velocity	discharge	magnet	multiple	final
40	2-phenylacetonitrile		c8h7n	7946.962728	7947.1695	-5.3480	0.206772	7.765701	0.1409	0.0	...	6.2021	False	0.0	Catalog	True	0.0	False	False	[penta-2,4-diynal, 7,947.0717]	False
385	benzonitrile, v15		c7h5n	18425.935412	18426.1108	-5.6658	0.175388	6.628221	0.1065	0.0	...	3.9995	False	0.0	Catalog	True	0.0	False	False	[cyanoprop-1,2-dien-1,3-diyl, 18,426.1726, 3-p...	False
353	but-3-enenitrile (cis)		c4h5n	16331.429075	16331.4637	-5.3417	0.034625	38.524312	0.0772	0.0	...	10.8664	False	0.0	Catalog	True	0.0	False	False	[cyclohexa-2,4-dien-1-one, 16,331.4332, ethyny...	False
126	cyanoacetyl-cycloprop-1-ene-2,2-diyl		c6hn	10379.895776	10379.8652	-3.6143	-0.030576	8.086545	0.0344	0.0	...	1.7845	False	0.0	Catalog	True	0.0	False	False	[]	True
179	hexa-1,3,5-triynylbenzene		c12h6	11803.740216	11803.8139	-3.8298	0.073684	18.036535	0.0279	0.0	...	5.7546	False	0.0	Catalog	True	0.0	False	False	[hexa-1,2,3-trien-5-yne, 11,803.7367, (Z)-but-...	False
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
43	cyclohexa-1,3-dien-5-yne		c6h4	8073.195789	8073.1768	-2.7170	-0.018989	14.487425	0.0000	0.0	...	0.8300	False	0.0	Catalog	True	0.0	False	False	[]	True
39	vinyl_triacetylene		c8h4	7924.848564	7924.8884	-2.7370	0.039836	7.026208	0.0000	0.0	...	0.7930	False	0.0	Catalog	True	0.0	False	False	[]	True
367	prop-1-yne		c3h4	17091.744758	17091.7420	-1.5739	-0.002758	140.717850	0.0000	0.0	...	0.0000	False	0.0	Catalog	True	0.0	False	False	[]	True
368	prop-1-yne,1v9		c3h4	17102.103117	17102.0765	-1.5722	-0.026617	31.469987	0.0000	0.0	...	0.0000	False	0.0	Catalog	True	0.0	False	False	[benzonitrile, v15, 17,101.9416]	True
314	cyclopropa_1_yne_3_yl_radical		c3h	14893.034757	14893.0554	-1.6790	0.020643	123.717808	0.0000	0.0	...	0.6391	False	0.0	Catalog	True	0.0	False	False	[]	True

428 rows × 28 columns

When it comes to making plots, we might also be interested in removing the features that have already been assigned from X/Y; the clean_spectral_assignments() function replaces regions of the spectrum that have been assigned with white noise, to make it look natural.

[32]:

session.clean_spectral_assignments()
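The idea of replacing an assigned feature with noise can be illustrated with a few lines of standard-library Python. This is a sketch, not the method’s actual implementation: blank_region is a hypothetical helper, and the zero-mean Gaussian noise, the window half-width, and the synthetic spectrum are all assumptions.

```python
import random


def blank_region(frequency, intensity, center, half_width, noise_sigma, rng):
    """Return a copy of the spectrum with a window around `center` replaced by noise."""
    cleaned = list(intensity)
    for i, freq in enumerate(frequency):
        if abs(freq - center) <= half_width:
            # Draw zero-mean Gaussian noise in place of the assigned feature
            cleaned[i] = rng.gauss(0.0, noise_sigma)
    return cleaned


rng = random.Random(0)  # seeded so the example is repeatable

# Synthetic spectrum: flat background with one strong feature in the middle
frequency = [14893.0 + 0.01 * i for i in range(11)]
intensity = [1.0] * 5 + [120.0] + [1.0] * 5

cleaned = blank_region(
    frequency, intensity, center=14893.05, half_width=0.015, noise_sigma=1.0, rng=rng
)
print(max(intensity), max(cleaned))
```

Only the three points around the feature are overwritten; the rest of the spectrum is left untouched.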

You can then plot the cleaned spectrum, from which all of the assigned features have been removed, with plot_assigned(). This creates an interactive plotly figure!

Note that plot_assigned() can be used at any point in the notebook too; the latest spectrum with assignments overlaid will be shown.

[33]:

session.plot_assigned()

Conclusions

This notebook completes the first analysis step, which is often the most tedious: assigning and keeping track of every spectral feature, and translating that into something that is publishable. We went through how a spectrum can be loaded and interfaced with the AssignmentSession class in PySpecTools, followed by peak finding. We then created LineList objects based on SPCAT catalogs, fed them to the AssignmentSession to process, and showed that you can do this en masse. Finally, we saved the results of the analysis to disk and generated an interactive report.

In a future notebook, we’ll take a look at what kind of things we can do with the saved AssignmentSession, for example chemical composition analysis, and making plots of the data for publication.
