Analyzing Broadband Spectra with the assignment Module

Introduction

In this notebook, we work through how the core functionality of PySpecTools can be used to streamline and automate your spectral analysis. PySpecTools and Python provide enough flexibility for you to adapt the workflow to your needs: whatever can’t be done with PySpecTools natively can usually be automated with plain Python (e.g. for loops) and, to a large extent, with pandas. The latter is particularly useful when you are analyzing the assignments and want to, for example, filter out certain molecules. That is left for a subsequent notebook, as the focus here is to demonstrate how automated assignment is performed.

The core functionality for assigning spectra lives in the pyspectools.spectra.assignment module, which is built around three main abstractions:

  1. AssignmentSession

    • This is your main interface: it holds the spectral data and lets you interact with it (plot, assign, etc.).

  2. Transition

    • Represents every type of spectral feature: every peak in an experiment, and every catalog entry.

  3. LineList

    • A collection of spectral features: the peaks in an experiment (which are themselves Transition objects), and catalogs.
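
To make the relationship between these abstractions concrete, here is a minimal sketch using plain dataclasses. The names ToyTransition and ToyLineList and their fields are hypothetical stand-ins chosen for illustration; they are not the PySpecTools classes, which carry far more metadata.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyTransition:
    """Hypothetical stand-in for a single spectral feature."""
    frequency: float          # MHz
    intensity: float = 0.0    # SNR for peaks, catalog intensity otherwise
    name: str = "U-line"      # stays unassigned until matched


@dataclass
class ToyLineList:
    """Hypothetical stand-in for a named collection of transitions."""
    name: str
    transitions: List[ToyTransition] = field(default_factory=list)

    def frequencies(self) -> List[float]:
        return [t.frequency for t in self.transitions]


# Experimental peaks and catalog entries share the same representation
peaks = ToyLineList(
    "peaks",
    [ToyTransition(7483.911, 9.390), ToyTransition(8773.866, 12.523)],
)
catalog = ToyLineList(
    "example catalog",
    [ToyTransition(8773.870, name="example species")],
)
```

The point of the design is that a peak and a catalog line are the same kind of object, so comparing them reduces to comparing attributes such as frequency.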

We will demonstrate how these pieces come together by looking at some of our published data: this notebook was used to analyze the benzene discharge experiments reported in these two papers:

McCarthy, M. C.; Lee, K. L. K.; Carroll, P. B.; Porterfield, J. P.; Changala, P. B.; Thorpe, J. H.; Stanton, J. F. Exhaustive Product Analysis of Three Benzene Discharges by Microwave Spectroscopy. J. Phys. Chem. A 2020, 124 (25), 5170–5181. https://doi.org/10.1021/acs.jpca.0c02919.

Lee, K. L. K.; McCarthy, M. Study of Benzene Fragmentation, Isomerization, and Growth Using Microwave Spectroscopy. J. Phys. Chem. Lett. 2019, 10 (10), 2408–2413. https://doi.org/10.1021/acs.jpclett.9b00586.

The full dataset can also be found on our Zenodo repository; notebook “4000” most closely resembles this (this is a much more heavily marked up version).

We should stress that, while this is mostly automated, it does not change the fact that spectral analysis is very much an iterative process. You will make modifications to the way you do your analysis, and many things you won’t know until you’ve run it at least once. The point of having this notebook is so that it is reproducible and transparent: you can always modify the code and re-run the whole notebook with the latest analysis.

To begin the analysis, we construct an AssignmentSession object using the class method AssignmentSession.from_ascii(...). This method takes your ASCII spectrum containing frequency and intensity information, parses it using pandas, and stores it as a DataFrame. As with any Python routine in a Jupyter/IPython session, you can append a question mark to the function/method name to pull up its documentation:

[2]:
from pyspectools.spectra.assignment import AssignmentSession, LineList

In this case, we’re setting up the session based on the benzene data, which is a tab-delimited text file with a header. We skip the header with the skiprows argument, and provide our own column names with the col_names argument. Additionally, we specify the composition we expect for the experiment with the composition kwarg: ideally we would only include ["C", "H"]; however, we know there are atmospheric impurities like nitrogen and oxygen that get incorporated into the discharge products. This keyword affects Splatalogue assignments, and excludes catalogs that contain irrelevant compositions, such as metal-bearing molecules.

[2]:
session = AssignmentSession.from_ascii(
    "chirp_data/ft2632_hanning_620.txt",
    experiment=4000,
    col_names=["Frequency", "Intensity"],
    skiprows=1,
    composition=["C", "H", "N", "O"],
    verbose=False
)
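
As a rough picture of what the skiprows and col_names arguments do, the sketch below parses a tab-delimited spectrum using only the standard library. PySpecTools itself relies on pandas for this step; the function read_spectrum and the synthetic data are illustrative assumptions, not part of the package.

```python
import csv
import io

# Synthetic stand-in for a tab-delimited spectrum file with one header row
raw = "freq\tint\n6385.0\t0.012\n6385.1\t0.015\n6385.2\t0.011\n"


def read_spectrum(handle, skiprows=1, col_names=("Frequency", "Intensity")):
    """Parse tab-delimited rows into a dict of columns, skipping header rows."""
    reader = csv.reader(handle, delimiter="\t")
    for _ in range(skiprows):
        next(reader)  # discard the header line(s)
    columns = {name: [] for name in col_names}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        for name, value in zip(col_names, row):
            columns[name].append(float(value))
    return columns


spectrum = read_spectrum(io.StringIO(raw))
```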

You can also adjust many of these settings after the fact; they are stored as attributes of the Session object within an AssignmentSession. For example, the temperature attribute sets an upper limit on the lower state energies of assignable transitions: catalog entries with lower state energies above twice the specified value are ignored. The factor of two is there because the attribute nominally corresponds to your experimental temperature and, depending on how prominent a molecule is, you may see transitions from higher-energy states. Another useful setting is the maximum tolerated uncertainty in catalog entries: we would like to reject assignments based on poorly predicted lines, which is controlled by the max_uncertainty attribute.

[ ]:
# temperature in K
session.session.temperature = 10.

# uncertainty in MHz
session.session.max_uncertainty = 0.2

Note that frequency units are in MHz, and temperature in kelvin.
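
The effect of these two settings can be sketched as a simple filter. The function below is illustrative only: the name is_assignable, the assumption that lower state energies are given in cm^-1 (as in SPCAT catalogs) and converted to kelvin with hc/k_B, and the exact form of the comparison are ours, not necessarily how PySpecTools implements the cut.

```python
# hc/k_B in kelvin per cm^-1, used to express an energy as a temperature
CM1_TO_KELVIN = 1.438777


def is_assignable(lstate_energy_cm1, uncertainty_mhz,
                  temperature=10.0, max_uncertainty=0.2):
    """Return True if a catalog entry passes both cuts (assumed logic)."""
    energy_kelvin = lstate_energy_cm1 * CM1_TO_KELVIN
    cold_enough = energy_kelvin <= 2.0 * temperature
    well_predicted = uncertainty_mhz <= max_uncertainty
    return cold_enough and well_predicted


# Illustrative entries: (lower state energy in cm^-1, uncertainty in MHz)
low_energy_line = is_assignable(6.2021, 0.1409)
hot_line = is_assignable(20.0, 0.05)
poorly_predicted_line = is_assignable(1.0, 0.5)
```

With the values set above, the first entry passes, the second is rejected for its energy, and the third for its uncertainty.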

The next step is to pre-process the spectrum. Our chirped-pulse data are collected using Kyle Crabtree’s blackchirp program, and often we apply a window function to the data. If you are looking at raw FFT data, PySpecTools provides access to window functions defined in scipy.signal, which you can access in a syntax like this:

session.apply_filter("hanning")

The full list of filters can be found in the SciPy documentation.

After pre-processing, we perform peak detection and baseline correction. This is done using the session.find_peaks method, which automates several steps based on its keyword arguments. Analysis in PySpecTools is preferably done in units of signal-to-noise ratio (SNR), which is established by fitting a baseline (a vector, not a scalar) and dividing the entire spectrum by it element-wise. SNR is more meaningful than the raw voltage scale that is typically reported.

In the default way of peak finding, we use the asymmetric least-squares (ALS) method to fit a baseline (als=True). Essentially this can be thought of as a penalized least-squares method, with additional parameters that define how quickly the baseline can respond (you don’t want to over-subtract signal). These parameters can be set by providing find_peaks with keyword arguments (see documentation). The sigma keyword then specifies the minimum SNR value to use for peak finding. The threshold keyword instead specifies the absolute intensity scale to use for peak finding; note that if als=False, threshold and sigma are equivalent.

[3]:
# Returns a pandas DataFrame containing frequency/intensity of
# every peak detected. This is also stored as an attribute;
# `AssignmentSession.peaks`
peaks = session.find_peaks(sigma=6, als=True)
[4]:
# Use the `describe` method of a `DataFrame` to summarize the
# peaks information
peaks.describe()
[4]:
          Frequency  Peak Frequencies   Intensity
count    447.000000        447.000000  447.000000
mean   12300.347573      12300.342431   28.373735
std     3365.905814       3365.901388   41.156217
min     6385.075006       6385.066667    6.002450
25%     9542.620545       9542.666667    8.248862
50%    12215.902539      12215.911111   14.821400
75%    14753.087387      14753.155555   32.619844
max    19845.175891      19845.155556  499.347482
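
To illustrate the idea of working in SNR units, the sketch below divides a spectrum by a baseline and keeps local maxima above a sigma threshold. Note that it swaps the ALS fit for a much cruder running-median baseline so that it stays short and dependency-free; the function names and the synthetic data are assumptions made for this example only.

```python
import statistics


def running_median_baseline(intensity, half_window=2):
    """Crude baseline: median of the absolute signal in a sliding window."""
    n = len(intensity)
    baseline = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        baseline.append(statistics.median(abs(v) for v in intensity[lo:hi]))
    return baseline


def find_peaks_snr(frequency, intensity, sigma=6.0, half_window=2):
    """Return (frequency, SNR) for local maxima with SNR >= sigma."""
    baseline = running_median_baseline(intensity, half_window)
    # Element-wise division by the baseline vector gives the SNR spectrum
    snr = [y / max(b, 1e-12) for y, b in zip(intensity, baseline)]
    found = []
    for i in range(1, len(snr) - 1):
        if snr[i] >= sigma and snr[i] > snr[i - 1] and snr[i] >= snr[i + 1]:
            found.append((frequency[i], snr[i]))
    return found


# Synthetic spectrum: flat noise floor near 1 with a single strong line
frequency = [8000.0 + 0.1 * i for i in range(11)]
intensity = [1.0, 1.2, 0.9, 1.1, 1.0, 12.0, 1.0, 0.8, 1.1, 1.0, 0.9]
toy_peaks = find_peaks_snr(frequency, intensity, sigma=6.0)
```

A median is used here because it is barely affected by a narrow, strong line, which plays a similar role to the asymmetry in ALS: the baseline should follow the noise floor rather than the signal.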

In the cell below, we manually add some lines. Automated peak detection is never perfect, especially for blended features. You can add frequency/intensity information by providing a list of 2-tuples as an argument to the add_ulines method:

[6]:
session.add_ulines(
    [
        (7483.911, 9.390),
        (8773.866, 12.523),
        (9200.000, 9.116),
        (9200.888, 9.442),
        (10258.311, 6.850),
        (10259.111, 6.948),
        (10262.044, 15.061),
        (10843.111, 9.215),
        (10928.266, 12.748),
        (10959.38, 14.302),
        (10978.93, 8.527),
        (10979.73, 7.273),
        (11454.844, 7.216),
        (11547.555, 7.485),
        (11548.000, 8.370),
        (11550.49, 7.134),
        (11561.51, 7.720),
        (11940.00, 6.039),
        (12476.444, 14.628),
        (12475.911, 13.628),
        (13558.40, 7.472),
        (13609.07, 6.087),
        (13751.378, 6.745),
        (13792.80, 9.937),
        (14839.64, 6.485),
        (14919.555, 17.971),
        (15248.177, 13.216),
        (15249.067, 15.414),
        (15557.60, 6.572),
        (16581.07, 7.550),
        (16706.76, 70.758),
        (16707.47, 49.851),
        (16710.67, 70.43661),
        (16711.47, 48.40109),
        (17115.02, 9.315)
    ]
)

Running assignments

With all the peaks found, we can start doing some assignments of the features! The main way this is done is by creating LineList objects, which are then fed to the session.process_linelist method as we shall see later.

There are different ways to create LineList objects, depending on the source of data, each with its own class method:

  1. from_artifacts

  2. from_clock

  3. from_catalog

  4. from_pgopher

  5. from_dataframe

  6. from_lin

  7. from_splatalogue_query

  8. from_list

from_artifacts creates a specialized LineList that flags Transitions as non-molecular for book-keeping. from_clock is a special variant of this: we have found that radio interference arising from arbitrary waveform generators often bleeds into the resulting chirped-pulse spectrum, so this method exhaustively generates combinations/harmonics of the clock frequency as artifacts.

[7]:
artifacts = LineList.from_artifacts(
    [8000., 16000., 8125., 16250., 7065.7778, 7147.3778, 8574.9022]
)
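
For the clock-related artifacts mentioned above, one plausible reading of “combinations/harmonics” is every frequency of the form n × clock / m that lands inside the band. The sketch below enumerates these; the function name, the integer ranges, and the 10 GHz clock are hypothetical choices for illustration and do not describe what from_clock does internally.

```python
def clock_artifacts(clock_mhz, f_min, f_max, max_harmonic=2, max_divisor=3):
    """Return sorted frequencies n * clock / m that fall inside [f_min, f_max]."""
    found = set()
    for n in range(1, max_harmonic + 1):
        for m in range(1, max_divisor + 1):
            freq = n * clock_mhz / m
            if f_min <= freq <= f_max:
                found.add(round(freq, 4))  # a set removes duplicates such as 2/2 = 1/1
    return sorted(found)


# Hypothetical 10 GHz clock, searched over a 6-20 GHz band
candidates = clock_artifacts(10000.0, 6000.0, 20000.0)
```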

With the artifacts variable/object, you can then pass it to the process_linelist method of our AssignmentSession, and it will automatically cross-correlate every unassigned feature (U-line) with the entries contained in your LineList:

[8]:
session.process_linelist(linelist=artifacts)
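
Conceptually, this step compares each U-line frequency with the entries of the line list and accepts the closest one if it is close enough. The sketch below is a deliberately simplified nearest-match version; the function name, the tolerance value, and the example U-line frequencies are assumptions, and the real method takes more information into account (e.g. the uncertainty and energy settings described earlier).

```python
def match_ulines(ulines, linelist, tolerance=0.2):
    """Map each U-line frequency (MHz) to the name of the closest entry within tolerance."""
    assignments = {}
    for freq in ulines:
        # Each line list entry is a (name, frequency) pair
        name, nearest = min(linelist, key=lambda entry: abs(entry[1] - freq))
        if abs(nearest - freq) <= tolerance:
            assignments[freq] = name
    return assignments


# Illustrative U-lines compared against two of the artifact frequencies
ulines = [8000.010, 8125.020, 9200.000]
linelist = [("Artifact", 8000.0), ("Artifact", 8125.0)]
assigned = match_ulines(ulines, linelist)
```

Any frequency missing from the returned dictionary remains a U-line and is available for the next line list.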

For molecular assignments, you could of course repeat this process and manually create individual LineLists; in this example, we’ll take an SPCAT catalog and generate the LineList:

# Keyword arguments cannot be followed by a positional one, so the path is named too
formaldehyde = LineList.from_catalog(name="formaldehyde", formula="H2CO", filepath="catalogs/h2co.cat")

However, this quickly becomes time consuming, and is not pretty to look at (not to mention a nightmare to update). Instead, we recommend you set up a directory containing all of your catalogs, create an input file that stores the metadata for each catalog, and “batch” process them all at once. In the cell below, we automate the analysis of hydrocarbon molecules (kept separate from the oxygen- and nitrogen-bearing species) with a YAML file called hydrocarbons_cat.yml. YAML is a simple markup syntax that is easy for both machines and humans to read and write. Below is a small excerpt of our file:

ethynylbenzene,v23:
  formula: c8h6
  filepath: h_catalogs/phenylacetylene_v23.cat

ethynylbenzene,2v23:
  formula: c8h6
  filepath: h_catalogs/phenylacetylene_2v23.cat

ethynylbenzene,v16:
  formula: c8h6
  filepath: h_catalogs/phenylacetylene_v16.cat

buta-1,3-diynylbenzene:
  formula: c10h6
  filepath: h_catalogs/phenyldiacetylene.cat

hexa-1,3,5-triynylbenzene:
  formula: c12h6
  filepath: h_catalogs/phenyltriacetylene.cat

You can also provide the source key, containing a BibTeX citekey. During automatic report generation, the citation is then used to streamline LaTeX table generation.

molecule_name:
  formula: C12H6                    # formula
  source: mccarthy_benzene_2020     # citekey
  filepath: catalog/molecule.cat    # filepath to the SPCAT catalog
[9]:
session.process_linelist_batch(yml_path="hydrocarbons_cat.yml")
Line list for: cyclopropa-1,2-diene,gs Formula: c3h2, Number of entries: 80

Line list for: ethynylbenzene Formula: c8h6, Number of entries: 144

Line list for: umol-1850 Formula: cxhy, Number of entries: 150

Line list for: ethynylbenzene,v23 Formula: c8h6, Number of entries: 374

Line list for: ethynylbenzene,2v23 Formula: c8h6, Number of entries: 374

Line list for: ethynylbenzene,v16 Formula: c8h6, Number of entries: 374

Line list for: buta-1,3-diynylbenzene Formula: c10h6, Number of entries: 745

Line list for: hexa-1,3,5-triynylbenzene Formula: c12h6, Number of entries: 229

Line list for: 5-ethylenecyclopenta-1,3-diene Formula: c6h6, Number of entries: 82

Line list for: 1-ethynylcyclopenta-1,3-diene Formula: c7h6, Number of entries: 355

Line list for: 2-ethynylcyclopenta-1,3-diene Formula: c7h6, Number of entries: 462

Line list for: cyclohexa-1,3-dien-5-yne Formula: c6h4, Number of entries: 187

Line list for: cyclohexa-1,3-dien-5-yne,2v16 Formula: c6h4, Number of entries: 77

Line list for: cyclohexa-1,3-dien-5-yne,v16 Formula: c6h4, Number of entries: 77

Line list for: cyclohexa-1,3-dien-5-yne,v15 Formula: c6h4, Number of entries: 77

Line list for: prop-1-yne Formula: c3h4, Number of entries: 4

Line list for: prop-1-yne,1v9 Formula: c3h4, Number of entries: 4

Line list for: prop-1-yne,1v10 Formula: c3h4, Number of entries: 4

Line list for: prop-1-yne,2v10 Formula: c3h4, Number of entries: 4

Line list for: penta-1,3-diyne Formula: c5h4, Number of entries: 28

Line list for: penta-1,3-diyne,1v11 Formula: c5h4, Number of entries: 16

Line list for: penta-1,3-diyne,1v12 Formula: c5h4, Number of entries: 16

Line list for: penta-1,3-diyne,1v13 Formula: c5h4, Number of entries: 16

Line list for: penta-1,3-diyne,ve1 Formula: c5h4, Number of entries: 28

Line list for: hepta-1,3,5-triyne Formula: c7h4, Number of entries: 79

Line list for: (2Z)-hexa-1,3-dien-5-yne (anti) Formula: c6h6, Number of entries: 308

Line list for: (2E)-hexa-1,3-dien-5-yne (anti) Formula: c6h6, Number of entries: 155

Line list for: but-1-en-3-yne Formula: c4h4, Number of entries: 23

Line list for: hex-1-ene-3,5-diyne Formula: c6h4, Number of entries: 97

Line list for: vinyl_triacetylene Formula: c8h4, Number of entries: 93

Line list for: 5-ethenylidenecyclopenta-1,3-diene Formula: c7h6, Number of entries: 226

Line list for: 5-ethenylidenecyclopenta-1,3-diene,v22 Formula: c7h6, Number of entries: 202

Line list for: cyclopenta-1,3-diene Formula: c5h6, Number of entries: 88

Line list for: (Z)-3-penten-1-yne, A state Formula: c5h6, Number of entries: 172

Line list for: penta-1,2-dien-4-yne Formula: c5h4, Number of entries: 71

Line list for: hepta-1,2,3,4,5-pentaene-6-yne Formula: c7h4, Number of entries: 209

Line list for: cis-hex-ene-diyene Formula: c6h4, Number of entries: 273

Line list for: hexa-1,2,3-trien-5-yne Formula: c6h4, Number of entries: 232

Line list for: hepta-1,2-dien-4,6-diyne Formula: c7h4, Number of entries: 271

Line list for: cyclopropa_1_yne_3_yl_radical Formula: c3h, Number of entries: 1119

Line list for: cyclopropa-1-yne-3-yl_radical,ve1 Formula: c3h, Number of entries: 621

Line list for: cyclopropa-1-yne-3-yl_radical,ve2 Formula: c3h, Number of entries: 299

Line list for: cyclopropa-1-yne-3-yl_radical,ve3 Formula: c3h, Number of entries: 301

Line list for: buta-1,3-diynyl radical Formula: c4h, Number of entries: 102

Line list for: 1,2,3,4-pentatetraene-1,1,5-trienyl radical Formula: c5h, Number of entries: 93

Line list for: hexa-1,3,5-triynyl radical Formula: c6h, Number of entries: 176

Line list for: 1,2,3,4,5,6-heptahexaene-1,1,7-trienyl radical Formula: c7h, Number of entries: 154

Line list for: propadienylidene Formula: c3h2, Number of entries: 10

Line list for: butatrienylidene Formula: c4h2, Number of entries: 14

Line list for: pentatetraenylidene Formula: c5h2, Number of entries: 28

Line list for: 1-ethynyl-cycloprop-1-en-2-ylidene Formula: c5h2, Number of entries: 70

Line list for: penta-1,2-dien-4-yne-1-ylidene Formula: c5h2, Number of entries: 92

Line list for: cylcohexadiene Formula: c6h8, Number of entries: 69

Line list for: (4Z)-hepta-1,2,4-trien-6-yne (anti) Formula: c7h6, Number of entries: 492

Line list for: (E)-3-penten-1-yne, A state Formula: c5h6, Number of entries: 87
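
Because a batch file can reference dozens of catalogs, it can be worth checking the metadata before a run. The sketch below works on a plain dictionary with the same shape as the YAML excerpt shown earlier; check_entries and the choice of required keys are hypothetical and not part of PySpecTools.

```python
from pathlib import Path

# Assumed minimum metadata per entry, based on the YAML excerpt
REQUIRED_KEYS = {"formula", "filepath"}


def check_entries(entries, root="."):
    """Return a list of human-readable problems found in the batch metadata."""
    problems = []
    for name, meta in entries.items():
        missing = REQUIRED_KEYS - set(meta)
        if missing:
            problems.append(f"{name}: missing keys {sorted(missing)}")
        elif not (Path(root) / meta["filepath"]).is_file():
            problems.append(f"{name}: file not found: {meta['filepath']}")
    return problems


entries = {
    "ethynylbenzene,v23": {
        "formula": "c8h6",
        "filepath": "h_catalogs/phenylacetylene_v23.cat",
    },
    "incomplete entry": {"formula": "c8h6"},
}
problems = check_entries(entries, root="a_directory_that_does_not_exist")
```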

We repeat the same procedure for .lin files, which also follow SPFIT formatting. The from_XXX parser is chosen based on the extension of the referenced file.

[10]:
session.process_linelist_batch(yml_path="hydrocarbons_lin.yml")
Line list for: cyclopropa-1,2-diene,gs Formula: c3h2, Number of entries: 406

Line list for: umol-1850 Formula: cxhy, Number of entries: 24

Line list for: cyclopropa-1,2-diene (HC13CCH) Formula: c3h2, Number of entries: 6

Line list for: cyclopropa-1,2-diene (H13CCCH) Formula: c3h2, Number of entries: 12

Line list for: cyclopropa-1,2-diene,1v2 Formula: c3h2, Number of entries: 37

Line list for: cyclopropa-1,2-diene,1v3 Formula: c3h2, Number of entries: 35

Line list for: cyclopropa-1,2-diene,1v5 Formula: c3h2, Number of entries: 28

Line list for: cyclopropa-1,2-diene,1v6 Formula: c3h2, Number of entries: 38

Line list for: cyclopropa-1,2-diene,2v6 Formula: c3h2, Number of entries: 17

Line list for: cyclopropa-1,2-diene,3v6 Formula: c3h2, Number of entries: 5

Line list for: cyclopropa-1,2-diene,4v6 Formula: c3h2, Number of entries: 2

Line list for: cyclopropa-1,2-diene,1v5+1v6 Formula: c3h2, Number of entries: 2

Line list for: cyclopropa-1-yne-3-yl_radical,ve1 Formula: c3h, Number of entries: 22

Line list for: cyclopropa-1-yne-3-yl_radical,ve2 Formula: c3h, Number of entries: 5

Line list for: cyclopropa-1-yne-3-yl_radical,ve3 Formula: c3h, Number of entries: 11

Line list for: penta-1,3-diyne,ve2 Formula: c5h4, Number of entries: 9

Line list for: penta-1,3-diyne,ve3 Formula: c5h4, Number of entries: 9

Line list for: ethynylbenzene Formula: c8h6, Number of entries: 58

Line list for: ethynylbenzene,v23 Formula: c8h6, Number of entries: 35

Line list for: ethynylbenzene,2v23 Formula: c8h6, Number of entries: 11

Line list for: ethynylbenzene,v16 Formula: c8h6, Number of entries: 16

Line list for: buta_1,3_diynylbenzene Formula: c10h6, Number of entries: 86

Line list for: hexa_1,3,5_triynylbenzene Formula: c12h6, Number of entries: 25

Line list for: 5-ethylenecyclopenta-1,3-diene Formula: c6h6, Number of entries: 28

Line list for: 1-ethynylcyclopenta-1,3-diene Formula: c7h6, Number of entries: 30

Line list for: 2-ethynylcyclopenta-1,3-diene Formula: c7h6, Number of entries: 39

Line list for: hepta-1,3,5-triyne Formula: c7h4, Number of entries: 16

Line list for: (2Z)-hexa-1,3-dien-5-yne (anti) Formula: c6h6, Number of entries: 29

Line list for: (2E)-hexa-1,3-dien-5-yne (anti) Formula: c6h6, Number of entries: 32

Line list for: hex-1-ene-3,5-diyne Formula: c6h4, Number of entries: 22

Line list for: 5-ethenylidenecyclopenta-1,3-diene Formula: c7h6, Number of entries: 26

Line list for: 5-ethenylidenecyclopenta-1,3-diene,v22 Formula: c7h6, Number of entries: 18

Line list for: cyclopenta-1,3-diene Formula: c5h6, Number of entries: 19

Line list for: penta-1,2-dien-4-yne Formula: c5h4, Number of entries: 14

Line list for: hepta-1,2,3,4,5-pentaene-6-yne Formula: c7h4, Number of entries: 16

Line list for: hexa-1,2,3-trien-5-yne Formula: c6h4, Number of entries: 23

Line list for: hepta-1,2-dien-4,6-diyne Formula: c7h4, Number of entries: 45

Line list for: 1-ethynyl-cycloprop-1-en-2-ylidene Formula: c5h2, Number of entries: 13

Line list for: penta-1,2-dien-4-yne-1-ylidene Formula: c5h2, Number of entries: 13

Line list for: (4Z)-hepta-1,2,4-trien-6-yne (anti) Formula: c7h6, Number of entries: 34

Line list for: l_ccch,ve Formula: c3h, Number of entries: 32

Line list for: prop-1-yne,ve1 Formula: c3h4, Number of entries: 9

Line list for: prop-1-yne,ve2 Formula: c3h4, Number of entries: 9

Line list for: (E)-3-penten-1-yne, A state Formula: c5h6, Number of entries: 20

Line list for: (E)-3-penten-1-yne, E state Formula: c5h6, Number of entries: 18

Line list for: (Z)-3-penten-1-yne, A state Formula: c5h6, Number of entries: 13

Line list for: (Z)-3-penten-1-yne, E state Formula: c5h6, Number of entries: 9
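
The extension-based choice of parser used in the two batch runs above can be pictured as a small lookup table. The mapping and the function choose_parser below are a hypothetical illustration of that idea; only the method names from_catalog and from_lin come from the list of LineList constructors given earlier.

```python
from pathlib import Path

# Assumed mapping from file extension to the name of a LineList constructor
PARSERS = {".cat": "from_catalog", ".lin": "from_lin"}


def choose_parser(filepath):
    """Pick a parser name from the file extension, failing loudly otherwise."""
    suffix = Path(filepath).suffix.lower()
    try:
        return PARSERS[suffix]
    except KeyError:
        raise ValueError(f"No parser registered for '{suffix}' files") from None


cat_parser = choose_parser("h_catalogs/phenylacetylene_v23.cat")
lin_parser = choose_parser("h_catalogs/phenylacetylene_v23.lin")
```

Raising an error for unknown extensions, rather than guessing, keeps a typo in the batch file from silently skipping a molecule.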

Finishing the analysis

This basically completes the assignment process! We just have a few more steps to take to save the analysis: a pickle file is saved to disk, which is then used for all the subsequent analysis (e.g. line profiles, statistics). The session.finalize_assignments() method is currently not as final as it sounds: it simply triggers all of the report and table generation, and exports the identified and unidentified data into their respective folders.

[15]:
session.finalize_assignments()

The save_session method below then dumps the entire analysis into the file sessions/{experiment_ID}.pkl, where {experiment_ID} is the number assigned to the experiment at the very beginning (experiment=4000).

[25]:
session.save_session()
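
The save/load round trip follows the usual pattern for pickled Python objects. The sketch below shows that pattern with the standard-library pickle module and a plain dictionary standing in for the session; the helper names save_obj and load_obj are hypothetical, and the serialization details inside PySpecTools may differ.

```python
import pickle
import tempfile
from pathlib import Path


def save_obj(obj, experiment_id, folder):
    """Pickle obj to <folder>/<experiment_id>.pkl and return the path."""
    path = Path(folder) / f"{experiment_id}.pkl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)
    return path


def load_obj(path):
    """Load a pickled object; only do this for files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round trip in a temporary directory so nothing is left on disk
with tempfile.TemporaryDirectory() as tmp:
    toy_session = {"experiment": 4000, "composition": ["C", "H", "N", "O"]}
    saved = save_obj(toy_session, 4000, Path(tmp) / "sessions")
    restored = load_obj(saved)
```

Since unpickling can execute arbitrary code, session files should only be loaded when they come from a trusted source.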

You can then load this session back into a separate notebook with AssignmentSession.load_session("sessions/{experiment_ID}.pkl"):

[26]:
session = AssignmentSession.load_session("sessions/4000.pkl")

This loads in all of the information from before, including the results generated with finalize_assignments(). For example, the identifications attribute stores a dict with each distinct species as a key and the number of assigned lines as the value:

[27]:
session.identifications
[27]:
{'buta-1,3-diynylbenzene': 51,
 'umol-1999': 1,
 '2-ethynylcyclopenta-1,3-diene': 17,
 '5-ethenylidenecyclopenta-1,3-diene': 16,
 '1-ethynyl-cycloprop-1-en-2-ylidene': 6,
 '1-ethynylcyclopenta-1,3-diene': 18,
 '5-ethenylidenecyclopenta-1,3-diene,v22': 8,
 '(2Z)-hexa-1,3-dien-5-yne (anti)': 10,
 'umol-1850': 5,
 '1,2,3,4-pentatetraene-1,1,5-trienyl radical': 9,
 'hepta-1,2,3,4,5-pentaene-6-yne': 10,
 'hepta-1,2-dien-4,6-diyne': 13,
 'hexa-1,3,5-triynylbenzene': 3,
 'ethynylbenzene': 19,
 'ethynylbenzene,v23': 18,
 'benzonitrile, v21': 7,
 'hepta-1,3,5-triyne': 5,
 '1,2,3,4,5,6-heptahexaene-1,1,7-trienyl radical': 5,
 '2-phenylacetonitrile': 2,
 'vinyl_triacetylene': 1,
 'hex-1-ene-3,5-diyne': 10,
 'cyclohexa-1,3-dien-5-yne': 4,
 'cyclohexa-2,4-dien-1-one': 9,
 'Artifact': 2,
 'ethynylbenzene,v16': 6,
 'ethynylbenzene,2v23': 5,
 'penta-1,3-diyne': 3,
 'penta-1,3-diyne,1v12': 6,
 'penta-1,3-diyne,ve2': 3,
 '(2E)-hexa-1,3-dien-5-yne (anti)': 13,
 '(4Z)-hepta-1,2,4-trien-6-yne (anti)': 9,
 '(E)-pent-2-en-4-ynenitrile': 1,
 'benzonitrile': 7,
 'hexa-1,2,3-trien-5-yne': 3,
 'butatrienylidene': 4,
 'but-1-en-3-yne': 2,
 'prop-2-ynenitrile': 2,
 'hexa-4,5-dien-2-ynenitrile': 2,
 'pentatetraenylidene': 6,
 'prop-2-ynal': 2,
 'prop-2-enenitrile': 2,
 'buta-1,3-diynyl radical': 3,
 '3-phenylprop-2-ynenitrile': 1,
 '3-oxo-1,2-propadienylidene': 2,
 'cyclohexa-2,5-dien-1-one': 11,
 'penta-1,2-dien-4-yne': 6,
 'cyclohexa-1,3-dien-5-yne,2v16': 2,
 'cyanoprop-1,2-dien-1,3-diyl': 1,
 'cyanoacetyl-cycloprop-1-ene-2,2-diyl': 1,
 'penta-2,4-diynal': 3,
 'penta-2,4-diynenitrile': 3,
 '5-ethylenecyclopenta-1,3-diene': 3,
 'cyclopenta-2,4-dien-1-one': 2,
 'penta-1,3-diyne,1v11': 2,
 'penta-1,3-diyne,ve1': 1,
 'penta-1,3-diyne,ve3': 1,
 'penta-1,3-diyne,1v13': 1,
 '(Z)-3-penten-1-yne, A state': 1,
 '(Z)-3-penten-1-yne, E state': 1,
 'cyclopenta-2,4-dien-1-one, ve1': 1,
 'cis-hex-ene-diyene': 1,
 'cylcohexadiene': 1,
 '(E)-3-penten-1-yne, A state': 1,
 'buta-2,3-dien-1-imine (syn)': 1,
 'cyclopenta-1,3-diene-1-carbonitrile': 1,
 'cyclopenta-2,4-diene-1-carbonitrile': 1,
 'benzonitrile, v12': 1,
 'cyclopropa-1-yne-3-yl_radical,ve2': 2,
 'cyclopropa_1_yne_3_yl_radical': 12,
 'cyclopropa-1-yne-3-yl_radical,ve1': 4,
 'cyclohexa-1,3-dien-5-yne,v15': 1,
 'cyclohexa-1,3-dien-5-yne,v16': 1,
 'cyclopropa-1-yne-3-yl_radical,ve3': 2,
 'buta-2,3-dienenitrile': 1,
 'cyclopropa-1,2-diene,4v6': 1,
 'cyclopropa-1,2-diene,3v6': 1,
 'but-3-enenitrile (cis)': 1,
 'penta-1,2-dien-4-yne-1-ylidene': 2,
 'cyclopropa-1,2-diene,2v6': 1,
 'cyclopropa-1,2-diene (HC13CCH)': 1,
 'prop-1-yne,ve1': 1,
 '(E)-but-2-enal (anti)': 1,
 'prop-1-yne': 1,
 'prop-1-yne,1v9': 1,
 'cyclopropa-1,2-diene,1v5+1v6': 1,
 'cyclopropa-1,2-diene,1v6': 1,
 'penta-1,2-dien-1-one-3-yl radical': 1,
 'cyclopropa-1,2-diene,1v3': 1,
 'cyclopropa-1,2-diene,1v2': 1,
 'cyclopropa-1,2-diene,gs': 1,
 'cyclopropa-1,2-diene (H13CCCH)': 1,
 'benzonitrile, v15': 1,
 '(2E)-2,4-pentadienal (syn)': 1,
 'cyanopenta-2,4-diyne-2,2-diyl': 1,
 'c3s': 1,
 'hexa-1,3,5-triynyl radical': 4,
 'prop-1-yne,ve2': 1}
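
Since this is an ordinary dictionary, plain Python is enough to summarize it. The sketch below uses a short excerpt of the output above (in the notebook you would use session.identifications directly) to rank species by line count while setting artifacts aside.

```python
# Excerpt of the dictionary printed above
identifications = {
    "buta-1,3-diynylbenzene": 51,
    "ethynylbenzene": 19,
    "1-ethynylcyclopenta-1,3-diene": 18,
    "umol-1850": 5,
    "Artifact": 2,
}

# Drop non-molecular entries, then rank species by number of assigned lines
molecular = {k: v for k, v in identifications.items() if k != "Artifact"}
ranked = sorted(molecular.items(), key=lambda item: item[1], reverse=True)
total_lines = sum(molecular.values())
```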

You can also view all of the assignment information by accessing the DataFrame stored as the table attribute. Below, we also demonstrate how we can sort columns based on their values, for example looking at the transitions with the highest catalog uncertainty first.

[28]:
session.table.sort_values(["uncertainty"], ascending=False)
[28]:
   | name | smiles | formula | frequency | catalog_frequency | catalog_intensity | deviation | intensity | uncertainty | S | ... | lstate_energy | interference | weighting | source | public | velocity | discharge | magnet | multiple | final
40 | 2-phenylacetonitrile | | c8h7n | 7946.962728 | 7947.1695 | -5.3480 | 0.206772 | 7.765701 | 0.1409 | 0.0 | ... | 6.2021 | False | 0.0 | Catalog | True | 0.0 | False | False | [penta-2,4-diynal, 7,947.0717] | False
385 | benzonitrile, v15 | | c7h5n | 18425.935412 | 18426.1108 | -5.6658 | 0.175388 | 6.628221 | 0.1065 | 0.0 | ... | 3.9995 | False | 0.0 | Catalog | True | 0.0 | False | False | [cyanoprop-1,2-dien-1,3-diyl, 18,426.1726, 3-p... | False
353 | but-3-enenitrile (cis) | | c4h5n | 16331.429075 | 16331.4637 | -5.3417 | 0.034625 | 38.524312 | 0.0772 | 0.0 | ... | 10.8664 | False | 0.0 | Catalog | True | 0.0 | False | False | [cyclohexa-2,4-dien-1-one, 16,331.4332, ethyny... | False
126 | cyanoacetyl-cycloprop-1-ene-2,2-diyl | | c6hn | 10379.895776 | 10379.8652 | -3.6143 | -0.030576 | 8.086545 | 0.0344 | 0.0 | ... | 1.7845 | False | 0.0 | Catalog | True | 0.0 | False | False | [] | True
179 | hexa-1,3,5-triynylbenzene | | c12h6 | 11803.740216 | 11803.8139 | -3.8298 | 0.073684 | 18.036535 | 0.0279 | 0.0 | ... | 5.7546 | False | 0.0 | Catalog | True | 0.0 | False | False | [hexa-1,2,3-trien-5-yne, 11,803.7367, (Z)-but-... | False
...
43 | cyclohexa-1,3-dien-5-yne | | c6h4 | 8073.195789 | 8073.1768 | -2.7170 | -0.018989 | 14.487425 | 0.0000 | 0.0 | ... | 0.8300 | False | 0.0 | Catalog | True | 0.0 | False | False | [] | True
39 | vinyl_triacetylene | | c8h4 | 7924.848564 | 7924.8884 | -2.7370 | 0.039836 | 7.026208 | 0.0000 | 0.0 | ... | 0.7930 | False | 0.0 | Catalog | True | 0.0 | False | False | [] | True
367 | prop-1-yne | | c3h4 | 17091.744758 | 17091.7420 | -1.5739 | -0.002758 | 140.717850 | 0.0000 | 0.0 | ... | 0.0000 | False | 0.0 | Catalog | True | 0.0 | False | False | [] | True
368 | prop-1-yne,1v9 | | c3h4 | 17102.103117 | 17102.0765 | -1.5722 | -0.026617 | 31.469987 | 0.0000 | 0.0 | ... | 0.0000 | False | 0.0 | Catalog | True | 0.0 | False | False | [benzonitrile, v15, 17,101.9416] | True
314 | cyclopropa_1_yne_3_yl_radical | | c3h | 14893.034757 | 14893.0554 | -1.6790 | 0.020643 | 123.717808 | 0.0000 | 0.0 | ... | 0.6391 | False | 0.0 | Catalog | True | 0.0 | False | False | [] | True

428 rows × 28 columns
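
Because the table is a regular DataFrame, the usual pandas idioms apply for filtering as well as sorting. The sketch below rebuilds a few rows and columns from the output above by hand (in the notebook you would start from session.table) and shows two typical selections; the 0.1 MHz cut-off is an arbitrary value chosen for the example.

```python
import pandas as pd

# Values copied from a few rows of the table shown above
table = pd.DataFrame({
    "name": ["2-phenylacetonitrile", "benzonitrile, v15",
             "prop-1-yne", "vinyl_triacetylene"],
    "formula": ["c8h7n", "c7h5n", "c3h4", "c8h4"],
    "deviation": [0.206772, 0.175388, -0.002758, 0.039836],
    "uncertainty": [0.1409, 0.1065, 0.0, 0.0],
})

# Pure hydrocarbons: lowercase formulas (as in the table) without nitrogen
hydrocarbons = table[~table["formula"].str.contains("n")]

# Assignments whose frequency deviation exceeds 0.1 MHz, least certain first
suspect = (
    table[table["deviation"].abs() > 0.1]
    .sort_values("uncertainty", ascending=False)
)
```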

When it comes to making plots, we might also be interested in removing the features that have already been assigned from the spectrum (the X/Y data); the clean_spectral_assignments() method replaces regions of the spectrum that have been assigned with white noise, to make it look natural.

[32]:
session.clean_spectral_assignments()
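
The idea behind this cleaning step can be sketched in a few lines: every sample close to an assigned frequency is overwritten with a random draw from a noise distribution. The function below is an illustration under our own assumptions (a fixed window half-width, zero-mean Gaussian noise of known width, and a seeded generator for reproducibility); it is not the PySpecTools implementation.

```python
import random


def blank_assigned(frequency, intensity, assigned,
                   half_width=0.5, noise_sigma=1.0, seed=0):
    """Replace samples within half_width MHz of each assigned line with Gaussian noise."""
    rng = random.Random(seed)
    cleaned = list(intensity)  # leave the input untouched
    for i, freq in enumerate(frequency):
        if any(abs(freq - line) <= half_width for line in assigned):
            cleaned[i] = rng.gauss(0.0, noise_sigma)
    return cleaned


# Synthetic four-point spectrum with one strong, already-assigned feature
frequency = [17091.0, 17091.5, 17092.0, 17093.0]
intensity = [1.0, 140.0, 2.0, 1.5]
cleaned = blank_assigned(frequency, intensity, assigned=[17091.7])
```

Samples outside the window keep their original values, so any unassigned features that remain stand out against a natural-looking noise floor.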

You can then plot the cleaned spectrum, from which all of the assigned features have been removed, with plot_assigned(). This creates an interactive plotly figure!

Note that the plot_assigned() method can be used at any point in the notebook too; the latest spectrum with assignments overlaid will be shown.

[33]:
session.plot_assigned()

Conclusions

This notebook completes the first analysis step, which is often the most tedious: assigning and keeping track of every spectral feature, and translating that into something that is publishable. We went through how a spectrum can be loaded and interfaced with the AssignmentSession class in PySpecTools, followed by peak finding. We then created LineList objects based on SPCAT catalogs, fed them to the AssignmentSession to process, and showed that you can do this en masse. Finally, the results of the analysis were saved to disk, and an interactive report was generated.

In a future notebook, we’ll take a look at what kind of things we can do with the saved AssignmentSession, for example chemical composition analysis, and making plots of the data for publication.