Analyzing Broadband Spectra with the assignment Module¶
Introduction¶
In this notebook, we work through how the core functionality of PySpecTools can be used to streamline and automate your spectral analysis. PySpecTools and Python provide enough flexibility to adapt to your needs: whatever PySpecTools cannot do natively can usually be automated with plain Python (e.g. for loops) and, to a large extent, with pandas. The latter is particularly useful when you are analyzing the assignments and want to filter out certain molecules. That may be left for a subsequent notebook, as the focus of this one is to demonstrate how automated assignment is performed.
The core functionality for assigning spectra lives in the pyspectools.spectra.assignment module, which contains three main abstractions:
AssignmentSession
This is your main interface: it holds the spectral data and lets you interact with it (plot, assign, etc.).
Transition
Represents every type of spectral feature: every peak in an experiment, and every catalog entry.
LineList
A collection of spectral features: the peaks in an experiment (which are themselves Transition objects), and catalogs.
We will demonstrate how these pieces come together by looking at some of our published data: this notebook was used to analyze the benzene discharge experiments reported in these two papers:
McCarthy, M. C.; Lee, K. L. K.; Carroll, P. B.; Porterfield, J. P.; Changala, P. B.; Thorpe, J. H.; Stanton, J. F. Exhaustive Product Analysis of Three Benzene Discharges by Microwave Spectroscopy. J. Phys. Chem. A 2020, 124 (25), 5170–5181. https://doi.org/10.1021/acs.jpca.0c02919.
Lee, K. L. K.; McCarthy, M. Study of Benzene Fragmentation, Isomerization, and Growth Using Microwave Spectroscopy. J. Phys. Chem. Lett. 2019, 10 (10), 2408–2413. https://doi.org/10.1021/acs.jpclett.9b00586.
The full dataset can also be found on our Zenodo repository; notebook “4000” most closely resembles this one, which is a much more heavily annotated version.
We should stress that, while this is mostly automated, it does not change the fact that spectral analysis is very much an iterative process. You will make modifications to the way you do your analysis, and many things you won’t know until you’ve run it at least once. The point of having this notebook is so that it is reproducible and transparent: you can always modify the code and re-run the whole notebook with the latest analysis.
To begin the analysis, we construct an AssignmentSession object using the class method AssignmentSession.from_ascii(...). This method takes your ASCII spectrum containing frequency and intensity information, parses it using pandas, and stores it as a DataFrame. In a Jupyter/IPython session, you can append a question mark to any function or method to pull up the documentation associated with it:
[2]:
from pyspectools.spectra.assignment import AssignmentSession, LineList
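The question-mark suffix is an IPython/Jupyter convenience rather than Python syntax. As a minimal sketch of the equivalent in a plain Python interpreter, the standard library exposes the same docstrings through help() and inspect.getdoc(); str.split is used here only as a stand-in for a PySpecTools method such as AssignmentSession.from_ascii.

```python
import inspect

# In a notebook cell you would type `AssignmentSession.from_ascii?`.
# Outside IPython, read the docstring directly; `str.split` is a stand-in.
doc = inspect.getdoc(str.split)
print(doc.splitlines()[0])

# `help(str.split)` prints the same text in a pager-style layout.
```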
In this case, we set up the session based on the benzene data, which is a tab-delimited text file with a header. We skip the header with the skiprows argument, and provide our own column names with the col_names argument. Additionally, we specify the composition we expect for the experiment with the composition kwarg: ideally we would only include ["C", "H"], but we know that atmospheric impurities like nitrogen and oxygen get incorporated into the discharge products. This keyword affects Splatalogue assignments, and excludes catalogs that contain irrelevant compositions such as metal-bearing molecules.
[2]:
session = AssignmentSession.from_ascii(
    "chirp_data/ft2632_hanning_620.txt",
    experiment=4000,
    col_names=["Frequency", "Intensity"],
    skiprows=1,
    composition=["C", "H", "N", "O"],
    verbose=False
)
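To make the roles of skiprows and col_names concrete, here is a minimal standard-library sketch of reading a tab-delimited, two-column spectrum. It is not the PySpecTools implementation (which uses pandas); the read_spectrum helper and the three data rows are invented for illustration.

```python
import csv
import io

# Hypothetical two-column spectrum: one header row, tab-delimited.
raw = "freq\tint\n8000.0\t0.012\n8000.5\t0.034\n8001.0\t0.009\n"


def read_spectrum(handle, col_names, skiprows=0):
    """Return a dict mapping each column name to a list of floats."""
    reader = csv.reader(handle, delimiter="\t")
    for _ in range(skiprows):
        next(reader)  # discard header rows
    columns = {name: [] for name in col_names}
    for row in reader:
        for name, value in zip(col_names, row):
            columns[name].append(float(value))
    return columns


spectrum = read_spectrum(io.StringIO(raw), ["Frequency", "Intensity"], skiprows=1)
print(spectrum["Frequency"])
```

The column names supplied by the caller replace whatever the file header said, which is why the header row has to be skipped rather than parsed.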
You can also adjust many of these settings after the fact; they are stored as attributes of the Session object within an AssignmentSession. For example, the temperature attribute sets an upper limit on the lower-state energies of assignable transitions: catalog entries whose lower-state energy is more than double the specified value are ignored. The cutoff is not applied at the nominal value itself because that value corresponds to your experimental temperature, and depending on how prominent a molecule is, you may still see transitions from higher-energy states. Another useful setting is the maximum tolerated uncertainty in catalog entries: we would like to reject assignments based on poorly predicted lines, and this is controlled by the max_uncertainty attribute.
[ ]:
# temperature in K
session.session.temperature = 10.
# uncertainty in MHz
session.session.max_uncertainty = 0.2
Note that frequency units are in MHz, and temperature in kelvin.
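As a rough illustration of how these two settings act as filters, the sketch below keeps a catalog line only if its lower-state energy is at most twice the session temperature and its catalog uncertainty is at most max_uncertainty. The function name, the dictionary keys, and the assumption that the lower-state energy is already expressed in kelvin are all hypothetical; the factor of two comes from the description above, not from the PySpecTools source.

```python
def is_assignable(line, temperature, max_uncertainty):
    """Apply the two session-level cuts described in the text."""
    cold_enough = line["lstate_energy"] <= 2.0 * temperature
    well_predicted = line["uncertainty"] <= max_uncertainty
    return cold_enough and well_predicted


# Made-up catalog entries: energies assumed in K, uncertainties in MHz.
catalog = [
    {"frequency": 9000.10, "lstate_energy": 4.0, "uncertainty": 0.01},
    {"frequency": 9500.25, "lstate_energy": 35.0, "uncertainty": 0.01},
    {"frequency": 9800.40, "lstate_energy": 4.0, "uncertainty": 0.50},
]

kept = [c for c in catalog if is_assignable(c, temperature=10.0, max_uncertainty=0.2)]
print([c["frequency"] for c in kept])
```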
The next step is to pre-process the spectrum. Our chirped-pulse data are collected using Kyle Crabtree’s blackchirp program, and often we apply a window function to the data. If you are looking at raw FFT data, PySpecTools provides access to the window functions defined in scipy.signal, which you can apply with syntax like this:
session.apply_filter("hanning")
The full list of filters can be found in the SciPy documentation.
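For intuition, a window function simply tapers a sampled record toward zero at both ends. The sketch below builds the symmetric Hann window from its textbook definition using only the standard library and applies it to a made-up decaying cosine; it illustrates the idea and is not how apply_filter is implemented.

```python
import math


def hann(n_points):
    """Symmetric Hann window, w[k] = 0.5 - 0.5*cos(2*pi*k/(N-1))."""
    if n_points == 1:
        return [1.0]
    return [
        0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n_points - 1))
        for k in range(n_points)
    ]


# Toy time-domain signal: a decaying cosine sampled at 9 points.
signal = [math.exp(-0.1 * k) * math.cos(0.8 * k) for k in range(9)]
window = hann(len(signal))
windowed = [s * w for s, w in zip(signal, window)]
```

The window is zero at the first and last samples and one at the centre, so the abrupt start and end of the record are suppressed at the cost of some broadening of each line.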
After pre-processing, we perform peak detection and baseline correction. This is done with session.find_peaks, which automates several steps based on its keyword arguments. All of the analysis in PySpecTools is preferably done in units of signal-to-noise ratio (SNR), which is established by fitting a baseline (a vector, not a scalar) and dividing the entire spectrum by it element-wise. SNR is more meaningful than the raw voltage scale that is typically reported.
By default, peak finding uses the asymmetric least-squares (ALS) method to fit a baseline (als=True). Essentially this can be thought of as a penalized least-squares method, with additional parameters that define how quickly the baseline can respond (you don’t want to over-subtract signal). These parameters can be set by passing keyword arguments to find_peaks (see the documentation). The sigma keyword then specifies the minimum SNR value to use for peak finding. Note that if als=False, threshold and sigma are equivalent; threshold specifies the cutoff on the absolute intensity scale.
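The SNR conversion and thresholding can be sketched in a few lines. The baseline here is simply typed in as if it had come from an ALS fit, the data are invented, and the helper names are hypothetical; this is an illustration of the idea, not the PySpecTools peak finder.

```python
def to_snr(intensity, baseline):
    """Divide the spectrum by the baseline, element by element."""
    return [y / b for y, b in zip(intensity, baseline)]


def find_local_maxima(frequency, snr, sigma):
    """Return (frequency, snr) for local maxima at or above `sigma`."""
    peaks = []
    for i in range(1, len(snr) - 1):
        if snr[i] >= sigma and snr[i - 1] < snr[i] >= snr[i + 1]:
            peaks.append((frequency[i], snr[i]))
    return peaks


frequency = [8000.0, 8000.5, 8001.0, 8001.5, 8002.0, 8002.5, 8003.0]
intensity = [0.010, 0.012, 0.090, 0.011, 0.030, 0.160, 0.020]
baseline = [0.010, 0.010, 0.010, 0.010, 0.020, 0.020, 0.020]  # pretend ALS output

snr = to_snr(intensity, baseline)
peaks = find_local_maxima(frequency, snr, sigma=6)
print(peaks)
```

Because the baseline is a vector, the same raw intensity can be above threshold in a quiet part of the band and below it in a noisy part.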
[3]:
# Returns a pandas DataFrame containing frequency/intensity of
# every peak detected. This is also stored as an attribute;
# `AssignmentSession.peaks`
peaks = session.find_peaks(sigma=6, als=True)
[4]:
# Use the `describe` method of a `DataFrame` to summarize the
# peaks information
peaks.describe()
[4]:
| | Frequency | Peak Frequencies | Intensity |
---|---|---|---|
count | 447.000000 | 447.000000 | 447.000000 |
mean | 12300.347573 | 12300.342431 | 28.373735 |
std | 3365.905814 | 3365.901388 | 41.156217 |
min | 6385.075006 | 6385.066667 | 6.002450 |
25% | 9542.620545 | 9542.666667 | 8.248862 |
50% | 12215.902539 | 12215.911111 | 14.821400 |
75% | 14753.087387 | 14753.155555 | 32.619844 |
max | 19845.175891 | 19845.155556 | 499.347482 |
In the cell below, we manually add some lines. Automated peak detection is never perfect, especially with blended features. You can add frequency/intensity information by passing a list of 2-tuples to the add_ulines method:
[6]:
session.add_ulines(
    [
        (7483.911, 9.390),
        (8773.866, 12.523),
        (9200.000, 9.116),
        (9200.888, 9.442),
        (10258.311, 6.850),
        (10259.111, 6.948),
        (10262.044, 15.061),
        (10843.111, 9.215),
        (10928.266, 12.748),
        (10959.38, 14.302),
        (10978.93, 8.527),
        (10979.73, 7.273),
        (11454.844, 7.216),
        (11547.555, 7.485),
        (11548.000, 8.370),
        (11550.49, 7.134),
        (11561.51, 7.720),
        (11940.00, 6.039),
        (12476.444, 14.628),
        (12475.911, 13.628),
        (13558.40, 7.472),
        (13609.07, 6.087),
        (13751.378, 6.745),
        (13792.80, 9.937),
        (14839.64, 6.485),
        (14919.555, 17.971),
        (15248.177, 13.216),
        (15249.067, 15.414),
        (15557.60, 6.572),
        (16581.07, 7.550),
        (16706.76, 70.758),
        (16707.47, 49.851),
        (16710.67, 70.43661),
        (16711.47, 48.40109),
        (17115.02, 9.315)
    ]
)
Running assignments¶
With all the peaks found, we can start doing some assignments of the features! The main way this is done is by creating LineList
objects, which are then fed to the session.process_linelist
method as we shall see later.
There are different types of LineList
objects, depending on the source of data:
from_artifacts
from_clock
from_catalog
from_pgopher
from_dataframe
from_lin
from_splatalogue_query
from_list
from_artifacts creates a specialized LineList that flags its Transition objects as non-molecular for book-keeping. from_clock is a special variant of this: we have found that radio interference arising from arbitrary waveform generators often bleeds into the resulting chirped-pulse spectrum, so from_clock exhaustively generates combinations/harmonics of the clock frequency as artifacts.
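To give a feel for what generating combinations/harmonics of a clock frequency can mean, here is a minimal sketch that enumerates harmonics, sub-harmonics, and their pairwise sums and differences within a band. The function, the max_order cutoff, and the 65,000 MHz clock value are all hypothetical choices for illustration; the actual rules used by from_clock may differ.

```python
def clock_artifacts(clock, f_min, f_max, max_order=8):
    """Candidate artifact frequencies derived from one clock frequency.

    Includes harmonics (clock * n), sub-harmonics (clock / n), and the
    pairwise sums and differences of those, restricted to [f_min, f_max].
    """
    base = set()
    for n in range(1, max_order + 1):
        base.add(clock * n)  # harmonic
        base.add(clock / n)  # sub-harmonic
    combos = set(base)
    for a in base:
        for b in base:
            combos.add(a + b)
            combos.add(abs(a - b))
    return sorted(f for f in combos if f_min <= f <= f_max)


# Hypothetical clock and band (MHz), chosen only for illustration.
artifacts = clock_artifacts(65000.0, f_min=6500.0, f_max=20000.0)
print(len(artifacts))
```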
[7]:
artifacts = LineList.from_artifacts(
    [8000., 16000., 8125., 16250., 7065.7778, 7147.3778, 8574.9022]
)
You can then pass the artifacts object to the process_linelist method of our AssignmentSession, which will automatically cross-correlate every unassigned line (U-line) with the entries contained in your LineList:
[8]:
session.process_linelist(linelist=artifacts)
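Conceptually, the matching step pairs each U-line with a nearby catalog entry. The sketch below does this with a nearest-frequency search inside a fixed tolerance; it considers frequency only, and the helper name, the 0.2 MHz tolerance, and the U-line values are assumptions made for illustration rather than a description of process_linelist.

```python
import bisect


def match_ulines(ulines, catalog, tolerance=0.2):
    """Pair each U-line with the closest catalog frequency within `tolerance` MHz.

    `catalog` is a list of (frequency, name) tuples sorted by frequency.
    """
    freqs = [f for f, _ in catalog]
    assigned, unassigned = [], []
    for uline in ulines:
        i = bisect.bisect_left(freqs, uline)
        # Only the neighbours either side of the insertion point can be closest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(freqs)]
        best = min(candidates, key=lambda j: abs(freqs[j] - uline), default=None)
        if best is not None and abs(freqs[best] - uline) <= tolerance:
            assigned.append((uline, catalog[best][1]))
        else:
            unassigned.append(uline)
    return assigned, unassigned


# Four of the artifact frequencies used above; the U-line values are made up.
catalog = sorted((f, "Artifact") for f in [8000.0, 16000.0, 8125.0, 16250.0])
assigned, unassigned = match_ulines([8000.05, 9200.0, 16249.9], catalog)
print(assigned, unassigned)
```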
For molecular assignments, you could of course repeat this process and manually create individual LineList objects; in this example, we take an SPCAT catalog and generate the LineList:
formaldehyde = LineList.from_catalog(name="formaldehyde", formula="H2CO", filepath="catalogs/h2co.cat")
However, this is incredibly time consuming and not pretty to look at (not to mention a nightmare to update). Instead, we recommend you set up a directory containing all of your catalogs, and create an input file that stores the metadata for every catalog so that they can be processed as a batch. In the cell below, we automated the analysis of hydrocarbon molecules (kept separate from oxygen- and nitrogen-bearing species) with a YAML file called hydrocarbons_cat.yml. YAML is a simple markup syntax that is both machine- and human-readable and writeable. Below is a small excerpt of our file:
ethynylbenzene,v23:
    formula: c8h6
    filepath: h_catalogs/phenylacetylene_v23.cat
ethynylbenzene,2v23:
    formula: c8h6
    filepath: h_catalogs/phenylacetylene_2v23.cat
ethynylbenzene,v16:
    formula: c8h6
    filepath: h_catalogs/phenylacetylene_v16.cat
buta-1,3-diynylbenzene:
    formula: c10h6
    filepath: h_catalogs/phenyldiacetylene.cat
hexa-1,3,5-triynylbenzene:
    formula: c12h6
    filepath: h_catalogs/phenyltriacetylene.cat
You can also provide the source keyword and include a BibTeX citekey. During automatic report generation, the citation is then used to streamline LaTeX table generation.
molecule_name:
    formula: C12H6 # formula
    source: mccarthy_benzene_2020 # citekey
    filepath: catalog/molecule.cat # filepath to the SPCAT catalog
[9]:
session.process_linelist_batch(yml_path="hydrocarbons_cat.yml")
Line list for: cyclopropa-1,2-diene,gs Formula: c3h2, Number of entries: 80
Line list for: ethynylbenzene Formula: c8h6, Number of entries: 144
Line list for: umol-1850 Formula: cxhy, Number of entries: 150
Line list for: ethynylbenzene,v23 Formula: c8h6, Number of entries: 374
Line list for: ethynylbenzene,2v23 Formula: c8h6, Number of entries: 374
Line list for: ethynylbenzene,v16 Formula: c8h6, Number of entries: 374
Line list for: buta-1,3-diynylbenzene Formula: c10h6, Number of entries: 745
Line list for: hexa-1,3,5-triynylbenzene Formula: c12h6, Number of entries: 229
Line list for: 5-ethylenecyclopenta-1,3-diene Formula: c6h6, Number of entries: 82
Line list for: 1-ethynylcyclopenta-1,3-diene Formula: c7h6, Number of entries: 355
Line list for: 2-ethynylcyclopenta-1,3-diene Formula: c7h6, Number of entries: 462
Line list for: cyclohexa-1,3-dien-5-yne Formula: c6h4, Number of entries: 187
Line list for: cyclohexa-1,3-dien-5-yne,2v16 Formula: c6h4, Number of entries: 77
Line list for: cyclohexa-1,3-dien-5-yne,v16 Formula: c6h4, Number of entries: 77
Line list for: cyclohexa-1,3-dien-5-yne,v15 Formula: c6h4, Number of entries: 77
Line list for: prop-1-yne Formula: c3h4, Number of entries: 4
Line list for: prop-1-yne,1v9 Formula: c3h4, Number of entries: 4
Line list for: prop-1-yne,1v10 Formula: c3h4, Number of entries: 4
Line list for: prop-1-yne,2v10 Formula: c3h4, Number of entries: 4
Line list for: penta-1,3-diyne Formula: c5h4, Number of entries: 28
Line list for: penta-1,3-diyne,1v11 Formula: c5h4, Number of entries: 16
Line list for: penta-1,3-diyne,1v12 Formula: c5h4, Number of entries: 16
Line list for: penta-1,3-diyne,1v13 Formula: c5h4, Number of entries: 16
Line list for: penta-1,3-diyne,ve1 Formula: c5h4, Number of entries: 28
Line list for: hepta-1,3,5-triyne Formula: c7h4, Number of entries: 79
Line list for: (2Z)-hexa-1,3-dien-5-yne (anti) Formula: c6h6, Number of entries: 308
Line list for: (2E)-hexa-1,3-dien-5-yne (anti) Formula: c6h6, Number of entries: 155
Line list for: but-1-en-3-yne Formula: c4h4, Number of entries: 23
Line list for: hex-1-ene-3,5-diyne Formula: c6h4, Number of entries: 97
Line list for: vinyl_triacetylene Formula: c8h4, Number of entries: 93
Line list for: 5-ethenylidenecyclopenta-1,3-diene Formula: c7h6, Number of entries: 226
Line list for: 5-ethenylidenecyclopenta-1,3-diene,v22 Formula: c7h6, Number of entries: 202
Line list for: cyclopenta-1,3-diene Formula: c5h6, Number of entries: 88
Line list for: (Z)-3-penten-1-yne, A state Formula: c5h6, Number of entries: 172
Line list for: penta-1,2-dien-4-yne Formula: c5h4, Number of entries: 71
Line list for: hepta-1,2,3,4,5-pentaene-6-yne Formula: c7h4, Number of entries: 209
Line list for: cis-hex-ene-diyene Formula: c6h4, Number of entries: 273
Line list for: hexa-1,2,3-trien-5-yne Formula: c6h4, Number of entries: 232
Line list for: hepta-1,2-dien-4,6-diyne Formula: c7h4, Number of entries: 271
Line list for: cyclopropa_1_yne_3_yl_radical Formula: c3h, Number of entries: 1119
Line list for: cyclopropa-1-yne-3-yl_radical,ve1 Formula: c3h, Number of entries: 621
Line list for: cyclopropa-1-yne-3-yl_radical,ve2 Formula: c3h, Number of entries: 299
Line list for: cyclopropa-1-yne-3-yl_radical,ve3 Formula: c3h, Number of entries: 301
Line list for: buta-1,3-diynyl radical Formula: c4h, Number of entries: 102
Line list for: 1,2,3,4-pentatetraene-1,1,5-trienyl radical Formula: c5h, Number of entries: 93
Line list for: hexa-1,3,5-triynyl radical Formula: c6h, Number of entries: 176
Line list for: 1,2,3,4,5,6-heptahexaene-1,1,7-trienyl radical Formula: c7h, Number of entries: 154
Line list for: propadienylidene Formula: c3h2, Number of entries: 10
Line list for: butatrienylidene Formula: c4h2, Number of entries: 14
Line list for: pentatetraenylidene Formula: c5h2, Number of entries: 28
Line list for: 1-ethynyl-cycloprop-1-en-2-ylidene Formula: c5h2, Number of entries: 70
Line list for: penta-1,2-dien-4-yne-1-ylidene Formula: c5h2, Number of entries: 92
Line list for: cylcohexadiene Formula: c6h8, Number of entries: 69
Line list for: (4Z)-hepta-1,2,4-trien-6-yne (anti) Formula: c7h6, Number of entries: 492
Line list for: (E)-3-penten-1-yne, A state Formula: c5h6, Number of entries: 87
We repeat the same procedure for .lin files, which also follow SPFIT formatting. The from_XXX parser is chosen based on the extension of the referenced file.
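Choosing a parser from the file extension can be sketched as a simple lookup. The dictionary below stands in for the parsed YAML file, the mapping of .cat to from_catalog and .lin to from_lin is an assumption based on the constructor names listed earlier, and the second file path is hypothetical.

```python
from pathlib import Path

# Assumed mapping from file extension to the LineList constructor name.
PARSERS = {".cat": "from_catalog", ".lin": "from_lin"}

# Stand-in for the parsed YAML file: molecule name -> metadata.
entries = {
    "ethynylbenzene,v23": {
        "formula": "c8h6",
        "filepath": "h_catalogs/phenylacetylene_v23.cat",
    },
    "ethynylbenzene": {
        "formula": "c8h6",
        "filepath": "h_lin/phenylacetylene.lin",  # hypothetical path
    },
}


def choose_parser(filepath):
    """Return the constructor name for a file, based on its extension."""
    suffix = Path(filepath).suffix.lower()
    try:
        return PARSERS[suffix]
    except KeyError:
        raise ValueError(f"No parser registered for '{suffix}' files") from None


plan = {name: choose_parser(meta["filepath"]) for name, meta in entries.items()}
print(plan)
```

Raising on an unknown extension, rather than silently skipping the entry, makes a typo in the input file visible immediately.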
[10]:
session.process_linelist_batch(yml_path="hydrocarbons_lin.yml")
Line list for: cyclopropa-1,2-diene,gs Formula: c3h2, Number of entries: 406
Line list for: umol-1850 Formula: cxhy, Number of entries: 24
Line list for: cyclopropa-1,2-diene (HC13CCH) Formula: c3h2, Number of entries: 6
Line list for: cyclopropa-1,2-diene (H13CCCH) Formula: c3h2, Number of entries: 12
Line list for: cyclopropa-1,2-diene,1v2 Formula: c3h2, Number of entries: 37
Line list for: cyclopropa-1,2-diene,1v3 Formula: c3h2, Number of entries: 35
Line list for: cyclopropa-1,2-diene,1v5 Formula: c3h2, Number of entries: 28
Line list for: cyclopropa-1,2-diene,1v6 Formula: c3h2, Number of entries: 38
Line list for: cyclopropa-1,2-diene,2v6 Formula: c3h2, Number of entries: 17
Line list for: cyclopropa-1,2-diene,3v6 Formula: c3h2, Number of entries: 5
Line list for: cyclopropa-1,2-diene,4v6 Formula: c3h2, Number of entries: 2
Line list for: cyclopropa-1,2-diene,1v5+1v6 Formula: c3h2, Number of entries: 2
Line list for: cyclopropa-1-yne-3-yl_radical,ve1 Formula: c3h, Number of entries: 22
Line list for: cyclopropa-1-yne-3-yl_radical,ve2 Formula: c3h, Number of entries: 5
Line list for: cyclopropa-1-yne-3-yl_radical,ve3 Formula: c3h, Number of entries: 11
Line list for: penta-1,3-diyne,ve2 Formula: c5h4, Number of entries: 9
Line list for: penta-1,3-diyne,ve3 Formula: c5h4, Number of entries: 9
Line list for: ethynylbenzene Formula: c8h6, Number of entries: 58
Line list for: ethynylbenzene,v23 Formula: c8h6, Number of entries: 35
Line list for: ethynylbenzene,2v23 Formula: c8h6, Number of entries: 11
Line list for: ethynylbenzene,v16 Formula: c8h6, Number of entries: 16
Line list for: buta_1,3_diynylbenzene Formula: c10h6, Number of entries: 86
Line list for: hexa_1,3,5_triynylbenzene Formula: c12h6, Number of entries: 25
Line list for: 5-ethylenecyclopenta-1,3-diene Formula: c6h6, Number of entries: 28
Line list for: 1-ethynylcyclopenta-1,3-diene Formula: c7h6, Number of entries: 30
Line list for: 2-ethynylcyclopenta-1,3-diene Formula: c7h6, Number of entries: 39
Line list for: hepta-1,3,5-triyne Formula: c7h4, Number of entries: 16
Line list for: (2Z)-hexa-1,3-dien-5-yne (anti) Formula: c6h6, Number of entries: 29
Line list for: (2E)-hexa-1,3-dien-5-yne (anti) Formula: c6h6, Number of entries: 32
Line list for: hex-1-ene-3,5-diyne Formula: c6h4, Number of entries: 22
Line list for: 5-ethenylidenecyclopenta-1,3-diene Formula: c7h6, Number of entries: 26
Line list for: 5-ethenylidenecyclopenta-1,3-diene,v22 Formula: c7h6, Number of entries: 18
Line list for: cyclopenta-1,3-diene Formula: c5h6, Number of entries: 19
Line list for: penta-1,2-dien-4-yne Formula: c5h4, Number of entries: 14
Line list for: hepta-1,2,3,4,5-pentaene-6-yne Formula: c7h4, Number of entries: 16
Line list for: hexa-1,2,3-trien-5-yne Formula: c6h4, Number of entries: 23
Line list for: hepta-1,2-dien-4,6-diyne Formula: c7h4, Number of entries: 45
Line list for: 1-ethynyl-cycloprop-1-en-2-ylidene Formula: c5h2, Number of entries: 13
Line list for: penta-1,2-dien-4-yne-1-ylidene Formula: c5h2, Number of entries: 13
Line list for: (4Z)-hepta-1,2,4-trien-6-yne (anti) Formula: c7h6, Number of entries: 34
Line list for: l_ccch,ve Formula: c3h, Number of entries: 32
Line list for: prop-1-yne,ve1 Formula: c3h4, Number of entries: 9
Line list for: prop-1-yne,ve2 Formula: c3h4, Number of entries: 9
Line list for: (E)-3-penten-1-yne, A state Formula: c5h6, Number of entries: 20
Line list for: (E)-3-penten-1-yne, E state Formula: c5h6, Number of entries: 18
Line list for: (Z)-3-penten-1-yne, A state Formula: c5h6, Number of entries: 13
Line list for: (Z)-3-penten-1-yne, E state Formula: c5h6, Number of entries: 9
Finishing the analysis¶
This basically completes the assignment process! We just have a few more steps to take to save the analysis: a pickle file is saved to disk, which is then used for all subsequent analysis (e.g. line profiles, statistics). The session.finalize_assignments() method is currently not as final as it sounds: it just triggers report and table generation, and exports all of the identified and unidentified data into their respective folders.
[15]:
session.finalize_assignments()
The save_session method below then dumps the entire analysis into the file sessions/{experiment_ID}.pkl, where {experiment_ID} is the number assigned to the experiment all the way at the beginning (experiment=4000).
[25]:
session.save_session()
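Saving and reloading amounts to a pickle round trip. Here is a minimal sketch using only the standard library, with a plain dictionary standing in for the session; the function names and the dictionary layout are hypothetical, and only the sessions/{experiment_ID}.pkl path pattern is taken from the text.

```python
import pickle
import tempfile
from pathlib import Path


def save_state(state, root, experiment):
    """Pickle `state` to <root>/sessions/<experiment>.pkl and return the path."""
    path = Path(root) / "sessions" / f"{experiment}.pkl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as handle:
        pickle.dump(state, handle)
    return path


def load_state(path):
    """Read a pickled state back from disk."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


state = {"experiment": 4000, "identifications": {"ethynylbenzene": 19}}

# A temporary directory keeps the example from touching your working folder.
with tempfile.TemporaryDirectory() as root:
    saved_path = save_state(state, root, experiment=4000)
    restored = load_state(saved_path)

print(saved_path.name, restored == state)
```

Pickle files can execute arbitrary code when loaded, so only open session files that you created or trust.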
You can then load this session back into a separate notebook with AssignmentSession.load_session("sessions/{experiment_ID}.pkl")
[26]:
session = AssignmentSession.load_session("sessions/4000.pkl")
This loads in all of the information from before, including the results generated with finalize_assignments()
. For example, the identifications
attribute stores a dict
which tracks each distinct species as keys, with the number of assigned lines as values:
[27]:
session.identifications
[27]:
{'buta-1,3-diynylbenzene': 51,
'umol-1999': 1,
'2-ethynylcyclopenta-1,3-diene': 17,
'5-ethenylidenecyclopenta-1,3-diene': 16,
'1-ethynyl-cycloprop-1-en-2-ylidene': 6,
'1-ethynylcyclopenta-1,3-diene': 18,
'5-ethenylidenecyclopenta-1,3-diene,v22': 8,
'(2Z)-hexa-1,3-dien-5-yne (anti)': 10,
'umol-1850': 5,
'1,2,3,4-pentatetraene-1,1,5-trienyl radical': 9,
'hepta-1,2,3,4,5-pentaene-6-yne': 10,
'hepta-1,2-dien-4,6-diyne': 13,
'hexa-1,3,5-triynylbenzene': 3,
'ethynylbenzene': 19,
'ethynylbenzene,v23': 18,
'benzonitrile, v21': 7,
'hepta-1,3,5-triyne': 5,
'1,2,3,4,5,6-heptahexaene-1,1,7-trienyl radical': 5,
'2-phenylacetonitrile': 2,
'vinyl_triacetylene': 1,
'hex-1-ene-3,5-diyne': 10,
'cyclohexa-1,3-dien-5-yne': 4,
'cyclohexa-2,4-dien-1-one': 9,
'Artifact': 2,
'ethynylbenzene,v16': 6,
'ethynylbenzene,2v23': 5,
'penta-1,3-diyne': 3,
'penta-1,3-diyne,1v12': 6,
'penta-1,3-diyne,ve2': 3,
'(2E)-hexa-1,3-dien-5-yne (anti)': 13,
'(4Z)-hepta-1,2,4-trien-6-yne (anti)': 9,
'(E)-pent-2-en-4-ynenitrile': 1,
'benzonitrile': 7,
'hexa-1,2,3-trien-5-yne': 3,
'butatrienylidene': 4,
'but-1-en-3-yne': 2,
'prop-2-ynenitrile': 2,
'hexa-4,5-dien-2-ynenitrile': 2,
'pentatetraenylidene': 6,
'prop-2-ynal': 2,
'prop-2-enenitrile': 2,
'buta-1,3-diynyl radical': 3,
'3-phenylprop-2-ynenitrile': 1,
'3-oxo-1,2-propadienylidene': 2,
'cyclohexa-2,5-dien-1-one': 11,
'penta-1,2-dien-4-yne': 6,
'cyclohexa-1,3-dien-5-yne,2v16': 2,
'cyanoprop-1,2-dien-1,3-diyl': 1,
'cyanoacetyl-cycloprop-1-ene-2,2-diyl': 1,
'penta-2,4-diynal': 3,
'penta-2,4-diynenitrile': 3,
'5-ethylenecyclopenta-1,3-diene': 3,
'cyclopenta-2,4-dien-1-one': 2,
'penta-1,3-diyne,1v11': 2,
'penta-1,3-diyne,ve1': 1,
'penta-1,3-diyne,ve3': 1,
'penta-1,3-diyne,1v13': 1,
'(Z)-3-penten-1-yne, A state': 1,
'(Z)-3-penten-1-yne, E state': 1,
'cyclopenta-2,4-dien-1-one, ve1': 1,
'cis-hex-ene-diyene': 1,
'cylcohexadiene': 1,
'(E)-3-penten-1-yne, A state': 1,
'buta-2,3-dien-1-imine (syn)': 1,
'cyclopenta-1,3-diene-1-carbonitrile': 1,
'cyclopenta-2,4-diene-1-carbonitrile': 1,
'benzonitrile, v12': 1,
'cyclopropa-1-yne-3-yl_radical,ve2': 2,
'cyclopropa_1_yne_3_yl_radical': 12,
'cyclopropa-1-yne-3-yl_radical,ve1': 4,
'cyclohexa-1,3-dien-5-yne,v15': 1,
'cyclohexa-1,3-dien-5-yne,v16': 1,
'cyclopropa-1-yne-3-yl_radical,ve3': 2,
'buta-2,3-dienenitrile': 1,
'cyclopropa-1,2-diene,4v6': 1,
'cyclopropa-1,2-diene,3v6': 1,
'but-3-enenitrile (cis)': 1,
'penta-1,2-dien-4-yne-1-ylidene': 2,
'cyclopropa-1,2-diene,2v6': 1,
'cyclopropa-1,2-diene (HC13CCH)': 1,
'prop-1-yne,ve1': 1,
'(E)-but-2-enal (anti)': 1,
'prop-1-yne': 1,
'prop-1-yne,1v9': 1,
'cyclopropa-1,2-diene,1v5+1v6': 1,
'cyclopropa-1,2-diene,1v6': 1,
'penta-1,2-dien-1-one-3-yl radical': 1,
'cyclopropa-1,2-diene,1v3': 1,
'cyclopropa-1,2-diene,1v2': 1,
'cyclopropa-1,2-diene,gs': 1,
'cyclopropa-1,2-diene (H13CCCH)': 1,
'benzonitrile, v15': 1,
'(2E)-2,4-pentadienal (syn)': 1,
'cyanopenta-2,4-diyne-2,2-diyl': 1,
'c3s': 1,
'hexa-1,3,5-triynyl radical': 4,
'prop-1-yne,ve2': 1}
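Since identifications is an ordinary dict, standard Python is enough to summarize it. The sketch below uses a handful of entries copied from the output above, drops the non-molecular Artifact entry, and ranks the rest by number of assigned lines.

```python
# A few entries copied from the output above.
identifications = {
    "buta-1,3-diynylbenzene": 51,
    "ethynylbenzene": 19,
    "1-ethynylcyclopenta-1,3-diene": 18,
    "umol-1850": 5,
    "Artifact": 2,
    "c3s": 1,
}

# Drop non-molecular bookkeeping entries, then rank by number of assigned lines.
molecular = {k: v for k, v in identifications.items() if k != "Artifact"}
ranked = sorted(molecular.items(), key=lambda item: item[1], reverse=True)

for name, count in ranked[:3]:
    print(f"{name}: {count}")
print("total molecular lines:", sum(molecular.values()))
```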
You can also view all of the assignment information by accessing the DataFrame
stored as the table
attribute. Below, we also demonstrate how we can sort columns based on their values, for example looking at the transitions with the highest catalog uncertainty first.
[28]:
session.table.sort_values(["uncertainty"], ascending=False)
[28]:
| | name | smiles | formula | frequency | catalog_frequency | catalog_intensity | deviation | intensity | uncertainty | S | ... | lstate_energy | interference | weighting | source | public | velocity | discharge | magnet | multiple | final |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
40 | 2-phenylacetonitrile | | c8h7n | 7946.962728 | 7947.1695 | -5.3480 | 0.206772 | 7.765701 | 0.1409 | 0.0 | ... | 6.2021 | False | 0.0 | Catalog | True | 0.0 | False | False | [penta-2,4-diynal, 7,947.0717] | False |
385 | benzonitrile, v15 | | c7h5n | 18425.935412 | 18426.1108 | -5.6658 | 0.175388 | 6.628221 | 0.1065 | 0.0 | ... | 3.9995 | False | 0.0 | Catalog | True | 0.0 | False | False | [cyanoprop-1,2-dien-1,3-diyl, 18,426.1726, 3-p... | False |
353 | but-3-enenitrile (cis) | | c4h5n | 16331.429075 | 16331.4637 | -5.3417 | 0.034625 | 38.524312 | 0.0772 | 0.0 | ... | 10.8664 | False | 0.0 | Catalog | True | 0.0 | False | False | [cyclohexa-2,4-dien-1-one, 16,331.4332, ethyny... | False |
126 | cyanoacetyl-cycloprop-1-ene-2,2-diyl | | c6hn | 10379.895776 | 10379.8652 | -3.6143 | -0.030576 | 8.086545 | 0.0344 | 0.0 | ... | 1.7845 | False | 0.0 | Catalog | True | 0.0 | False | False | [] | True |
179 | hexa-1,3,5-triynylbenzene | | c12h6 | 11803.740216 | 11803.8139 | -3.8298 | 0.073684 | 18.036535 | 0.0279 | 0.0 | ... | 5.7546 | False | 0.0 | Catalog | True | 0.0 | False | False | [hexa-1,2,3-trien-5-yne, 11,803.7367, (Z)-but-... | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
43 | cyclohexa-1,3-dien-5-yne | | c6h4 | 8073.195789 | 8073.1768 | -2.7170 | -0.018989 | 14.487425 | 0.0000 | 0.0 | ... | 0.8300 | False | 0.0 | Catalog | True | 0.0 | False | False | [] | True |
39 | vinyl_triacetylene | | c8h4 | 7924.848564 | 7924.8884 | -2.7370 | 0.039836 | 7.026208 | 0.0000 | 0.0 | ... | 0.7930 | False | 0.0 | Catalog | True | 0.0 | False | False | [] | True |
367 | prop-1-yne | | c3h4 | 17091.744758 | 17091.7420 | -1.5739 | -0.002758 | 140.717850 | 0.0000 | 0.0 | ... | 0.0000 | False | 0.0 | Catalog | True | 0.0 | False | False | [] | True |
368 | prop-1-yne,1v9 | | c3h4 | 17102.103117 | 17102.0765 | -1.5722 | -0.026617 | 31.469987 | 0.0000 | 0.0 | ... | 0.0000 | False | 0.0 | Catalog | True | 0.0 | False | False | [benzonitrile, v15, 17,101.9416] | True |
314 | cyclopropa_1_yne_3_yl_radical | | c3h | 14893.034757 | 14893.0554 | -1.6790 | 0.020643 | 123.717808 | 0.0000 | 0.0 | ... | 0.6391 | False | 0.0 | Catalog | True | 0.0 | False | False | [] | True |
428 rows × 28 columns
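Because the table is a regular DataFrame, the usual pandas filtering idioms apply. The sketch below rebuilds four rows and four columns from the table above and selects pure hydrocarbons with small residuals; the formula test is deliberately naive (it only looks for the letter n in these lowercase formulas) and would need refining for a real composition filter.

```python
import pandas as pd

# Four rows and four columns copied from the table above.
table = pd.DataFrame(
    {
        "name": [
            "2-phenylacetonitrile",
            "benzonitrile, v15",
            "prop-1-yne",
            "vinyl_triacetylene",
        ],
        "formula": ["c8h7n", "c7h5n", "c3h4", "c8h4"],
        "deviation": [0.206772, 0.175388, -0.002758, 0.039836],
        "uncertainty": [0.1409, 0.1065, 0.0, 0.0],
    }
)

# Keep entries without nitrogen in the formula and with small residuals.
is_hydrocarbon = ~table["formula"].str.contains("n")
small_residual = table["deviation"].abs() < 0.05
subset = table[is_hydrocarbon & small_residual].sort_values("deviation")
print(subset[["name", "deviation"]])
```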
When it comes to making plots, we might also be interested in removing the features that have already been assigned from the spectrum (the X/Y data); the clean_spectral_assignments() method replaces the assigned regions of the spectrum with white noise so that the result looks natural.
[32]:
session.clean_spectral_assignments()
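The idea of blanking assigned regions with noise can be sketched as follows. The function name, the window width, the Gaussian noise model, and the toy data are all assumptions made for illustration; the real method may pick the window and noise level differently.

```python
import random


def blank_assigned(frequency, intensity, assigned, window=1.0, noise_sigma=1.0, seed=0):
    """Replace points within `window` MHz of each assigned line with Gaussian noise."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    cleaned = list(intensity)
    for i, f in enumerate(frequency):
        if any(abs(f - line) <= window for line in assigned):
            cleaned[i] = rng.gauss(0.0, noise_sigma)
    return cleaned


frequency = [8000.0 + 0.5 * k for k in range(9)]  # 8000.0 to 8004.0 MHz
intensity = [1.0, 1.2, 0.9, 14.0, 30.0, 13.0, 1.1, 0.8, 1.0]
cleaned = blank_assigned(frequency, intensity, assigned=[8002.0], window=0.5)
print(cleaned)
```

The input list is copied rather than modified, so the original spectrum stays available for comparison.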
You can then plot the cleaned spectrum, from which all of the assigned features have been removed, with plot_assigned(). This creates an interactive plotly figure.
Note that plot_assigned() can be used at any point in the notebook too; the latest spectrum with assignments overlaid will be shown.
[33]:
session.plot_assigned()
Conclusions¶
This notebook completes the first analysis step, which is often the most tedious: assigning and keeping track of every spectral feature, and translating that into something that is publishable. We went through how a spectrum can be loaded and interfaced with the AssignmentSession
class in PySpecTools
, followed by peak finding. We then created LineList
objects based on SPCAT catalogs, and fed them to the AssignmentSession
to process, and showed that you could do this en masse.
Finally, the results of the analysis were saved to disk, and an interactive report was generated.
In a future notebook, we’ll take a look at what kind of things we can do with the saved AssignmentSession
, for example chemical composition analysis, and making plots of the data for publication.