{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Analyzing Broadband Spectra with the `assignment` Module\n", "\n", "## Introduction\n", "\n", "In this notebook, we're going to work through how the core functionality of `PySpecTools` can be used to streamline and automate your spectral analysis. `PySpecTools` and Python together are flexible enough to adapt to your needs: whatever can't be done natively with `PySpecTools` can be automated with Python (e.g. `for` loops) and, to a large extent, with `pandas`. The latter is particularly useful when analyzing the assignments, for example to filter out certain molecules; we leave this for a subsequent notebook, as the focus here is to demonstrate how automated assignment is performed.\n", "\n", "The core functionality for assigning spectra revolves around the `pyspectools.spectra.assignment` module, which contains three main abstractions:\n", "\n", "1. `AssignmentSession`\n", " - This is your main interface: it holds the spectral data and lets you interact with the data (plot, assign, etc.).\n", "2. `Transition`\n", " - Represents every type of spectral feature: every peak in an experiment and every catalog entry.\n", "3. `LineList`\n", " - A collection of spectral features: the peaks in an experiment (which are themselves `Transition` objects) and catalogs.\n", "\n", "We will demonstrate how these pieces come together using some of our published data: this notebook was used to analyze the benzene discharge experiments reported in these two papers:\n", "\n", "McCarthy, M. C.; Lee, K. L. K.; Carroll, P. B.; Porterfield, J. P.; Changala, P. B.; Thorpe, J. H.; Stanton, J. F. Exhaustive Product Analysis of Three Benzene Discharges by Microwave Spectroscopy. J. Phys. Chem. A 2020, 124 (25), 5170–5181. https://doi.org/10.1021/acs.jpca.0c02919.\n", "\n",
"Lee, K. L. K.; McCarthy, M. C. Study of Benzene Fragmentation, Isomerization, and Growth Using Microwave Spectroscopy. J. Phys. Chem. Lett. 2019, 10 (10), 2408–2413. https://doi.org/10.1021/acs.jpclett.9b00586.\n", "\n", "The full dataset can also be found in our [Zenodo repository](https://zenodo.org/record/3827742); notebook \"4000\" there most closely resembles this one (this notebook is a much more heavily annotated version).\n", "\n", "We should stress that, although much of this is automated, spectral analysis remains _very much an iterative process_. You will refine the way you do your analysis, and many things won't become apparent until you've run it at least once. The point of this notebook is to make the analysis reproducible and transparent: you can always modify the code and re-run the whole notebook with the latest analysis." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To begin the analysis, we construct an `AssignmentSession` object using the class method `AssignmentSession.from_ascii(...)`. This method takes your ASCII spectrum containing frequency and intensity information, parses it using `pandas`, and stores it as a `DataFrame`. In IPython/Jupyter, you can append a question mark to any function or method (e.g. `AssignmentSession.from_ascii?`) to pull up its documentation." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from pyspectools.spectra.assignment import AssignmentSession, LineList" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case, we're setting up the session with the benzene data, which is a tab-delimited text file with a header. We skip the header with `skiprows=1` and provide our own column names with the `col_names` argument. 
Additionally, we specify the composition we expect for the experiment with the `composition` kwarg: ideally we would only include `[\"C\", \"H\"]`; however, we know that atmospheric impurities such as nitrogen and oxygen get incorporated into the discharge products. This keyword _will affect Splatalogue assignments_: it excludes catalogs with irrelevant compositions, such as those of metal-bearing molecules." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "session = AssignmentSession.from_ascii(\n", " \"chirp_data/ft2632_hanning_620.txt\",\n", " experiment=4000,\n", " col_names=[\"Frequency\", \"Intensity\"],\n", " skiprows=1,\n", " composition=[\"C\", \"H\", \"N\", \"O\"],\n", " verbose=False\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Many of these settings can also be adjusted after the fact; they are stored as attributes of the `Session` object held by the `AssignmentSession` (here, `session.session`). For example, the `temperature` attribute sets an upper limit on the lower-state energies of assignable transitions: catalog entries with lower-state energies greater than twice this value are ignored. The cutoff is not the temperature itself, because `temperature` nominally corresponds to your experimental temperature and, depending on how abundant a molecule is, you may still see transitions from higher-energy states. Another useful setting is the maximum tolerated uncertainty in catalog entries, set by the `max_uncertainty` attribute, which lets us reject assignments based on poorly predicted lines." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# temperature in K\n", "session.session.temperature = 10.\n", "\n", "# uncertainty in MHz\n", "session.session.max_uncertainty = 0.2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that frequencies are in MHz and temperatures in kelvin.\n", "\n", "The next step is to pre-process the spectrum. 
Our chirped-pulse data are collected with Kyle Crabtree's `blackchirp` program, and we often apply a window function to the data. If you are working with raw FFT data, `PySpecTools` provides access to the window functions defined in `scipy.signal`, with a syntax like this:\n", "\n", "```python\n", "session.apply_filter(\"hanning\")\n", "```\n", "\n", "The full list of window functions can be found [in the SciPy documentation](https://docs.scipy.org/doc/scipy/reference/signal.windows.html#module-scipy.signal.windows).\n", "\n", "After pre-processing, we perform peak detection and baseline correction with the `session.find_peaks` method, which automates several steps depending on the keyword arguments. Wherever possible, `PySpecTools` works in units of signal-to-noise ratio (SNR), obtained by fitting a baseline (a _vector_, not a scalar) and dividing the entire spectrum by it element-wise. SNR is much more meaningful than the raw voltage scale that is typically reported.\n", "\n", "By default, peak finding uses the asymmetric least-squares (ALS) method to fit the baseline (`als=True`). ALS can be thought of as a penalized least-squares method, with additional parameters that control how quickly the baseline can respond to changes in the spectrum (you don't want to over-subtract signal). These parameters can be set by passing keyword arguments to `find_peaks` ([see documentation](https://laserkelvin.github.io/PySpecTools/pyspectools.spectra.html#pyspectools.spectra.assignment.AssignmentSession.find_peaks)). The `sigma` keyword then specifies the minimum SNR for a feature to be detected as a peak, whereas `threshold` specifies the minimum on the absolute intensity scale; if `als=False`, `threshold` and `sigma` are equivalent." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Returns a pandas DataFrame containing frequency/intensity of\n", "# every peak detected. 
This is also stored as an attribute:\n", "# `AssignmentSession.peaks`\n", "peaks = session.find_peaks(sigma=6, als=True)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "| | Frequency | Peak Frequencies | Intensity |\n",
"|---|---|---|---|\n",
"| count | 447.000000 | 447.000000 | 447.000000 |\n",
"| mean | 12300.347573 | 12300.342431 | 28.373735 |\n",
"| std | 3365.905814 | 3365.901388 | 41.156217 |\n",
"| min | 6385.075006 | 6385.066667 | 6.002450 |\n",
"| 25% | 9542.620545 | 9542.666667 | 8.248862 |\n",
"| 50% | 12215.902539 | 12215.911111 | 14.821400 |\n",
"| 75% | 14753.087387 | 14753.155555 | 32.619844 |\n",
"| max | 19845.175891 | 19845.155556 | 499.347482 |\n", "
\n", "| | name | smiles | formula | frequency | catalog_frequency | catalog_intensity | deviation | intensity | uncertainty | S | ... | lstate_energy | interference | weighting | source | public | velocity | discharge | magnet | multiple | final |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 40 | 2-phenylacetonitrile |  | c8h7n | 7946.962728 | 7947.1695 | -5.3480 | 0.206772 | 7.765701 | 0.1409 | 0.0 | ... | 6.2021 | False | 0.0 | Catalog | True | 0.0 | False | False | [penta-2,4-diynal, 7,947.0717] | False |\n",
"| 385 | benzonitrile, v15 |  | c7h5n | 18425.935412 | 18426.1108 | -5.6658 | 0.175388 | 6.628221 | 0.1065 | 0.0 | ... | 3.9995 | False | 0.0 | Catalog | True | 0.0 | False | False | [cyanoprop-1,2-dien-1,3-diyl, 18,426.1726, 3-p... | False |\n",
"| 353 | but-3-enenitrile (cis) |  | c4h5n | 16331.429075 | 16331.4637 | -5.3417 | 0.034625 | 38.524312 | 0.0772 | 0.0 | ... | 10.8664 | False | 0.0 | Catalog | True | 0.0 | False | False | [cyclohexa-2,4-dien-1-one, 16,331.4332, ethyny... | False |\n",
"| 126 | cyanoacetyl-cycloprop-1-ene-2,2-diyl |  | c6hn | 10379.895776 | 10379.8652 | -3.6143 | -0.030576 | 8.086545 | 0.0344 | 0.0 | ... | 1.7845 | False | 0.0 | Catalog | True | 0.0 | False | False | [] | True |\n",
"| 179 | hexa-1,3,5-triynylbenzene |  | c12h6 | 11803.740216 | 11803.8139 | -3.8298 | 0.073684 | 18.036535 | 0.0279 | 0.0 | ... | 5.7546 | False | 0.0 | Catalog | True | 0.0 | False | False | [hexa-1,2,3-trien-5-yne, 11,803.7367, (Z)-but-... | False |\n",
"| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |\n",
"| 43 | cyclohexa-1,3-dien-5-yne |  | c6h4 | 8073.195789 | 8073.1768 | -2.7170 | -0.018989 | 14.487425 | 0.0000 | 0.0 | ... | 0.8300 | False | 0.0 | Catalog | True | 0.0 | False | False | [] | True |\n",
"| 39 | vinyl_triacetylene |  | c8h4 | 7924.848564 | 7924.8884 | -2.7370 | 0.039836 | 7.026208 | 0.0000 | 0.0 | ... | 0.7930 | False | 0.0 | Catalog | True | 0.0 | False | False | [] | True |\n",
"| 367 | prop-1-yne |  | c3h4 | 17091.744758 | 17091.7420 | -1.5739 | -0.002758 | 140.717850 | 0.0000 | 0.0 | ... | 0.0000 | False | 0.0 | Catalog | True | 0.0 | False | False | [] | True |\n",
"| 368 | prop-1-yne,1v9 |  | c3h4 | 17102.103117 | 17102.0765 | -1.5722 | -0.026617 | 31.469987 | 0.0000 | 0.0 | ... | 0.0000 | False | 0.0 | Catalog | True | 0.0 | False | False | [benzonitrile, v15, 17,101.9416] | True |\n",
"| 314 | cyclopropa_1_yne_3_yl_radical |  | c3h | 14893.034757 | 14893.0554 | -1.6790 | 0.020643 | 123.717808 | 0.0000 | 0.0 | ... | 0.6391 | False | 0.0 | Catalog | True | 0.0 | False | False | [] | True |\n", "
428 rows × 28 columns
\n", "