$SO(3)$ rotation group
$G_{36}^{\dagger}$ nuclear permutation group
$SO(3)$ equivariance simplifies physics and compute
e3nn uses the same framework as angular momentum coupling: equivariance-preserving transformations are angular momentum conserving!
$$\psi_{lm} = \sum_{m_1m_2} C(l_1l_2l; m_1 m_2 m) \psi_{l_1m_1} \psi_{l_2m_2}$$
Clebsch-Gordan coefficients $C$ are non-zero only for specific combinations of the angular momenta $l, l_1, l_2, m, m_1, m_2$ (see the sympy check below)
$l=0,1,2$ have physical interpretations as scalar, vector, and tensor quantities
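The selection rules are easy to verify symbolically. A minimal sketch using sympy's Clebsch-Gordan utilities (the specific quantum numbers are chosen only for illustration):

```python
from sympy import S
from sympy.physics.quantum.cg import CG

# <l1 m1; l2 m2 | l m> is non-zero only when m = m1 + m2
# and |l1 - l2| <= l <= l1 + l2 (triangle rule).
allowed = CG(S(1), 0, S(1), 0, S(2), 0).doit()            # sqrt(6)/3, non-zero
violates_m = CG(S(1), 1, S(1), 0, S(2), 0).doit()         # 0: m1 + m2 != m
violates_triangle = CG(S(1), 0, S(1), 0, S(3), 0).doit()  # 0: l > l1 + l2

print(allowed, violates_m, violates_triangle)
```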
...but what do equivariant models actually learn?
How do higher angular momentum features affect modeling?
Symbolic reductions using sympy
WYSIWYG open-source kernel implementations up to $l=10$ with TritonLang
Portable performance across multiple hardware architectures
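As an illustration of the symbolic-reduction step (a sketch, not the project's actual code), sympy can expand a spherical harmonic of a given order into an explicit closed form, which `lambdify` then turns into a plain numerical function of the kind one would transcribe into a fused Triton kernel:

```python
from sympy import Symbol, simplify, lambdify
from sympy.functions.special.spherical_harmonics import Ynm

theta = Symbol("theta", real=True)
phi = Symbol("phi", real=True)

# Expand Y_2^0 into its explicit trigonometric form; higher orders
# (up to l = 10) follow the same pattern with larger expressions.
expr = simplify(Ynm(2, 0, theta, phi).expand(func=True))
print(expr)  # proportional to 3*cos(theta)**2 - 1

# Turn the reduced symbolic expression into a plain numerical function.
y20 = lambdify((theta, phi), expr, modules="math")
print(y20(0.3, 0.0))
```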
Three interaction blocks yield node embeddings with $2l_\mathrm{max} + 1$ hidden features
| Configuration | Hidden dimension | # of parameters (M) | Test error (eV) ↓ |
|---|---|---|---|
| $[0, 1, 2]$ | 32 | 1.6 | 0.12 |
| $[0, 1, 2, 4]$ | 32 | 3.0 | 0.15 |
| $[0, 1, 2, 6]$ | 32 | 2.6 | 0.21 |
| $[0, 1, 2, 8]$ | 32 | 2.6 | 0.19 |
| $[0, 1, 2, 10]$ | 32 | 2.6 | 0.19 |
Adding higher $l$ alone does not improve test performance
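The Configuration column above can be read as the set of angular momentum orders kept in the hidden node features. A sketch of how such a configuration could be mapped to an e3nn irreps specification (the multiplicity of 32 mirrors the hidden dimension in the table; the construction in the original code may differ):

```python
from e3nn import o3

def hidden_irreps(orders, mul=32):
    """Build an irreps spec with `mul` channels per requested order,
    using the natural spherical-harmonic parity (-1)**l."""
    parts = [f"{mul}x{l}{'e' if l % 2 == 0 else 'o'}" for l in orders]
    return o3.Irreps("+".join(parts))

print(hidden_irreps([0, 1, 2]))      # 32x0e+32x1o+32x2e
print(hidden_irreps([0, 1, 2, 10]))  # same, plus a 32x10e block
```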
Points colored by molecular complexity (spacial score)
$l=1-3$ latents are largely unstructured and unused—test error 0.73 eV
Ablating orders improves structure and test performance—test error 0.52 eV
| Configuration | Hidden dimension | Epochs | # of parameters (M) | Test error (eV) ↓ |
|---|---|---|---|---|
| $[0, 1, 2, 3, 4]$ | 16 | 30 | 0.8 | 1.24 |
| $[0, 1, 2, 3, 4, 5, 6]$ | 16 | 30 | 1.9 | 0.73 |
| $[0, 1, 2, 3, 4, 5, 6, 7, 8]$ | 16 | 30 | 3.7 | 1.02 |
| $[0, 3, 4, 5, 6]$ | 16 | 30 | 0.8 | 0.52 |
| $[0, 1, 2, 3, 4]$ | 16 | 100 | 0.8 | 0.22 |
| $[0, 3, 4, 5, 6]$ | 16 | 100 | 0.8 | 0.01 |
Increasing $l$ alone does not improve test performance
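One way to probe whether a given order is actually used (a post-hoc sketch, distinct from the training-time ablation in the table, which removes the orders from the architecture) is to zero that order's block in the node embedding and check how much downstream predictions change. Assuming an e3nn-style layout:

```python
import torch
from e3nn import o3

# Illustrative 16-channel layout for orders l = 0..4, matching the
# hidden dimension in the table above; the real layout may differ.
irreps = o3.Irreps("16x0e + 16x1o + 16x2e + 16x3o + 16x4e")
node_feats = torch.randn(128, irreps.dim)

def zero_orders(feats, irreps, drop_ls):
    """Return a copy of `feats` with all channels of the given orders zeroed."""
    out = feats.clone()
    for (mul, ir), sl in zip(irreps, irreps.slices()):
        if ir.l in drop_ls:
            out[:, sl] = 0.0
    return out

# If zeroing l = 1 and l = 2 barely changes predictions,
# those latents are effectively unused.
probed = zero_orders(node_feats, irreps, drop_ls={1, 2})
```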
Neural networks are going to neural network—
we cannot take physical inductive biases for granted!
Qualitative embedding assessments may be required to get the best out of architectures
Need regularization/pre-training tasks that guarantee higher-order terms are actually used!
`node_feats`: 640-element embeddings
Decomposed projections without OOD samples
$l=1,3$ latents are largely unstructured, with odd embedding placement
Decomposed projections with OOD samples
No semantic margin between in- and out-of-distribution samples
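A sketch of how the decomposed projections could be produced, assuming a hypothetical irreps layout for the 640-element embeddings (40 channels for each of $l=0$ to $3$ gives $40 \times (1+3+5+7) = 640$) and using PCA as a stand-in for whatever projection the figures use:

```python
import torch
from e3nn import o3
from sklearn.decomposition import PCA

# Hypothetical layout; the actual composition of `node_feats` may differ.
irreps = o3.Irreps("40x0e + 40x1o + 40x2e + 40x3o")
node_feats = torch.randn(5000, irreps.dim)  # stand-in for saved embeddings

# Slice the flat embedding into per-order blocks and project each to 2D.
projections = {}
for (mul, ir), sl in zip(irreps, irreps.slices()):
    block = node_feats[:, sl].numpy()
    projections[ir.l] = PCA(n_components=2).fit_transform(block)

# Each projections[l] can then be scatter-plotted, colored by spacial score
# or by an in-/out-of-distribution label, to look for a semantic margin.
```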