Kelvin Lee / Mikhail Galkin / Santiago Miret

Intel Labs

Intro

Molecular symmetry

  • Atomistic systems display natural symmetry
  • Objects in Euclidean space can undergo indistinguishable transformations
  • Symmetry groups define basis of transformations
  • Equivariance guarantees that these transformations commute with the model's operations

$SO(3)$ rotation group

$G_{36}^{\dagger}$ nuclear permutation group

$SO(3)$ equivariance simplifies physics and compute

Intro

Deconstructing equivariance

  • Tensor product formalism for preserving $SO(3)$ equivariance in neural representations, implemented in e3nn
  • Points in 3D space are embedded in a basis of spherical harmonics; only equivariance-preserving transformations are allowed
  • Object ⇆ latents correspondence is highly desirable

Same framework as angular momentum coupling: equivariance preserving transformations are angular momentum conserving!

$$\psi_{lm} = \sum_{m_1m_2} C(l_1l_2l; m_1 m_2 m) \psi_{l_1m_1} \psi_{l_2m_2}$$

Clebsch-Gordan coefficients $C$ are non-zero only for specific combinations of $l, l_1, l_2$ and $m, m_1, m_2$
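
As a concrete check of these selection rules, the sketch below evaluates Clebsch-Gordan coefficients with sympy; it is purely illustrative, and the choice of $l_1 = 1$, $l_2 = 2$ is arbitrary.

```python
from sympy.physics.quantum.cg import CG

l1, l2 = 1, 2
nonzero = set()
for l in range(0, l1 + l2 + 2):          # deliberately scan past l1 + l2
    for m1 in range(-l1, l1 + 1):
        for m2 in range(-l2, l2 + 1):
            m = m1 + m2                  # coefficients vanish unless m = m1 + m2
            if abs(m) > l:
                continue
            if CG(l1, m1, l2, m2, l, m).doit() != 0:
                nonzero.add(l)

print(sorted(nonzero))                   # -> [1, 2, 3]: only |l1 - l2| <= l <= l1 + l2 survive
```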

Intro

Deconstructing equivariance

$l=0,1,2$ have physical interpretations of scalar, vector, and tensor quantities

...but what do equivariant models actually learn?

How do higher angular momentum features affect modeling?


  1. Train a minimalist model on QM9 atomization energy prediction
  2. Decompose features into their respective irreducible representations (see the sketch after this list)
  3. Project embedded structures in an interpretable way
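
A minimal sketch of the decomposition step (2), assuming node embeddings are laid out as an e3nn Irreps; the irreps string and sizes below are illustrative, not the trained model's exact configuration.

```python
import torch
from e3nn import o3

# Node embeddings stored as concatenated irrep blocks (illustrative layout)
irreps = o3.Irreps("32x0e + 32x1o + 32x2e")
node_embeddings = irreps.randn(128, -1)          # 128 nodes, irreps.dim features each

# Split the feature axis into one block per irrep so each order l can be inspected on its own
per_order = {
    str(ir): node_embeddings[:, sl]              # block shape: (128, mul * (2l + 1))
    for (mul, ir), sl in zip(irreps, irreps.slices())
}

for name, block in per_order.items():
    print(name, tuple(block.shape))
```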

Methods

Decoupling spherical harmonics

Symbolic reductions using sympy

WYSIWYG open-source kernel implementations up to $l=10$ with TritonLang

Portable performance across multiple hardware architectures
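
A minimal sketch of the symbolic reduction step, assuming sympy's built-in Ynm; the resulting closed-form Cartesian polynomial is the kind of expression that gets transcribed into a Triton kernel. The $l = 2$, $m = 0$ case is shown for brevity.

```python
import sympy as sp

# Express the spherical harmonic in Cartesian coordinates of a point (x, y, z)
x, y, z = sp.symbols("x y z", real=True)
r = sp.sqrt(x**2 + y**2 + z**2)
theta, phi = sp.acos(z / r), sp.atan2(y, x)

# l = 2, m = 0 shown as an example; the same procedure applies order by order
Y20 = sp.simplify(sp.Ynm(2, 0, theta, phi).expand(func=True))
print(Y20)   # proportional to (3*z**2 - r**2) / r**2, up to normalization
```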

Methods

A toy equivariant model

Three interaction blocks yield node embeddings with $2l_\mathrm{max} + 1$ hidden features
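
A minimal sketch of a single interaction block, assuming e3nn; the hidden size and $l_\mathrm{max}$ below are illustrative rather than the exact toy-model hyperparameters.

```python
import torch
from e3nn import o3

irreps_node = o3.Irreps("32x0e + 32x1o + 32x2e")      # hidden features for each order l
irreps_sh = o3.Irreps.spherical_harmonics(lmax=2)     # edge attributes: 1x0e + 1x1o + 1x2e

# Messages couple node features with edge directions through a learned tensor product,
# so the output stays in the same equivariant basis as the input
tp = o3.FullyConnectedTensorProduct(irreps_node, irreps_sh, irreps_node)

node_feats = irreps_node.randn(1, -1)                 # features of one source node
edge_vec = torch.randn(1, 3)                          # one edge direction vector
edge_sh = o3.spherical_harmonics(irreps_sh, edge_vec, normalize=True, normalization="component")
message = tp(node_feats, edge_sh)                     # same irreps layout as irreps_node
print(message.shape)
```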

Methods

Low-dimensional projections

  • Potential of Heat-diffusion for Affinity-based Transition Embedding (PHATE) method for projecting high-dimensional latents for visualization (usage sketched below)
  • Local relationships between data points captured through local Gaussian kernel
  • Global relationships between data points modeled as diffusion probability
Moon et al. (2019), Visualizing structure and transitions in high-dimensional biological data
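
A minimal usage sketch, assuming the phate package and a generic array of learned latents; the array here is random stand-in data.

```python
import numpy as np
import phate

latents = np.random.default_rng(0).normal(size=(2000, 64))   # stand-in for learned embeddings

# kNN-based local kernel captures local neighborhoods; diffusion captures global structure
op = phate.PHATE(n_components=2, knn=5)
coords = op.fit_transform(latents)                            # (2000, 2) points for plotting
```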

Results

QM9 performance—high angular momentum

| Configuration | Hidden dimension | # of parameters (M) | Test error (eV) ↓ |
|---|---|---|---|
| $[0, 1, 2]$ | 32 | 1.6 | 0.12 |
| $[0, 1, 2, 4]$ | 32 | 3.0 | 0.15 |
| $[0, 1, 2, 6]$ | 32 | 2.6 | 0.21 |
| $[0, 1, 2, 8]$ | 32 | 2.6 | 0.19 |
| $[0, 1, 2, 10]$ | 32 | 2.6 | 0.19 |

Adding higher $l$ alone does not improve test performance

Results

PHATE analysis

Points colored by molecular complexity (Spacial Score)

$l = 1, 2, 3$ latents are largely unstructured and unused (test error: 0.73 eV)

Results

PHATE analysis

Ablating the unused orders improves latent structure and test performance (test error: 0.52 eV)

Results

QM9 performance—larger sets

| Configuration | Hidden dimension | Epochs | # of parameters (M) | Test error (eV) ↓ |
|---|---|---|---|---|
| $[0, 1, 2, 3, 4]$ | 16 | 30 | 0.8 | 1.24 |
| $[0, 1, 2, 3, 4, 5, 6]$ | 16 | 30 | 1.9 | 0.73 |
| $[0, 1, 2, 3, 4, 5, 6, 7, 8]$ | 16 | 30 | 3.7 | 1.02 |
| $[0, 3, 4, 5, 6]$ | 16 | 30 | 0.8 | 0.52 |
| $[0, 1, 2, 3, 4]$ | 16 | 100 | 0.8 | 0.22 |
| $[0, 3, 4, 5, 6]$ | 16 | 100 | 0.8 | 0.01 |

Increasing $l$ alone does not improve test performance

Conclusions

What does it mean?

  • Unused irreducible representations actually hinder quantitative performance
  • Are equivariant features actually being used?
  • $l=0$ dominates embeddings used for prediction; vector ($l=1$) and tensor ($l=2$) features should be able to do more
  • Embeddings need discoverable, interpretable structure to behave predictably in deployment, ideally to the point where linear models suffice

Conclusions

Why does it matter?

Neural networks are going to neural network: we cannot take physical inductive biases for granted!

Qualitative embedding assessments may be required to get the best out of architectures

Need regularization or pre-training tasks that guarantee higher-order terms are actually used!
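
One hypothetical form such a regularizer could take (not part of this work): penalize embeddings whose norm collapses onto $l=0$, so that higher orders carry signal. The function and its inputs are illustrative.

```python
import torch

def irrep_usage_penalty(node_feats: torch.Tensor, slices_by_l: dict[int, slice]) -> torch.Tensor:
    """Hypothetical auxiliary loss that pushes embedding norm to spread across orders l.

    slices_by_l maps each order l to its block of columns in node_feats.
    """
    norms = torch.stack([node_feats[:, sl].norm(dim=-1).mean() for sl in slices_by_l.values()])
    fractions = norms / (norms.sum() + 1e-12)
    # Negative entropy of the per-order norm distribution: minimized when usage is uniform,
    # largest when everything collapses onto a single order (e.g. l = 0)
    return (fractions * torch.log(fractions + 1e-12)).sum()
```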

Future work

What about “real” models and data?

  • Analysis on 10,000 random Materials Project Trajectory samples
  • Pre-trained MACE “medium”
  • Black squares correspond to 1,000 out-of-distribution samples (🪧 #63)
  • PHATE projections of the 640-element scalar embeddings (node_feats); extraction sketched below
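
A rough sketch of the extraction and projection pipeline, assuming the mace-torch package with its pretrained MACE-MP "medium" checkpoint and the calculator's get_descriptors helper for per-atom features; the file name and subset are placeholders.

```python
import numpy as np
import phate
from ase.io import read
from mace.calculators import mace_mp

# Pretrained foundation model; "medium" matches the checkpoint analyzed here
calc = mace_mp(model="medium", device="cpu")

# Placeholder file holding the sampled Materials Project Trajectory structures
frames = read("mptrj_subset.extxyz", index=":")

# Per-atom scalar (l = 0) descriptors, assuming the get_descriptors helper on the calculator
feats = [calc.get_descriptors(atoms, invariants_only=True) for atoms in frames]
scalar_embeddings = np.concatenate(feats, axis=0)

coords = phate.PHATE(n_components=2).fit_transform(scalar_embeddings)
```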

Future work

What about “real” models and data?

Decomposed projections without OOD samples

$l=1, 3$ blocks are largely unstructured, with odd embedding placement

Future work

What about “real” models and data?

Decomposed projections with OOD samples

No semantic margin between in- and out-of-distribution samples

Future work

Call to action

  • Equivariant models like MACE may have untapped modeling performance
  • Need qualitative, as well as quantitative, comparisons to guide data, model, and task design
  • Deeper insight into learning dynamics of tensor product models

Acknowledgements

  • John Pennycook
    Kernel suggestions
  • Xiaoxiao (Lory) Wang
    OOD dopant data samples (🪧 #63)
  • Mikhail Galkin
    Writing, code review, direction
  • Santiago Miret
    Writing, direction