Kelvin Lee / Mikhail Galkin / Santiago Miret

Intel Labs

Intro

Molecular symmetry

  • Atomistic systems display natural symmetry
  • Objects in Euclidean space can undergo indistinguishable transformations
  • Symmetry groups define basis of transformations
  • Equivariance guarantees that these transformations commute with the model's operations

$SO(3)$ rotation group

$G_{36}^{\dagger}$ nuclear permutation group

$SO(3)$ equivariance simplifies physics and compute

Intro

Deconstructing equivariance

  • Tensor product formalism for preserving $SO(3)$ equivariance in neural representations, implemented in e3nn
  • Points in 3D space are embedded in a basis of spherical harmonics; only equivariance-preserving transformations are allowed
  • Object ⇆ latents correspondence is highly desirable

Same framework as angular momentum coupling: equivariance preserving transformations are angular momentum conserving!

$$\psi_{lm} = \sum_{m_1m_2} C(l_1l_2l; m_1 m_2 m) \psi_{l_1m_1} \psi_{l_2m_2}$$

Clebsch-Gordan coefficients $C$ are non-zero only for specific combinations of $l, l_1, l_2$ and $m, m_1, m_2$
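
As a concrete check of these selection rules, the sketch below evaluates Clebsch-Gordan coefficients with sympy; it is purely illustrative, and the choice of $l_1 = 1$, $l_2 = 2$ is arbitrary.

```python
from sympy.physics.quantum.cg import CG

l1, l2 = 1, 2
nonzero = set()
for l in range(0, l1 + l2 + 2):          # deliberately scan past l1 + l2
    for m1 in range(-l1, l1 + 1):
        for m2 in range(-l2, l2 + 1):
            m = m1 + m2                  # coefficients vanish unless m = m1 + m2
            if abs(m) > l:
                continue
            if CG(l1, m1, l2, m2, l, m).doit() != 0:
                nonzero.add(l)

print(sorted(nonzero))                   # -> [1, 2, 3]: only |l1 - l2| <= l <= l1 + l2 survive
```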

Intro

Deconstructing equivariance

$l=0,1,2$ have physical interpretations of scalar, vector, and tensor quantities

...but what do equivariant models actually learn?

How do higher angular momentum features affect modeling?


  1. Train a minimalist model on QM9 atomization energy prediction
  2. Decompose features into their respective irreducible representations (see the sketch after this list)
  3. Project embedded structures in an interpretable way
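
A minimal sketch of the decomposition step (2), assuming node embeddings are laid out as an e3nn Irreps; the irreps string and sizes below are illustrative, not the trained model's exact configuration.

```python
import torch
from e3nn import o3

# Node embeddings stored as concatenated irrep blocks (illustrative layout)
irreps = o3.Irreps("32x0e + 32x1o + 32x2e")
node_embeddings = irreps.randn(128, -1)          # 128 nodes, irreps.dim features each

# Split the feature axis into one block per irrep so each order l can be inspected on its own
per_order = {
    str(ir): node_embeddings[:, sl]              # block shape: (128, mul * (2l + 1))
    for (mul, ir), sl in zip(irreps, irreps.slices())
}

for name, block in per_order.items():
    print(name, tuple(block.shape))
```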

Methods

Decoupling spherical harmonics

Symbolic reductions using sympy

WYSIWYG open-source kernel implementations up to $l=10$ with TritonLang

Portable performance across multiple hardware architectures
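
A minimal sketch of the symbolic reduction step, assuming sympy's built-in Ynm; the resulting closed-form Cartesian polynomial is the kind of expression that gets transcribed into a Triton kernel. The $l = 2$, $m = 0$ case is shown for brevity.

```python
import sympy as sp

# Express the spherical harmonic in Cartesian coordinates of a point (x, y, z)
x, y, z = sp.symbols("x y z", real=True)
r = sp.sqrt(x**2 + y**2 + z**2)
theta, phi = sp.acos(z / r), sp.atan2(y, x)

# l = 2, m = 0 shown as an example; the same procedure applies order by order
Y20 = sp.simplify(sp.Ynm(2, 0, theta, phi).expand(func=True))
print(Y20)   # proportional to (3*z**2 - r**2) / r**2, up to normalization
```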

Methods

A toy equivariant model

Three interaction blocks yield node embeddings with $2l_\mathrm{max} + 1$ hidden features
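
A minimal sketch of a single interaction block, assuming e3nn; the hidden size and $l_\mathrm{max}$ below are illustrative rather than the exact toy-model hyperparameters.

```python
import torch
from e3nn import o3

irreps_node = o3.Irreps("32x0e + 32x1o + 32x2e")      # hidden features for each order l
irreps_sh = o3.Irreps.spherical_harmonics(lmax=2)     # edge attributes: 1x0e + 1x1o + 1x2e

# Messages couple node features with edge directions through a learned tensor product,
# so the output stays in the same equivariant basis as the input
tp = o3.FullyConnectedTensorProduct(irreps_node, irreps_sh, irreps_node)

node_feats = irreps_node.randn(1, -1)                 # features of one source node
edge_vec = torch.randn(1, 3)                          # one edge direction vector
edge_sh = o3.spherical_harmonics(irreps_sh, edge_vec, normalize=True, normalization="component")
message = tp(node_feats, edge_sh)                     # same irreps layout as irreps_node
print(message.shape)
```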

Methods

Low-dimensional projections

  • Potential of Heat-diffusion for Affinity-based Transition Embedding (PHATE) method for projecting high-dimensional latents for visualization (usage sketched below)
  • Local relationships between data points captured through local Gaussian kernel
  • Global relationships between data points modeled as diffusion probability
Moon et al. (2019), Visualizing structure and transitions in high-dimensional biological data
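
A minimal usage sketch, assuming the phate package and a generic array of learned latents; the array here is random stand-in data.

```python
import numpy as np
import phate

latents = np.random.default_rng(0).normal(size=(2000, 64))   # stand-in for learned embeddings

# kNN-based local kernel captures local neighborhoods; diffusion captures global structure
op = phate.PHATE(n_components=2, knn=5)
coords = op.fit_transform(latents)                            # (2000, 2) points for plotting
```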

Results

QM9 performance—high angular momentum

| Configuration | Hidden dimension | # of parameters (M) | Test error (eV) ↓ |
|---|---|---|---|
| $[0, 1, 2]$ | 32 | 1.6 | 0.12 |
| $[0, 1, 2, 4]$ | 32 | 3.0 | 0.15 |
| $[0, 1, 2, 6]$ | 32 | 2.6 | 0.21 |
| $[0, 1, 2, 8]$ | 32 | 2.6 | 0.19 |
| $[0, 1, 2, 10]$ | 32 | 2.6 | 0.19 |

Adding higher $l$ alone does not improve test performance

Results

PHATE analysis

Points colored by molecular complexity (Spacial Score)

$l = 1, 2, 3$ latents are largely unstructured and unused (test error: 0.73 eV)

Results

PHATE analysis

Ablating the unused orders improves latent structure and test performance (test error: 0.52 eV)

Results

QM9 performance—larger sets

| Configuration | Hidden dimension | Epochs | # of parameters (M) | Test error (eV) ↓ |
|---|---|---|---|---|
| $[0, 1, 2, 3, 4]$ | 16 | 30 | 0.8 | 1.24 |
| $[0, 1, 2, 3, 4, 5, 6]$ | 16 | 30 | 1.9 | 0.73 |
| $[0, 1, 2, 3, 4, 5, 6, 7, 8]$ | 16 | 30 | 3.7 | 1.02 |
| $[0, 3, 4, 5, 6]$ | 16 | 30 | 0.8 | 0.52 |
| $[0, 1, 2, 3, 4]$ | 16 | 100 | 0.8 | 0.22 |
| $[0, 3, 4, 5, 6]$ | 16 | 100 | 0.8 | 0.01 |

Increasing $l$ alone does not improve test performance

Conclusions

What does it mean?

  • Unused irreducible representations actually hinder quantitative performance
  • Are equivariant features actually being used?
  • $l=0$ dominates embeddings used for prediction; vector ($l=1$) and tensor ($l=2$) features should be able to do more
  • Embeddings need discoverable, interpretable structure to behave predictably in deployment, ideally to the point where linear models suffice

Conclusions

Why does it matter?

Neural networks are going to neural network: we cannot take physical inductive biases for granted!

Qualitative embedding assessments may be required to get the best out of architectures

Need regularization or pre-training tasks that guarantee higher-order terms are actually used!
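
One hypothetical form such a regularizer could take (not part of this work): penalize embeddings whose norm collapses onto $l=0$, so that higher orders carry signal. The function and its inputs are illustrative.

```python
import torch

def irrep_usage_penalty(node_feats: torch.Tensor, slices_by_l: dict[int, slice]) -> torch.Tensor:
    """Hypothetical auxiliary loss that pushes embedding norm to spread across orders l.

    slices_by_l maps each order l to its block of columns in node_feats.
    """
    norms = torch.stack([node_feats[:, sl].norm(dim=-1).mean() for sl in slices_by_l.values()])
    fractions = norms / (norms.sum() + 1e-12)
    # Negative entropy of the per-order norm distribution: minimized when usage is uniform,
    # largest when everything collapses onto a single order (e.g. l = 0)
    return (fractions * torch.log(fractions + 1e-12)).sum()
```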

Future work

What about “real” models and data?

  • Analysis on 10,000 random Materials Project Trajectory samples
  • Pre-trained MACE “medium”
  • Black squares correspond to 1,000 out-of-distribution samples (🪧 #63)
  • PHATE projections of the 640-element scalar embeddings (node_feats); extraction sketched below
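
A rough sketch of the extraction and projection pipeline, assuming the mace-torch package with its pretrained MACE-MP "medium" checkpoint and the calculator's get_descriptors helper for per-atom features; the file name and subset are placeholders.

```python
import numpy as np
import phate
from ase.io import read
from mace.calculators import mace_mp

# Pretrained foundation model; "medium" matches the checkpoint analyzed here
calc = mace_mp(model="medium", device="cpu")

# Placeholder file holding the sampled Materials Project Trajectory structures
frames = read("mptrj_subset.extxyz", index=":")

# Per-atom scalar (l = 0) descriptors, assuming the get_descriptors helper on the calculator
feats = [calc.get_descriptors(atoms, invariants_only=True) for atoms in frames]
scalar_embeddings = np.concatenate(feats, axis=0)

coords = phate.PHATE(n_components=2).fit_transform(scalar_embeddings)
```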

Future work

What about “real” models and data?

Decomposed projections without OOD samples

$l=1, 3$ blocks are largely unstructured, with odd embedding placement

Future work

What about “real” models and data?

Decomposed projections with OOD samples

No semantic margin between in- and out-of-distribution samples

Future work

Call to action

  • Equivariant models like MACE may have untapped modeling performance
  • Need qualitative, as well as quantitative, comparisons to guide data, model, and task design
  • Deeper insight into learning dynamics of tensor product models

Acknowledgements

  • John Pennycook
    Kernel suggestions
  • Xiaoxiao (Lory) Wang
    OOD dopant data samples (🪧 #63)
  • Mikhail Galkin
    Writing, code review, direction
  • Santiago Miret
    Writing, direction