Orthogonality—the data encoded into each dimension should ideally be as uncorrelated with the other dimensions as possible. Think principal components and interpretability.
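As a rough illustration of what "uncorrelated dimensions" means in practice, here is a small sketch (the tensor `z` and its shape are hypothetical, not from the paper) that checks how much off-diagonal mass the embedding covariance matrix carries:

```python
import torch

# Hypothetical batch of embeddings: 256 samples, 32 dimensions each.
z = torch.randn(256, 32)

# Center each dimension, then compute the covariance matrix across the batch.
z_centered = z - z.mean(dim=0)
cov = (z_centered.T @ z_centered) / (z.shape[0] - 1)

# Off-diagonal mass: close to zero means the dimensions are roughly decorrelated.
off_diag = cov - torch.diag(torch.diag(cov))
print(off_diag.pow(2).sum().item())
```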
Conventionally, one way to obtain good embeddings that at least partially satisfy these points is contrastive learning. The idea behind contrastive metric learning is simple: the model learns to enforce a margin between like and unlike data, pulling similar examples together and pushing dissimilar ones apart, which works with either labeled or unlabeled data. The problem with contrastive learning is typically the computational cost: for each training example, we need to traverse the dataset to mine contrastive examples of varying difficulty.
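A minimal sketch of the margin idea, using PyTorch's built-in triplet loss (the embedding tensors here are placeholders; in a real setup they come from an encoder, and the negatives come from mining the dataset, which is where the cost shows up):

```python
import torch
import torch.nn.functional as F

# Placeholder embeddings (in practice these are produced by an encoder network).
anchor   = torch.randn(128, 64)   # embeddings of the reference samples
positive = torch.randn(128, 64)   # "like" samples, e.g. augmentations or same-class items
negative = torch.randn(128, 64)   # "unlike" samples, mined from the dataset (the expensive part)

# Push the anchor-positive distance below the anchor-negative distance by a margin.
loss = F.triplet_margin_loss(anchor, positive, negative, margin=1.0)
print(loss.item())
```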
The method here instead takes an implicit route to contrastive metric learning, decomposing the mechanisms for achieving data similarity and orthogonality into variance, invariance, and covariance terms applied as regularization.
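A hedged sketch of how those three terms can be combined over two embedded views of the same batch (the loss weights, the epsilon, and the function name are assumptions for illustration, not the paper's exact settings):

```python
import torch
import torch.nn.functional as F

def vic_style_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """Variance / invariance / covariance regularization over two views of a batch."""
    n, d = z_a.shape

    # Invariance: two views of the same sample should map to similar embeddings.
    inv = F.mse_loss(z_a, z_b)

    # Variance: keep each dimension's std above a target so embeddings don't collapse.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var = torch.mean(F.relu(1.0 - std_a)) + torch.mean(F.relu(1.0 - std_b))

    # Covariance: penalize off-diagonal covariance so dimensions stay decorrelated.
    def off_diag_cov(z):
        z = z - z.mean(dim=0)
        c = (z.T @ z) / (n - 1)
        off = c - torch.diag(torch.diag(c))
        return off.pow(2).sum() / d

    cov = off_diag_cov(z_a) + off_diag_cov(z_b)
    return sim_w * inv + var_w * var + cov_w * cov
```

Because all three terms operate on a single batch of paired views, there is no need to mine negatives from the rest of the dataset.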
The unfiltered notes can be found here.