data-augmentation

The goal of augmentation is to add more training examples by generating synthetic data, with a particular emphasis on examples where humans still perform well but the algorithm does not.

Adversarial examples in the traditional sense probably don't fit super well under this definition.
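
To make the definition concrete, here is a minimal sketch of one augmentation: a horizontal flip. It assumes images are stored as nested lists of pixel values and that mirroring does not change the label, which holds for many natural photos but not for digits or text. The names `horizontal_flip` and `augment_with_flips` are hypothetical and only used for illustration.

```python
def horizontal_flip(image):
    """Mirror each row of a 2D image stored as a list of rows."""
    return [row[::-1] for row in image]


def augment_with_flips(dataset):
    """Return the original (image, label) pairs plus one flipped copy of each.

    Assumption: flipping is label-preserving for this dataset.
    """
    augmented = list(dataset)
    for image, label in dataset:
        augmented.append((horizontal_flip(image), label))
    return augmented


if __name__ == "__main__":
    toy_dataset = [([[1, 2], [3, 4]], "cat")]
    for image, label in augment_with_flips(toy_dataset):
        print(label, image)
```

A human looking at the flipped image would still assign the same label, so the synthetic example is cheap to produce and needs no new labelling.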

Backlinks

data-centric-ai

Related to [[data-augmentation]] and [[training-notes]], we also want to ensure that the datasets used for training and testing are balanced. Random splits on real-life imbalanced data are probably not going to do well, as we will end up with models biased against rare events.
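
One common alternative to a purely random split is a stratified split, which splits each class separately so that rare classes appear in both the training and the test set. The sketch below is a standard-library illustration, not a method taken from these notes; `stratified_split`, its parameters, and the rule of keeping at least one test example per class are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split example indices into train and test sets, class by class.

    Returns two sorted lists of indices into `labels`.
    """
    rng = random.Random(seed)

    # Group example indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        if len(indices) > 1:
            # Assumption: every class with 2+ examples gets at least one test example.
            n_test = max(1, round(len(indices) * test_fraction))
        else:
            # A single example cannot be in both sets; keep it for training.
            n_test = 0
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    labels = [0] * 95 + [1] * 5  # 5% rare class
    train_idx, test_idx = stratified_split(labels)
    print(len(train_idx), len(test_idx))
    print(sum(labels[i] for i in test_idx), "rare examples in the test set")
```

With a plain random split of the same 100 examples, the test set could easily contain no rare examples at all, which makes performance on rare events impossible to measure.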