training-notes
Here is a checklist of milestones for developing and testing model architectures: if a step doesn't go well, it's probably a good idea to go back one. This more or less pertains to the model part of the [[ml-ops]] lifecycle. (A small sanity-check sketch for the first few items follows the list.)
- Synthetic data flows through the model (i.e. you receive an output without errors).
- The output looks qualitatively correct; arrays are in the expected shape, and there are no `NaN`s.
- Real training data flows through the model (e.g. PyTorch Lightning runs a batch of validation data).
- Successfully iterate through a single training epoch.
- The training loss goes down over several training epochs.
- The training metric does not behave unexpectedly; for example, `NaN` or negative for minimization.
- Model is able to overfit a (small) training set.
- Does well on the full training set.
- Does well on dev/test sets.
- Does well on application metrics and/or project goals.[^1]
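A minimal sketch of the first few items, assuming a toy PyTorch model; the architecture, shapes, and learning rate here are placeholders rather than anything from a real project:

```python
import torch
import torch.nn as nn

# Hypothetical toy model, just to illustrate the first few checklist items.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# 1. Synthetic data flows through the model without errors.
x = torch.randn(8, 16)
out = model(x)

# 2. The output looks qualitatively correct: expected shape, no NaNs.
assert out.shape == (8, 4)
assert not torch.isnan(out).any()

# 3. The model can overfit a small (here: single) batch.
y = torch.randint(0, 4, (8,))
optim = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for step in range(500):
    optim.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optim.step()
print(f"final loss: {loss.item():.4f}")  # should approach zero if the model can overfit
```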
Error analysis
One thing I haven't really done in the past, but which might be helpful: take smaller subsets of data and annotate them with my own tags that could help diagnose why a model performs poorly on any of the checklist items.
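A rough sketch of what that could look like; the columns and tags below are hypothetical, not from a real dataset. The idea is just to count which hand-annotated tags co-occur with errors, to see which slice to fix first:

```python
import pandas as pd

# Hypothetical error-analysis table: model predictions on a small subset,
# annotated by hand with tags that might explain failures.
df = pd.DataFrame({
    "example_id": [101, 102, 103, 104],
    "label":      [1, 0, 1, 1],
    "prediction": [0, 0, 1, 0],
    # Hand-annotated tags (hypothetical) describing each example.
    "tags": [["blurry"], ["short_text"], [], ["blurry", "rare_class"]],
})

# Keep only the misclassified examples.
errors = df[df["label"] != df["prediction"]]

# Count how often each tag co-occurs with an error.
tag_counts = errors.explode("tags")["tags"].value_counts()
print(tag_counts)
```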
Degraded model performance
Possible reasons why models perform worse in production:
- Data drift; statistics change over different time frames, which could be seasonal or unpredictable (e.g. pandemics). At a lower level, feature importances could change over time (a quick drift check is sketched after this list).
- Bad data collection, e.g. damaged sensors on cameras, etc.
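One hedged way to check for the drift case is to compare a feature's distribution between the training window and recent production traffic, e.g. with a two-sample Kolmogorov-Smirnov test. The arrays below are synthetic stand-ins for those two windows:

```python
import numpy as np
from scipy.stats import ks_2samp

# Hypothetical feature values: one sample from the training window,
# one from recent production traffic (deliberately shifted here).
train_feature = np.random.normal(loc=0.0, scale=1.0, size=5_000)
prod_feature = np.random.normal(loc=0.3, scale=1.2, size=5_000)

# A small p-value suggests the feature's distribution has drifted
# between the two windows.
stat, p_value = ks_2samp(train_feature, prod_feature)
print(f"KS statistic={stat:.3f}, p-value={p_value:.2e}")
```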
[^1]: Probably depends on who you ask, but the project goals should probably be defined well before the beginning of this list 😅