ML Reviews

wandb-notes

Trying to conceptualize how to build the Weights and Biases abstractions into standardized PyTorch workflows.

  • Version-track data and models with Artifacts
  • Use Tables for high-level data and model visualization (see the sketch after this list)
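
A minimal sketch of the Tables idea, assuming a classification-style eval loop; the dummy images, predictions, and labels below are illustrative placeholders, not part of wandb's API.

import numpy as np
import wandb

run = wandb.init()

# Dummy validation batch; in practice these come from a PyTorch eval loop
images = np.random.rand(4, 28, 28)
preds = [0, 1, 2, 3]
labels = [0, 1, 2, 3]

# Build a Table row by row and log it to the run
table = wandb.Table(columns=["image", "prediction", "label"])
for img, pred, label in zip(images, preds, labels):
    table.add_data(wandb.Image(img), pred, label)
run.log({"val_predictions": table})
run.finish()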

Artifact abstraction

See this link for an explanation of how data is tracked, using a simple toy example.

By default, data is uploaded to a Google Cloud Storage bucket; alternatively, a reference artifact works somewhat like DVC: it stores only metadata and acts as a softlink to data that stays where it is. This can be done with something like this:

import wandb

run = wandb.init()
# A reference artifact records metadata and checksums, not the files themselves
artifact = wandb.Artifact('mnist', type='dataset')
artifact.add_reference('file:///mount/datasets/mnist/')
run.log_artifact(artifact)
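
For contrast, the upload path looks nearly identical; a minimal sketch, assuming the dataset sits in a local ./mnist directory (that path is illustrative):

import wandb

run = wandb.init()
artifact = wandb.Artifact('mnist', type='dataset')
# add_dir copies the directory contents into the artifact, which is then uploaded to W&B storage
artifact.add_dir('./mnist')
run.log_artifact(artifact)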

The workflow does have some redundancy, however, in order for artifact usage to actually be tracked: a consuming run must declare the artifact before downloading it.

import wandb

run = wandb.init()
# use_artifact records the dependency in the run's lineage
artifact = run.use_artifact('mnist:latest', type='dataset')
artifact_dir = artifact.download()

For filesystem references, a download() operation copies the files from the referenced paths to construct the artifact directory. In the above example, the contents of /mount/datasets/mnist will be copied into the directory artifacts/mnist:v0/. If an artifact contains a reference to a file that was overwritten, then download() will throw an error, since the artifact can no longer be reconstructed.
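
To tie this back to a standard PyTorch workflow, the downloaded directory can be handed straight to a Dataset; a minimal sketch, assuming the artifact contains the raw MNIST files in the layout torchvision.datasets.MNIST expects (that layout is an assumption, not something the artifact guarantees):

import wandb
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

run = wandb.init()
artifact = run.use_artifact('mnist:latest', type='dataset')
artifact_dir = artifact.download()

# Assumes artifact_dir matches torchvision's expected MNIST directory structure
dataset = datasets.MNIST(root=artifact_dir, train=True, download=False,
                         transform=transforms.ToTensor())
loader = DataLoader(dataset, batch_size=64, shuffle=True)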