wandb-notes
Trying to conceptualize how to build the Weights and Biases abstractions into standardized PyTorch workflows:
- Version-track data and models with Artifacts
- Use Tables to do high-level data and model visualization (see the sketch after this list)
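Since Tables only get this one mention, here is a minimal sketch of the idea; the project name, column names, and dummy image data are all assumptions for illustration.

```python
import numpy as np
import wandb

# Minimal sketch: log a Table of (image, label, prediction) rows for
# high-level inspection in the W&B UI. All names/values are placeholders.
run = wandb.init(project='mnist-notes')
table = wandb.Table(columns=['image', 'label', 'prediction'])
for i in range(4):
    img = np.random.randint(0, 255, size=(28, 28), dtype=np.uint8)  # fake MNIST-sized image
    table.add_data(wandb.Image(img), i % 10, (i + 1) % 10)
run.log({'predictions': table})
run.finish()
```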
Artifact abstraction
See this link for an explanation of how data is tracked, using a simple toy example.
Data is uploaded to a Google bucket by default; alternatively, a reference artifact works a bit like DVC: it stores only metadata and acts as a softlink to files that stay where they are. This can be done with something like this:
```python
import wandb

# Create a reference artifact: only checksums/metadata are stored;
# the files themselves stay at the referenced path.
run = wandb.init()
artifact = wandb.Artifact('mnist', type='dataset')
artifact.add_reference('file:///mount/datasets/mnist/')
run.log_artifact(artifact)
```
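For comparison, the default (non-reference) path uploads the file contents themselves. A minimal sketch, reusing the placeholder path above and a hypothetical artifact name:

```python
import wandb

# Sketch of the upload path: add_dir copies the file contents into
# W&B-managed storage instead of just recording a softlink-style reference.
run = wandb.init()
artifact = wandb.Artifact('mnist-uploaded', type='dataset')  # hypothetical name
artifact.add_dir('/mount/datasets/mnist/')
run.log_artifact(artifact)
run.finish()
```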
Consuming the data later does add some redundancy, however: a run has to explicitly declare that it uses the artifact for the tracking to actually happen.
```python
import wandb

# Declare the dependency on the artifact, then materialize its files locally.
run = wandb.init()
artifact = run.use_artifact('mnist:latest', type='dataset')
artifact_dir = artifact.download()
```
For filesystem references, a `download()` operation copies the files from the referenced paths to construct the artifact directory. In the above example, the contents of `/mount/datasets/mnist/` will be copied into the directory `artifacts/mnist:v0/`. If an artifact contains a reference to a file that was overwritten, then `download()` will throw an error, since the artifact can no longer be reconstructed.
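One related note on versioning, which is my reading of the `latest`/`v0` aliases rather than something spelled out above: re-logging the same artifact name after the referenced files change should produce a new version, and `mnist:latest` then resolves to that newest version.

```python
import wandb

# Sketch: re-log the same reference artifact after the underlying files change.
# W&B checksums the references, so an unchanged dataset re-logs as the same
# version, while changed contents create a new one (v1, v2, ...).
run = wandb.init()
artifact = wandb.Artifact('mnist', type='dataset')
artifact.add_reference('file:///mount/datasets/mnist/')
run.log_artifact(artifact)
run.finish()
```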