wandb-notes
Trying to conceptualize how to build the Weights and Biases abstractions into standardized PyTorch workflows.
- Version-track data and models with Artifacts
- Use Tables to do high-level data and model visualization
Artifact abstraction
See this link for an explanation of how data is tracked, using a simple toy example.
Data is uploaded to a Google bucket; alternatively, a reference artifact works somewhat like DVC: it stores only metadata and acts as a soft link to the data. This can be done with something like this:
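
    import wandb

    run = wandb.init()
    # Create a dataset artifact that holds a reference rather than a copy of the data
    artifact = wandb.Artifact('mnist', type='dataset')
    # Track the files at this path by reference
    artifact.add_reference('file:///mount/datasets/mnist/')
    run.log_artifact(artifact)

The workflow does have some redundancy, however: to actually get artifact tracking, a run that consumes the data has to declare the artifact it uses: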

    run = wandb.init()
    # Declare the dataset as an input of this run
    artifact = run.use_artifact('mnist:latest', type='dataset')
    # Materialize the artifact contents in a local directory
    artifact_dir = artifact.download()

For filesystem references, a `download()` operation copies the files from the referenced paths to construct the artifact directory. In the above example, the contents of `/mount/datasets/mnist` will be copied into the directory `artifacts/mnist:v0/`. If an artifact contains a reference to a file that was overwritten, then `download()` will throw an error, as the artifact can no longer be reconstructed.
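
To make the "metadata as a soft link" idea concrete, here is a minimal standard-library sketch of why an overwritten file breaks reconstruction. It is not W&B's implementation: the helper names `build_manifest`, `changed_files`, and `demo` are hypothetical, and the use of SHA-256 as the recorded checksum is an assumption made purely for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return manifest entries that are now missing or whose contents differ."""
    current = build_manifest(root)
    return [name for name, digest in manifest.items() if current.get(name) != digest]


def demo() -> tuple[list[str], list[str]]:
    """Record a manifest, overwrite a file, and report what no longer matches."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "train.bin").write_bytes(b"original contents")
        manifest = build_manifest(root)  # analogous to logging the reference
        before = changed_files(root, manifest)
        (root / "train.bin").write_bytes(b"overwritten contents")
        after = changed_files(root, manifest)  # analogous to a later download
    return before, after
```

Calling `demo()` returns an empty list for the check made before the overwrite and the name of the modified file for the check made afterwards. In this toy model, that mismatch is the condition under which reconstructing the recorded version would have to fail.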