dl-parallelism
Data parallelism
- Preprocessing with distributed computing; form a pipeline for data transformation and loading
- PyTorch `DataParallel`, and in particular `DistributedDataParallel` for multiple nodes
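A minimal sketch of the `DistributedDataParallel` route, assuming a launch via `torchrun --nproc_per_node=N train.py`; the model and dataset here are toy stand-ins:

```python
# Minimal DistributedDataParallel sketch. Assumes a launch like
# `torchrun --nproc_per_node=4 train.py`; model and dataset are toys.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(10, 1).to(local_rank)
    # Each process holds a full replica; gradients are all-reduced.
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
    # DistributedSampler hands each process a disjoint shard of the data.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.to(local_rank), y.to(local_rank)
            optimizer.zero_grad()
            nn.functional.mse_loss(model(x), y).backward()  # grads sync here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

`DataParallel` is the older single-process variant; `DistributedDataParallel` is the recommended choice even on a single node.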
Model parallelism
- When a model is too big to fit on one GPU, we would need to split components onto different GPUs
- Basically, any `nn.Module` can have `.to(device)` run on it, and each part of the `forward()` call would need to send its outputs to the next device as well (see the sketch under Pipeline abstraction below)
- PyTorch Lightning should have a good abstraction for this, so we don't have to explicitly pass data around
Pipeline abstraction
Using just PyTorch, a manual version of the abstraction looks roughly like this (a toy two-stage split, assuming two CUDA devices are available):
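```python
# Manual model parallelism in plain PyTorch: each stage lives on its own
# GPU, and forward() ships activations between devices by hand.
import torch
import torch.nn as nn


class TwoStageModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Place each half of the network on a different device.
        self.stage1 = nn.Sequential(nn.Linear(10, 64), nn.ReLU()).to("cuda:0")
        self.stage2 = nn.Linear(64, 1).to("cuda:1")

    def forward(self, x):
        x = self.stage1(x.to("cuda:0"))
        # Explicitly move the intermediate activations to the next device.
        return self.stage2(x.to("cuda:1"))


model = TwoStageModel()
out = model(torch.randn(32, 10))  # output ends up on cuda:1
out.sum().backward()  # autograd routes gradients back across devices
```

Note this keeps only one GPU busy at a time; a true pipeline would also split each batch into micro-batches so the stages overlap (the GPipe idea).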
#needs-expanding on how PyTorch Lightning's high-level abstractions are incorporated into this.
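Until that's expanded, a hedged sketch of how Lightning is meant to absorb the DDP boilerplate (assuming the Lightning 2.x `lightning.pytorch` import path; `LitModel` and the data are toys):

```python
# Hedged sketch: Lightning moves the process-group setup, device placement,
# and DDP wrapping into Trainer flags. Assumes Lightning 2.x import path.
import lightning.pytorch as pl
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(10, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch  # Lightning already moved the batch to this rank's GPU
        return F.mse_loss(self.net(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=1e-2)


loader = DataLoader(
    TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1)), batch_size=32
)
# strategy="ddp" spawns one process per device, wraps the model in
# DistributedDataParallel, and injects a DistributedSampler for us.
trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp", max_epochs=1)
trainer.fit(LitModel(), loader)
```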
Backlinks
computation-notes
- [[dl-parallelism]]
machine-learning-notes
- [[dl-parallelism]]