dl-parallelism

  • Parallelism in deep learning; check [[hpc]] for terminology
  • Lots of material to unpack from that link

Data parallelism

  • Preprocessing with distributed computing: form a pipeline for data transformation and loading
  • PyTorch DataParallel (single-process, single-node), and in particular DistributedDataParallel, which runs one process per GPU and scales to multiple nodes (see the sketch below)

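A minimal sketch of a DistributedDataParallel training loop, assuming a launch via torchrun; the model, dataset, and hyperparameters here are toy placeholders:

```python
# Minimal DDP sketch; launch with e.g.
#   torchrun --nproc_per_node=4 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    dist.init_process_group(backend="nccl")  # torchrun sets RANK/WORLD_SIZE
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # syncs grads across processes

    dataset = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset)  # each rank gets a disjoint shard
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()
    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            opt.zero_grad()
            loss_fn(model(x), y).backward()  # DDP all-reduces grads here
            opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```
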
Model parallelism

  • When a model is too big to fit on one GPU, we need to split its components across GPUs
    • Basically, each submodule gets .to(device) with its own device, and each step of forward() has to move its output to the next submodule's device before passing it on (see the sketch below)
    • PyTorch Lightning should have a good abstraction for this, so we don't have to move data around explicitly

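A minimal sketch of doing this split by hand across two GPUs, following the usual PyTorch pattern; the class name, layer sizes, and device strings are all assumptions:

```python
# Two-GPU model parallelism sketch: each half of the network lives on its own
# device, and forward() ships the intermediate activation between them.
import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(10, 64), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Linear(64, 1).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        return self.part2(x.to("cuda:1"))  # move activation to the next device

model = TwoGPUModel()
out = model(torch.randn(32, 10))  # output lands on cuda:1
```

Note that the loss has to be computed where the output lives, so the targets need a .to("cuda:1") as well.
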
Pipeline abstraction

Using just plain PyTorch, the abstraction looks like this:

pytorch-distributed
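
As a rough sketch of the micro-batching idea behind pipeline parallelism (plain PyTorch, reusing the hypothetical TwoGPUModel from the model-parallelism sketch above; a real pipeline would also overlap stages properly and schedule the backward pass):

```python
# GPipe-style micro-batching sketch over the two-stage model above. Because
# CUDA kernel launches are asynchronous, stage 1 of micro-batch i+1 can start
# while stage 2 of micro-batch i is still running.
import torch

def pipelined_forward(model, batch, num_microbatches=4):
    outputs = []
    for mb in batch.chunk(num_microbatches):          # split into micro-batches
        h = model.part1(mb.to("cuda:0"))              # stage 1 on cuda:0
        outputs.append(model.part2(h.to("cuda:1")))   # stage 2 on cuda:1
    return torch.cat(outputs)                         # reassemble on cuda:1

out = pipelined_forward(TwoGPUModel(), torch.randn(128, 10))
```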

#needs-expanding on how PyTorch Lightning's high-level abstractions are incorporated into this.