There are two main ways to parallelize a training task across multiple servers:
- Model Parallelism: When your model does not fit on a single GPU, you split it so that different layers are computed on different servers.
- Data Parallelism: Every server holds a copy of the same model but processes a different batch, so each server computes a different gradient and the servers need to synchronize those gradients with each other.
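
To make the synchronization step in data parallelism concrete, here is a minimal single-process sketch. It is only an illustration under stated assumptions: the "servers" are simulated as list entries, the model is a toy one-parameter regression `y = w * x`, and the names `local_gradient` and `data_parallel_step` are hypothetical helpers, not part of any library. A real setup would replace the averaging line with a collective operation such as an all-reduce.

```python
def local_gradient(w, batch):
    """Gradient of the mean squared error for the toy model y = w * x on one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_step(w, shards, lr):
    """One synchronous step: every worker holds the same w but a different shard."""
    # Each (simulated) worker computes a gradient on its own batch.
    grads = [local_gradient(w, shard) for shard in shards]
    # Synchronization: average the gradients, as an all-reduce would.
    avg_grad = sum(grads) / len(grads)
    # Every worker applies the same update, so the model copies stay identical.
    return w - lr * avg_grad


# Toy data generated from y = 2 * x (an assumption for this illustration).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]  # two equally sized shards, one per worker

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)
print(round(w, 4))
```

Because the shards have equal size, the averaged gradient matches the gradient computed on the full batch, which is why the synchronized copies behave like a single model trained on all the data.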
In this section, we will focus on data parallelism, which is simple to implement: