https://pytorch.org/tutorials/intermediate/dist_tuto.html#collective-communication

https://pdc-support.github.io/introduction-to-mpi/07-collective/index.html

https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/collectives.html#reducescatter

  1. Broadcast: the same data is sent from the root rank to all ranks

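As an illustrative sketch (a plain-Python simulation of the semantics, not a real communication library; `buffers` models one buffer per rank):

```python
# Broadcast, simulated: every rank ends up with a copy of the root's buffer.

def broadcast(buffers, root):
    # Copy the root rank's buffer into every rank's slot.
    return [list(buffers[root]) for _ in buffers]

buffers = [[1, 2], [], [], []]      # only rank 0 (the root) holds data
print(broadcast(buffers, root=0))   # -> [[1, 2], [1, 2], [1, 2], [1, 2]]
```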

  1. Scatter: the data in the root rank's send buffer is split into chunks, and each chunk is sent to a different rank

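A minimal sketch of the same idea in plain Python (a simulation, not a communication library; the buffer is assumed to split evenly across ranks):

```python
# Scatter, simulated: the root's send buffer is split into world_size
# equal chunks, and rank i receives chunk i.

def scatter(send_buffer, world_size):
    n = len(send_buffer) // world_size
    return [send_buffer[i * n:(i + 1) * n] for i in range(world_size)]

print(scatter([10, 11, 12, 13], world_size=4))  # -> [[10], [11], [12], [13]]
```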

  1. Gather: each rank sends the data in its send buffer to the root rank, which collects all of it

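Sketched the same way (pure-Python simulation; `buffers` holds one send buffer per rank, and non-root ranks receive nothing):

```python
# Gather, simulated: the root rank collects every rank's send buffer,
# concatenated in rank order.

def gather(buffers, root):
    gathered = [item for buf in buffers for item in buf]
    return [gathered if rank == root else [] for rank in range(len(buffers))]

print(gather([[0], [1], [2], [3]], root=0))  # -> [[0, 1, 2, 3], [], [], []]
```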

  1. AllGather: each rank sends the data in its send buffer to every rank, so all ranks end up with the full gathered data

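In the same simulated style (AllGather behaves like a Gather whose result is delivered to every rank):

```python
# AllGather, simulated: every rank ends up with the concatenation of
# all ranks' send buffers, in rank order.

def all_gather(buffers):
    gathered = [item for buf in buffers for item in buf]
    return [list(gathered) for _ in buffers]

print(all_gather([[0], [1], [2], [3]]))  # every rank: [0, 1, 2, 3]
```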

  1. Reduce: each rank sends a piece of data, and the pieces are combined (e.g. summed) on their way to the root rank into a single result

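A sketch under the same assumptions (equal-length buffers, element-wise sum as the reduction operation; real libraries also offer min, max, product, etc.):

```python
# Reduce, simulated: equal-length buffers are combined element-wise
# (here by summing), and only the root rank holds the result.

def reduce(buffers, root, op=sum):
    result = [op(vals) for vals in zip(*buffers)]
    return [result if rank == root else [] for rank in range(len(buffers))]

print(reduce([[1, 2], [3, 4], [5, 6]], root=0))  # -> [[9, 12], [], []]
```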

  1. AllReduce: same as Reduce, but the result is delivered to all ranks

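Continuing the simulation (same element-wise sum, but every rank receives the combined result):

```python
# AllReduce, simulated: element-wise reduction of all ranks' buffers,
# with the result copied to every rank.

def all_reduce(buffers, op=sum):
    result = [op(vals) for vals in zip(*buffers)]
    return [list(result) for _ in buffers]

print(all_reduce([[1, 2], [3, 4], [5, 6]]))  # every rank: [9, 12]
```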

  1. ReduceScatter: same as Reduce, but the reduced result is split into chunks and each rank receives a different chunk (equivalent to a Reduce followed by a Scatter)

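And the final sketch (same assumptions: equal-length buffers, element-wise sum, result length divisible by the number of ranks):

```python
# ReduceScatter, simulated: element-wise reduce across ranks, then split
# the combined buffer into chunks so that rank i keeps chunk i.

def reduce_scatter(buffers, op=sum):
    result = [op(vals) for vals in zip(*buffers)]
    n = len(result) // len(buffers)
    return [result[i * n:(i + 1) * n] for i in range(len(buffers))]

print(reduce_scatter([[1, 2], [3, 4]]))  # rank 0: [4], rank 1: [6]
```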