https://pytorch.org/tutorials/intermediate/dist_tuto.html#collective-communication

https://pdc-support.github.io/introduction-to-mpi/07-collective/index.html

https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/collectives.html#reducescatter

  1. Broadcast: the same data is sent from the root rank to all ranks

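As an illustrative sketch (a plain-Python simulation of the semantics, not a real communication library; `buffers` models one buffer per rank):

```python
# Broadcast, simulated: every rank ends up with a copy of the root's buffer.

def broadcast(buffers, root):
    # Copy the root rank's buffer into every rank's slot.
    return [list(buffers[root]) for _ in buffers]

buffers = [[1, 2], [], [], []]      # only rank 0 (the root) holds data
print(broadcast(buffers, root=0))   # -> [[1, 2], [1, 2], [1, 2], [1, 2]]
```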

  1. Scatter: the data in the root rank's send buffer is split into chunks, and each chunk is sent to a different rank

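A minimal sketch of the same idea in plain Python (a simulation, not a communication library; the buffer is assumed to split evenly across ranks):

```python
# Scatter, simulated: the root's send buffer is split into world_size
# equal chunks, and rank i receives chunk i.

def scatter(send_buffer, world_size):
    n = len(send_buffer) // world_size
    return [send_buffer[i * n:(i + 1) * n] for i in range(world_size)]

print(scatter([10, 11, 12, 13], world_size=4))  # -> [[10], [11], [12], [13]]
```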

  1. Gather: each rank sends the data in its send buffer to the root rank, which collects all of it

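Sketched the same way (pure-Python simulation; `buffers` holds one send buffer per rank, and non-root ranks receive nothing):

```python
# Gather, simulated: the root rank collects every rank's send buffer,
# concatenated in rank order.

def gather(buffers, root):
    gathered = [item for buf in buffers for item in buf]
    return [gathered if rank == root else [] for rank in range(len(buffers))]

print(gather([[0], [1], [2], [3]], root=0))  # -> [[0, 1, 2, 3], [], [], []]
```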

  1. AllGather: each rank sends the data in its send buffer to every rank, so all ranks end up with the full gathered data

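In the same simulated style (AllGather behaves like a Gather whose result is delivered to every rank):

```python
# AllGather, simulated: every rank ends up with the concatenation of
# all ranks' send buffers, in rank order.

def all_gather(buffers):
    gathered = [item for buf in buffers for item in buf]
    return [list(gathered) for _ in buffers]

print(all_gather([[0], [1], [2], [3]]))  # every rank: [0, 1, 2, 3]
```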

  1. Reduce: each rank sends a piece of data, and the pieces are combined (e.g. summed) on their way to the root rank into a single result

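A sketch under the same assumptions (equal-length buffers, element-wise sum as the reduction operation; real libraries also offer min, max, product, etc.):

```python
# Reduce, simulated: equal-length buffers are combined element-wise
# (here by summing), and only the root rank holds the result.

def reduce(buffers, root, op=sum):
    result = [op(vals) for vals in zip(*buffers)]
    return [result if rank == root else [] for rank in range(len(buffers))]

print(reduce([[1, 2], [3, 4], [5, 6]], root=0))  # -> [[9, 12], [], []]
```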

  1. AllReduce: same as Reduce, but the result is delivered to all ranks

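Continuing the simulation (same element-wise sum, but every rank receives the combined result):

```python
# AllReduce, simulated: element-wise reduction of all ranks' buffers,
# with the result copied to every rank.

def all_reduce(buffers, op=sum):
    result = [op(vals) for vals in zip(*buffers)]
    return [list(result) for _ in buffers]

print(all_reduce([[1, 2], [3, 4], [5, 6]]))  # every rank: [9, 12]
```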

  1. ReduceScatter: same as Reduce, but the reduced result is split into chunks and each rank receives a different chunk (equivalent to a Reduce followed by a Scatter)

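And the final sketch (same assumptions: equal-length buffers, element-wise sum, result length divisible by the number of ranks):

```python
# ReduceScatter, simulated: element-wise reduce across ranks, then split
# the combined buffer into chunks so that rank i keeps chunk i.

def reduce_scatter(buffers, op=sum):
    result = [op(vals) for vals in zip(*buffers)]
    n = len(result) // len(buffers)
    return [result[i * n:(i + 1) * n] for i in range(len(buffers))]

print(reduce_scatter([[1, 2], [3, 4]]))  # rank 0: [4], rank 1: [6]
```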