Problem: Collective communication algorithms and ML workloads have so far been optimized in isolation; distributed ML workloads need to be co-designed with the collective algorithms they rely on.


Solution: Use a common representation (Chakra ET) for both ML workloads and collective algorithms, which can then be ingested either by simulators or by execution runtimes.

Benefits of Using Chakra ET:

  1. Co-optimize collective communication with other workload-related operations
  2. Interoperability across different tools (e.g., ASTRA-sim for simulation, MSCCL-Runtime for execution/validation of algorithms)
  3. No per-simulator knowledge required to implement collective algorithms
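To make the common-representation idea concrete, here is a minimal sketch of what a shared execution-trace graph could look like, with one toy "simulator" consuming it. This is an illustrative assumption, not the actual Chakra ET schema: the `Node` fields, node kinds, and `simulate` cost model are all hypothetical, chosen only to show how a single trace format can serve different downstream tools.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Hypothetical trace node; field names are illustrative, not Chakra's schema.
    node_id: int
    kind: str              # "compute" or "comm"
    duration_us: int = 0   # compute cost (compute nodes)
    comm_size: int = 0     # bytes moved (comm nodes)
    deps: list = field(default_factory=list)  # ids of dependency nodes

def toposort(nodes):
    """Order nodes so every node follows all of its dependencies."""
    by_id = {n.node_id: n for n in nodes}
    order, seen = [], set()
    def visit(n):
        if n.node_id in seen:
            return
        for d in n.deps:
            visit(by_id[d])
        seen.add(n.node_id)
        order.append(n)
    for n in nodes:
        visit(n)
    return order

def simulate(nodes, bw_bytes_per_us=100):
    """Toy simulator consumer: serial makespan estimate over the trace."""
    total = 0
    for n in toposort(nodes):
        if n.kind == "compute":
            total += n.duration_us
        else:  # comm node: time = size / assumed bandwidth
            total += n.comm_size // bw_bytes_per_us
    return total

trace = [
    Node(0, "compute", duration_us=50),
    Node(1, "comm", comm_size=4000, deps=[0]),   # e.g. an all-reduce
    Node(2, "compute", duration_us=30, deps=[1]),
]
print(simulate(trace))  # 50 + 4000/100 + 30 = 120
```

The point of the sketch is the separation of concerns: the trace encodes *what* happens (compute and collective operations plus dependencies), while each consumer, whether a simulator or an execution runtime, decides *how* to interpret it.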

Background

Upstream Collective Algorithm Producers

Downstream Distributed Machine Learning Tools

Takeaways