ASTRA-sim + NS-3 takes Chakra execution traces (ETs) as input. NS-3 describes the network topology with a node-based abstraction in which each node is a device with networking capabilities. Chakra also uses the term "node", but there a node is a single computation or communication event. Putting these together, one ET file contains all the events (communication and computation) that occur on a single compute device. The IDs in an ET file therefore identify events, not physical compute nodes.
Chakra ET -> represents computation/communication done in a single device node
└ COMPUTE NODE ID 49
└ COMPUTE NODE ID 50
└ ...
└ "dataDeps": ["49"],
└ ...
TL;DR: with n network devices there are n ET files, each containing m compute/communication events.
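To make the one-file-per-device rule concrete, here is a minimal Python sketch that reports which per-device ET files are missing for a given device count. It assumes the files are named <prefix>.<rank>.et with ranks running from 0 to n-1, which is consistent with the one_comm_coll_node_allgather.*.et pattern mentioned later in these notes. The helper name missing_et_files is hypothetical and is not part of ASTRA-sim or Chakra.

```python
from pathlib import Path


def missing_et_files(workload_prefix, num_devices):
    """Return expected per-device ET paths that are not present on disk.

    Assumption: device `rank` reads `<workload_prefix>.<rank>.et`,
    with ranks numbered 0 .. num_devices - 1.
    """
    expected = [Path(f"{workload_prefix}.{rank}.et") for rank in range(num_devices)]
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Hypothetical prefix; replace with the real location of the ET files.
    missing = missing_et_files("one_comm_coll_node_allgather", 8)
    print("all ET files present" if not missing else f"missing: {missing}")
```

Running a check like this before launching the simulation may catch a mismatch between the number of generated ETs and the number of devices in the topology.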
./ns3-dev-AstraSimNetwork-default \
    --workload-configuration=${WORKLOAD} \
    --system-configuration=${SYSTEM} \
    --network-configuration=${NETWORK} \
    --remote-memory-configuration=${MEMORY} \
    --logical-topology-configuration=${LOGICAL_TOPOLOGY} \
    --comm-group-configuration=\"empty\"
cd "${SCRIPT_DIR:?}"
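workload-configuration: path prefix of the Chakra ET files that describe the computation and communication patterns (the per-device .*.et suffix is left off, as in the WORKLOAD variable below)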
system-configuration: path to the system configuration file (e.g. Switch / Ring / Fully_Connected / Ring_FullyConnected_Switch)
The system configuration defines key parameters for scheduling and managing collective communication algorithms (such as AllReduce, AllGather, and ReduceScatter) in a multi-dimensional network, for example:
{ "scheduling-policy": "LIFO",
"endpoint-delay": 10,
"active-chunks-per-dimension": 1,
"preferred-dataset-splits": 4,
"all-reduce-implementation": ["halvingDoubling"],
"all-gather-implementation": ["halvingDoubling"],
"reduce-scatter-implementation": ["halvingDoubling"],
"all-to-all-implementation": ["direct"],
"collective-optimization": "localBWAware",
"local-mem-bw": 50,
"boost-mode": 0
}
It includes options for scheduling policies (LIFO/FIFO), chunking strategies, and specific collective algorithm implementations (e.g., ring, direct, tree-based methods).
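As a rough illustration of how these fields can be read, the following Python sketch loads a system configuration, checks that the scheduling policy is one of the two values named above, and collects the algorithm list for each collective. The function name summarize_system_config is hypothetical, and treating every key ending in "-implementation" as a collective is an assumption based on the example above, not a documented ASTRA-sim rule.

```python
import json

ALLOWED_POLICIES = {"LIFO", "FIFO"}  # the two policies named in these notes
SUFFIX = "-implementation"  # assumption: marks a per-collective algorithm list


def summarize_system_config(config_text):
    """Return (scheduling policy, {collective name: [algorithm, ...]})."""
    config = json.loads(config_text)
    policy = config["scheduling-policy"]
    if policy not in ALLOWED_POLICIES:
        raise ValueError(f"unexpected scheduling-policy: {policy!r}")
    collectives = {
        key[: -len(SUFFIX)]: algorithms
        for key, algorithms in config.items()
        if key.endswith(SUFFIX)
    }
    return policy, collectives


if __name__ == "__main__":
    example = """
    {"scheduling-policy": "LIFO",
     "all-reduce-implementation": ["halvingDoubling"],
     "all-to-all-implementation": ["direct"]}
    """
    policy, collectives = summarize_system_config(example)
    print(policy)
    for name, algorithms in collectives.items():
        print(f"{name}: {algorithms}")
```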
network-configuration: path to the network configuration file (config.txt, which holds the physical topology information)
remote-memory-configuration: path to memory configuration file
logical-topology-configuration: path to the logical topology file; in addition to the physical topology given in config.txt, a logical topology can also be set
1D use case (64 nodes 1D):
{"logical-dims": ["64"]}
2D use case (64 nodes in 2D):
{"logical-dims": ["8", "8"]}
comm-group-configuration: left as "empty" (I am not sure what this option controls)
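For the logical-dims examples above, the total node count is the product of the per-dimension sizes, so both the 1D and the 2D file describe 64 nodes. A minimal Python sketch of that arithmetic, assuming the sizes are stored as strings as shown (logical_node_count is a hypothetical helper name):

```python
import json
from math import prod


def logical_node_count(config_text):
    """Multiply the per-dimension sizes listed under "logical-dims"."""
    dims = json.loads(config_text)["logical-dims"]
    return prod(int(size) for size in dims)


if __name__ == "__main__":
    print(logical_node_count('{"logical-dims": ["64"]}'))      # 1D layout
    print(logical_node_count('{"logical-dims": ["8", "8"]}'))  # 2D layout
```

Per the TL;DR earlier in these notes, this count should equal the number of ET files supplied for the workload.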
I’ve put the generated ETs in the chakra directory just to run the build command without modification.
NS3_DIR="${SCRIPT_DIR:?}"/../../extern/network_backend/ns-3
WORKLOAD="${SCRIPT_DIR:?}"/../../extern/graph_frontend/chakra/one_comm_coll_node_allgather
SYSTEM="${SCRIPT_DIR:?}"/../../inputs/system/Switch.json
MEMORY="${SCRIPT_DIR:?}"/../../inputs/remote_memory/analytical/no_memory_expansion.json
LOGICAL_TOPOLOGY="${SCRIPT_DIR:?}"/../../inputs/network/ns3/sample_8nodes_1D.json
NETWORK="../../../ns-3/scratch/config/config.txt"
The above are the example inputs to the compiled ASTRA-sim + NS-3 binary. The script expects the workload ETs to be in the chakra directory, so I've moved the generated all-gather ETs (one_comm_coll_node_allgather.*.et) there.