Details for Running ASTRA-sim+NS-3

1. NS-3 “Nodes” and Chakra “Nodes”

ASTRA-sim+NS-3 takes Chakra execution traces (ETs) as input. NS-3 describes the network topology with a node-based abstraction in which each node is a device with networking capabilities. Chakra also uses the term "node", but there a node is a single computation or communication event. Putting these together, one ET file contains all the events (computation and communication) that occur on a single compute node, i.e. one NS-3 node. The IDs in an ET file identify events, not physical compute nodes.

Chakra ET -> represents computation/communication done in a single device node
  ㄴ COMPUTE NODE ID 49
  ㄴ COMPUTE NODE ID 50
    ㄴ ...
    ㄴ "dataDeps": ["49"],
    ㄴ ...

TL;DR: with n network devices there are n ET files, each containing m computation/communication events.
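
The one-ET-file-per-device rule can be checked before launching a simulation. The following is a minimal sketch, not part of ASTRA-sim: count_et_files and check_et_count are hypothetical helper names, and the "<prefix>.<rank>.et" naming is an assumption based on the one_comm_coll_node_allgather.*.et files used in the All Gather example.

```shell
# Count the ET files that share a workload prefix.
# Assumes files are named "<prefix>.<rank>.et" (an assumption, see above).
count_et_files() {
    local prefix="$1"
    local count=0
    local f
    for f in "${prefix}".*.et; do
        # An unmatched glob stays literal, so only count files that exist.
        if [ -e "${f}" ]; then
            count=$((count + 1))
        fi
    done
    echo "${count}"
}

# Succeed only if there is exactly one ET file per network device.
check_et_count() {
    local prefix="$1"
    local expected_nodes="$2"
    local found
    found="$(count_et_files "${prefix}")"
    if [ "${found}" -ne "${expected_nodes}" ]; then
        echo "expected ${expected_nodes} ET files for ${prefix}, found ${found}" >&2
        return 1
    fi
}
```

For an 8-node topology, a call such as check_et_count "${WORKLOAD}" 8 would fail early if any device is missing its trace.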


Input Files to ASTRA-sim + NS-3

./ns3-dev-AstraSimNetwork-default \
    --workload-configuration=${WORKLOAD} \
    --system-configuration=${SYSTEM} \
    --network-configuration=${NETWORK} \
    --remote-memory-configuration=${MEMORY} \
    --logical-topology-configuration=${LOGICAL_TOPOLOGY} \
    --comm-group-configuration="empty"
cd "${SCRIPT_DIR:?}"
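
Since the binary takes several file paths, it can help to fail early when one of them is wrong. The helper below is a small sketch under that assumption; require_input is a hypothetical name and not something ASTRA-sim provides.

```shell
# Fail early if a required input path does not exist.
# require_input is a hypothetical helper, not part of ASTRA-sim.
require_input() {
    local label="$1"
    local path="$2"
    if [ ! -e "${path}" ]; then
        echo "missing ${label}: ${path}" >&2
        return 1
    fi
}
```

It could be called as require_input "system configuration" "${SYSTEM}" || exit 1 before the launch command. Note that ${WORKLOAD} is a file-name prefix rather than a file, and ${NETWORK} is a relative path, so a check like this only makes sense for the other inputs or from the directory the path is relative to.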

Examples

Example 1: Generated All Gather

I’ve put the generated ETs in the chakra directory just to run the build command without modification.

NS3_DIR="${SCRIPT_DIR:?}"/../../extern/network_backend/ns-3
WORKLOAD="${SCRIPT_DIR:?}"/../../extern/graph_frontend/chakra/one_comm_coll_node_allgather
SYSTEM="${SCRIPT_DIR:?}"/../../inputs/system/Switch.json
MEMORY="${SCRIPT_DIR:?}"/../../inputs/remote_memory/analytical/no_memory_expansion.json
LOGICAL_TOPOLOGY="${SCRIPT_DIR:?}"/../../inputs/network/ns3/sample_8nodes_1D.json
NETWORK="../../../ns-3/scratch/config/config.txt"

The above are example inputs to the compiled ASTRA-sim + NS-3 binary. With this WORKLOAD setting the ETs are expected in the chakra directory, so I’ve moved the generated All Gather ETs (one_comm_coll_node_allgather.*.et) there.
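
Moving the generated ETs into place can be scripted. This is a minimal sketch only: stage_ets is a hypothetical helper, and the source directory is whatever location the ETs were generated in, which is not specified here.

```shell
# Copy every "<name>.<rank>.et" file from a source directory into a target
# directory. stage_ets and its arguments are hypothetical, for illustration.
stage_ets() {
    local src_dir="$1"
    local name="$2"
    local dst_dir="$3"
    local copied=0
    local f
    mkdir -p "${dst_dir}"
    for f in "${src_dir}/${name}".*.et; do
        # Skip the literal pattern left behind when nothing matches.
        if [ -e "${f}" ]; then
            cp "${f}" "${dst_dir}/"
            copied=$((copied + 1))
        fi
    done
    # Report failure if nothing was copied, so a typo in the name is noticed.
    [ "${copied}" -gt 0 ]
}
```

For this example the call would look like stage_ets <generated-dir> one_comm_coll_node_allgather "${SCRIPT_DIR:?}"/../../extern/graph_frontend/chakra, where <generated-dir> stands for the directory holding the generated files.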