The cloned GitHub repository does not contain example Chakra execution traces (ETs) for ASTRA-sim + NS-3, so running the simulation without them produces an error.
We must generate some with the Chakra ET generator that ships with the repository.
cd ~/astra-sim/extern/graph_frontend/chakra
Run the following command to generate example ETs. For more information, see chakra/src/generator/generator.py on the main branch of mlcommons/chakra.
python3 -m chakra.et_generator.et_generator --num_npus 8 --num_dims 1
The generator accepts the following options:
et_generator.py [-h] [--num_npus NUM_NPUS] [--num_dims NUM_DIMS]
                [--default_runtime DEFAULT_RUNTIME]
                [--default_tensor_size DEFAULT_TENSOR_SIZE]
                [--default_comm_size DEFAULT_COMM_SIZE]
  --num_npus NUM_NPUS   Number of NPUs
  --num_dims NUM_DIMS   Number of dimensions in the network topology
  --default_runtime DEFAULT_RUNTIME
                        Default runtime of compute nodes
  --default_tensor_size DEFAULT_TENSOR_SIZE
                        Default tensor size of memory nodes
  --default_comm_size DEFAULT_COMM_SIZE
                        Default communication size of communication nodes
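To make the option list concrete, here is a minimal sketch of an argparse parser that mirrors the help text above. It is a reconstruction from that help text only, not the generator's actual source: the function name build_parser is hypothetical, and every default value is a placeholder assumption (the help text does not state the real defaults).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the help text; defaults are placeholders."""
    parser = argparse.ArgumentParser(prog="et_generator.py")
    parser.add_argument("--num_npus", type=int, default=8,
                        help="Number of NPUs")
    parser.add_argument("--num_dims", type=int, default=1,
                        help="Number of dimensions in the network topology")
    parser.add_argument("--default_runtime", type=int, default=5,
                        help="Default runtime of compute nodes")
    parser.add_argument("--default_tensor_size", type=int, default=1024,
                        help="Default tensor size of memory nodes")
    parser.add_argument("--default_comm_size", type=int, default=65536,
                        help="Default communication size of communication nodes")
    return parser


# Parse the same flags used in the command shown earlier.
args = build_parser().parse_args(["--num_npus", "8", "--num_dims", "1"])
print(args.num_npus, args.num_dims)
```

Any option left off the command line falls back to its default, which is why the command above only needs to pass --num_npus and --num_dims.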
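With these arguments, the generator calls the following functions, each producing one kind of example ET:
one_metadata_node_all_types(args.num_npus)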
one_remote_mem_load_node(args.num_npus, args.default_tensor_size)
one_remote_mem_store_node(args.num_npus, args.default_tensor_size)
one_comp_node(args.num_npus, args.default_runtime)
two_comp_nodes_independent(args.num_npus, args.default_runtime)
two_comp_nodes_dependent(args.num_npus, args.default_runtime)
one_comm_coll_node_allreduce(args.num_npus, args.default_comm_size)
one_comm_coll_node_alltoall(args.num_npus, args.default_comm_size)
one_comm_coll_node_allgather(args.num_npus, args.default_comm_size)
one_comm_coll_node_reducescatter(args.num_npus, args.default_comm_size)
one_comm_coll_node_broadcast(args.num_npus, args.default_comm_size)
one_comm_coll_node_barrier(args.num_npus)
one_comm_send_node(args.num_npus, args.default_tensor_size)
one_comm_recv_node(args.num_npus, args.default_tensor_size)
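Every function in the list above takes num_npus, which suggests one trace per NPU for each example. The sketch below enumerates the files one might expect after a run. The naming pattern <function name>.<npu id>.et is an assumption not stated in this document, and expected_trace_files is a hypothetical helper, so check the generator's actual output directory before relying on it.

```python
from typing import List

# A subset of the generator function names listed above.
EXAMPLES = ["one_comp_node", "one_comm_coll_node_allgather"]


def expected_trace_files(names: List[str], num_npus: int) -> List[str]:
    """Return assumed trace file names: one '<name>.<npu_id>.et' per NPU."""
    return [f"{name}.{npu_id}.et"
            for name in names
            for npu_id in range(num_npus)]


files = expected_trace_files(EXAMPLES, num_npus=8)
print(len(files))  # 2 examples x 8 NPUs
print(files[0], files[-1])
```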
The all-gather example is a very simple ET with no dependencies. It has only a few Chakra attributes: is_cpu_op, communication type (ALL_GATHER), communication size, and involved dimensions.
Below is an ALL_GATHER node from this ET, converted to JSON:
{
  "id": "80",
  "name": "ALL_GATHER",
  "type": "COMM_COLL_NODE",
  "attr": [
    {
      "name": "is_cpu_op",
      "boolVal": false
    },
    {
      "name": "comm_type",
      "int64Val": "2"
    },
    {
      "name": "comm_size",
      "uint64Val": "65536"
    },
    {
      "name": "involved_dim",
      "boolList": {
        "values": [
          true
        ]
      }
    }
  ]
}
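The JSON node above can be read with Python's standard json module. The sketch below flattens the attr list into a plain dictionary; attrs_to_dict is a hypothetical helper, not part of Chakra. It assumes each attribute carries a name plus exactly one typed value field, as in the node shown, and converts the 64-bit integer fields from strings to int because this JSON encodes them as strings.

```python
import json

NODE_JSON = """
{
  "id": "80", "name": "ALL_GATHER", "type": "COMM_COLL_NODE",
  "attr": [
    {"name": "is_cpu_op", "boolVal": false},
    {"name": "comm_type", "int64Val": "2"},
    {"name": "comm_size", "uint64Val": "65536"},
    {"name": "involved_dim", "boolList": {"values": [true]}}
  ]
}
"""


def attrs_to_dict(node: dict) -> dict:
    """Flatten a node's attribute list into {attribute name: value}."""
    result = {}
    for attr in node["attr"]:
        # Assumption: one typed value field sits next to "name".
        value_key = next(key for key in attr if key != "name")
        value = attr[value_key]
        if value_key in ("int64Val", "uint64Val"):
            value = int(value)        # 64-bit integers arrive as strings
        elif value_key.endswith("List"):
            value = value["values"]   # unwrap list-typed attributes
        result[attr["name"]] = value
    return result


node = json.loads(NODE_JSON)
attrs = attrs_to_dict(node)
print(node["name"], attrs["comm_size"], attrs["involved_dim"])
```

Here comm_size comes back as the integer 65536 and involved_dim as a one-element list, matching the single network dimension requested with --num_dims 1.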