4-1. Example of sim_send() and sim_recv() in action:

// TODO: update the example with NS-3 Logs of Events being scheduled.

Once a collective communication operation is scheduled and run (in the respective collective algorithm), the algorithm calls try_register_event(), which in turn calls the system layer’s sim_send().

image-20240926-205107.png

In the picture above, should_schedule is the boolean that determines whether NS-3 schedules this event. Node 0’s ALL_TO_ALL operation is then scheduled with NS-3’s schedule() function.

image-20240926-221857.png

After the event is scheduled, it is split into 4 streams. The following is an example of how this works:

collective size: 24567 bytes
chunk_size: 6144 bytes (collective size / stream_num [4])
stream message size: 768 bytes (chunk_size / num_nodes) -> determined in algorithm
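The arithmetic above can be checked with a short sketch. The function name stream_message_size is hypothetical, and num_nodes = 8 is an assumption inferred from 6144 / 768. Note that 24567 / 4 is 6141.75 rather than exactly 6144, so the sketch takes chunk_size as given instead of deriving it; whether the simulator rounds the chunk up or the listed collective size has a typo is not settled here.

```python
def stream_message_size(chunk_size: int, num_nodes: int) -> int:
    """Bytes sent to one peer per stream: chunk_size / num_nodes."""
    if chunk_size % num_nodes != 0:
        raise ValueError("chunk_size is not a multiple of num_nodes")
    return chunk_size // num_nodes


collective_size = 24567  # bytes, as listed above
stream_num = 4           # number of streams the event is split into
chunk_size = 6144        # bytes, as listed above (taken as given)
num_nodes = 8            # assumption: inferred from 6144 / 768

# The exact quotient is not 6144, so chunk_size is not derived here.
exact_split = collective_size / stream_num
message_size = stream_message_size(chunk_size, num_nodes)

print(f"collective size / stream_num = {exact_split}")
print(f"stream message size = {message_size} bytes")
```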

With the above message size, the ALL_TO_ALL operation is done on each node. The following output shows the sim_send and sim_recv calls from Node 1’s perspective (this is all printed sequentially).

[AllToAll::run] EventType General
[SIM_SEND] src_id 1 dest_id 2 message_size 768
[SIM_RECV] src_id 0 dest_id 1 message_size 768
[AllToAll::run] EventType General
[SIM_SEND] src_id 1 dest_id 3 message_size 768
[SIM_RECV] src_id 7 dest_id 1 message_size 768
[AllToAll::run] EventType General
[SIM_SEND] src_id 1 dest_id 4 message_size 768
[SIM_RECV] src_id 6 dest_id 1 message_size 768
[AllToAll::run] EventType General
[SIM_SEND] src_id 1 dest_id 5 message_size 768
[SIM_RECV] src_id 5 dest_id 1 message_size 768
[AllToAll::run] EventType General
[SIM_SEND] src_id 1 dest_id 6 message_size 768
[SIM_RECV] src_id 4 dest_id 1 message_size 768
[AllToAll::run] EventType General
[SIM_SEND] src_id 1 dest_id 7 message_size 768
[SIM_RECV] src_id 3 dest_id 1 message_size 768
[AllToAll::run] EventType General
[SIM_SEND] src_id 1 dest_id 0 message_size 768
[SIM_RECV] src_id 2 dest_id 1 message_size 768
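The peer ordering in the output above follows a regular pattern: at step k, Node 1 sends to (1 + k) mod 8 and receives from (1 - k) mod 8. The sketch below reproduces that ordering. It is inferred from this log only, not taken from the AllToAll implementation, and the function name all_to_all_peers is hypothetical.

```python
def all_to_all_peers(node_id: int, num_nodes: int) -> list[tuple[int, int]]:
    """Return (send_dest, recv_src) for each step k = 1 .. num_nodes - 1.

    Assumption: the pattern is read off the log, not the simulator source.
    """
    return [
        ((node_id + k) % num_nodes, (node_id - k) % num_nodes)
        for k in range(1, num_nodes)
    ]


# Print the pairs in the same format as the log lines above.
for dest, src in all_to_all_peers(node_id=1, num_nodes=8):
    print(f"[SIM_SEND] src_id 1 dest_id {dest} message_size 768")
    print(f"[SIM_RECV] src_id {src} dest_id 1 message_size 768")
```

If a run produces a different peer order for some node, that would suggest the algorithm uses a different schedule than the one assumed here.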

Now take a look at the NS-3 layer and System layer’s interaction.

[NS-3 Entry.h] Notify Sender Sending Finished ... src_id: 4 dst_id: 5 message_size: 768
[Sys::handleEvent] sys id 5 with event PacketReceived
[StreamBaseline::consume] message is received and will run algorithm on current phase
[AllToAll::run] EventType PacketReceived
[SYSTEM - Register Event] sys id 5 with event MA_to_NPU should_schedule: 1
[AstraSimNetwork::sim_schedule] Calling NS-3 Schedule()!
[NS-3 Entry.h] Notify Receiver Receive Data ... src_id: 4 dst_id: 5 message_size: 768
  1. Node #4 sends a collective message to Node #5.
  2. Node #5 receives the data and notifies the system of its receipt. (The order of the print statements above may not be correct and still needs to be checked.)

There are some doubts regarding the parallelism of the collective algorithms because of the system tick stamps printed by setting ENABLE_TRACE to 1 in config.txt: