
Workload Layer: the user defines target DNN models, target parallelization strategies, and training loops → essentially where the real work is described
System Layer: implements collective communication algorithms, schedules compute/communication operations, and manages compute-communication overlap → essentially where scheduling of operations is done
Network API: communication times are computed using either analytical models or event-driven network simulators (e.g., NS-3).
Within this code, the following init() is called. The EventType::StreamInit event is passed to the simulation instance's configured collective communication algorithm (e.g., HalvingDoubling).
void StreamBaseline::init() {
  initialized = true;
  last_init = Sys::boostedTick();
  // Skip streams whose current phase is disabled.
  if (!my_current_phase.enabled) {
    return;
  }
  // Kick off the collective algorithm for this phase.
  my_current_phase.algorithm->run(EventType::StreamInit, nullptr);
  if (steps_finished == 1) {
    // First phase only: time spent waiting since stream creation.
    queuing_delay.push_back(last_phase_change - creation_time);
  }
  // Time spent between the last phase change and this init.
  queuing_delay.push_back(Sys::boostedTick() - last_phase_change);
  total_packets_sent = 1;
}
After initialization with EventType::StreamInit, the scheduler calls run() again with EventType::General, which prepares the stream->owner->front_end_sim_send() and stream->owner->front_end_sim_recv() functions. The following code within HalvingDoubling starts the calls to sim_send or sim_recv:
void HalvingDoubling::run(EventType event, CallData* data) {
  if (event == EventType::General) {
    // A packet slot has freed up; try to advance the algorithm.
    free_packets += 1;
    ready();
    iteratable();
  } else if (event == EventType::PacketReceived) {
    total_packets_received++;
    insert_packet(nullptr);
  } else if (event == EventType::StreamInit) {
    // Seed the pipeline with parallel_reduce in-flight packets.
    for (int i = 0; i < parallel_reduce; i++) {
      insert_packet(nullptr);
    }
  }
}
Some good resources: https://docs.google.com/document/d/14T4fAQe4d9dPq7dZEoEQ_dF6kSq0FaGlFlZLZdSSfx0/