Background Information

RAN Components:
- RU (Radio Unit):
  - implemented in fixed-function hardware (FPGA)
  - connects to DU via fronthaul network
- DU (Distributed Unit):
  - software application on commodity server
  - responsible for strict real-time processes → must run close to RUs
    - PHY: wireless signal processing
    - MAC: scheduling radio resources among UEs
  - hosts several cells (RUs)
- CU (Centralized Unit):
  - software application on commodity server
  - delay-tolerant → can run further away from RUs
    - RRC: manages radio-related UE operations including handovers
  - hosts hundreds of DUs
- RIC (RAN Intelligent Controller): facilitates programmability of RAN
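
The hosting relationships above (a CU hosts many DUs, a DU hosts several RUs/cells) can be sketched as a small data model. This is only an illustrative sketch: the class names, fields, and the helper `cells_affected_by_du_failure` are hypothetical and do not come from any vRAN software.

```python
from dataclasses import dataclass, field


@dataclass
class RU:
    """Radio Unit: fixed-function hardware serving one cell."""
    cell_id: str


@dataclass
class DU:
    """Distributed Unit: real-time PHY/MAC software hosting several RUs."""
    name: str
    rus: list[RU] = field(default_factory=list)


@dataclass
class CU:
    """Centralized Unit: delay-tolerant software (RRC) hosting many DUs."""
    name: str
    dus: list[DU] = field(default_factory=list)


def cells_affected_by_du_failure(cu: CU, failed_du: str) -> list[str]:
    """Return the cell IDs that lose service if the named DU goes down."""
    return [ru.cell_id for du in cu.dus if du.name == failed_du for ru in du.rus]


# Hypothetical topology: one CU, two DUs, three cells.
cu = CU("cu-0", [
    DU("du-1", [RU("cell-a"), RU("cell-b")]),
    DU("du-2", [RU("cell-c")]),
])
print(cells_affected_by_du_failure(cu, "du-1"))  # ['cell-a', 'cell-b']
```

Because one DU hosts several cells, a single DU failure takes all of its cells offline at once, which is why the DU is the focus of the resilience discussion.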
RU Resource Availability
- Time Sharing → alternate between DU1’s and DU2’s packets at the RU
  - DU’s MAC scheduler makes scheduling decisions once every TTI
  - over-the-air signal exchanges happen once every symbol
  - there are multiple symbols within a single TTI
  - Problem: throughput drops to near zero and UEs cannot even connect
- Frequency Sharing → split the frequency spectrum available to the RU between two DUs
  - Problem: BWP (Bandwidth Parts) makes this possible, but it is an optional 5G feature that some UEs and vRAN software may not support
- Spatial Sharing (Antenna Port Sharing) for control channel data
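
To make the TTI-versus-symbol mismatch in the time-sharing option concrete, here is a small worked calculation. It assumes that one TTI equals one 5G NR slot with normal cyclic prefix (14 OFDM symbols) and uses the standard NR numerology rule that a slot lasts 1 ms / 2^mu; the notes themselves do not give these numbers, and the function names are illustrative.

```python
SYMBOLS_PER_SLOT = 14  # 5G NR slot with normal cyclic prefix (assumed TTI = 1 slot)


def slot_duration_us(mu: int) -> float:
    """Slot (TTI) length in microseconds for NR numerology mu (SCS = 15 kHz * 2**mu)."""
    return 1000.0 / (2 ** mu)


def mean_symbol_duration_us(mu: int) -> float:
    """Average OFDM symbol time, cyclic prefix included, within one slot."""
    return slot_duration_us(mu) / SYMBOLS_PER_SLOT


for mu in (0, 1):
    print(
        f"mu={mu}: TTI={slot_duration_us(mu):.1f} us, "
        f"symbol~{mean_symbol_duration_us(mu):.1f} us"
    )
# mu=0: TTI=1000.0 us, symbol~71.4 us
# mu=1: TTI=500.0 us, symbol~35.7 us
```

Under these assumptions the RU exchanges signals more than ten times per scheduling decision, so any scheme that alternates between two DUs would have to operate on a timescale of tens of microseconds.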
Existing Solutions for RAN Resilience:
- Offloading UEs to neighbor cells
  - In traditional RAN, when a DU goes down:
    - planned event (updates): handovers
    - unplanned event (failures): UEs reattach to neighboring cells
  - Problems:
    - Only feasible when neighboring cells provide enough overlapping coverage
    - For unplanned failures, UEs fail to reconnect reliably because existing 5G protocols are failure-agnostic
    - Ensuring good coverage overlap requires maintenance windows (antenna management, cell planning, etc.)
    - UEs experience reduced throughput due to worse signal quality from the neighbor cell
    - vRAN datacenters are small → not always possible to place neighboring cells’ DUs on different servers within the datacenter
- Fault-tolerant State Store
  - Store all DU state for recovery
  - Problems:
    - extensive modifications to DU source code needed → vendor specificity
    - latency of fault-tolerant key-value stores consumes a significant portion of the DU network functions’ TTI-sized processing budget
- Creating a new DU instance
  - routes traffic to the new instance
  - Problem: not designed for real-time systems
- Stateless migration to a hot-standby DU
  - PHY processing can be migrated to another PHY instance without transferring state because there are no long-lived states that affect the UEs
  - Problem: DU layers above the PHY (MAC/RLC) need to maintain long-lived UE states
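
The latency concern raised for the fault-tolerant state store can be illustrated with a rough budget calculation. All numbers below are hypothetical placeholders chosen for illustration, not measurements from the source, and `budget_fraction` is an illustrative helper.

```python
def budget_fraction(store_latency_us: float, ops_per_tti: int, tti_us: float) -> float:
    """Fraction of one TTI consumed by synchronous state-store operations."""
    if tti_us <= 0:
        raise ValueError("tti_us must be positive")
    return (store_latency_us * ops_per_tti) / tti_us


# Hypothetical inputs, not measurements from the source.
TTI_US = 500.0           # assumed TTI length
STORE_LATENCY_US = 50.0  # assumed latency of one replicated write
OPS_PER_TTI = 4          # assumed number of synchronous state updates per TTI

fraction = budget_fraction(STORE_LATENCY_US, OPS_PER_TTI, TTI_US)
print(f"{fraction:.0%} of the TTI budget spent on state replication")  # 40%
```

Even with only a few synchronous writes per TTI, replication latency in this toy setting takes a large share of the time the DU has for its actual PHY and MAC processing.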
ATLAS Overview
