Atlas: Enabling Resilience in Virtualized RANs


RAN Components:

RU (Radio Unit)
1. implemented in fixed-function hardware (FPGA)
2. connects to DU via fronthaul network
DU (Distributed Unit):
1. software application on commodity server
2. responsible for strict real-time processes → must run close to RUs
  1. PHY: wireless signal processing
  2. MAC: scheduling radio resources among UEs
3. hosts several cells (RUs)
CU (Centralized Unit):
1. software application on commodity server
2. delay-tolerant → can run further away from RUs
  1. RRC: manages radio-related UE operations including handovers
3. hosts hundreds of DUs
RIC (RAN Intelligent Controller): facilitates programmability of RAN

RU Resource Availability

Time Sharing → alternate between DU1 and DU2 packets at the RU
1. DU’s MAC scheduler makes scheduling decisions once every TTI
2. over-the-air signal exchanges happen once every symbol
3. a single TTI spans multiple symbols
4. Problem: throughput drops to near zero and UEs can’t even connect
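The gap between the two timescales above can be made concrete with standard 5G NR numerology (14 OFDM symbols per slot with normal cyclic prefix, slot length 1 ms / 2^mu). This is only a sketch: it assumes one TTI equals one slot, and the function names are illustrative, not taken from Atlas.

```python
# Sketch: compare the MAC scheduling period (TTI) with the per-symbol period.
# Assumption: one TTI == one NR slot with normal cyclic prefix (14 symbols).
SYMBOLS_PER_SLOT = 14


def slot_duration_us(mu: int) -> float:
    """Slot duration in microseconds for numerology mu (SCS = 15 kHz * 2**mu)."""
    return 1000.0 / (2 ** mu)


def avg_symbol_duration_us(mu: int) -> float:
    """Average symbol duration (cyclic prefix included) in microseconds."""
    return slot_duration_us(mu) / SYMBOLS_PER_SLOT


for mu in (0, 1, 2):
    scs_khz = 15 * 2 ** mu
    print(f"SCS {scs_khz} kHz: TTI {slot_duration_us(mu):.1f} us, "
          f"symbol ~{avg_symbol_duration_us(mu):.1f} us")
```

At 30 kHz subcarrier spacing, for example, the MAC decides once per 500 us slot while fronthaul traffic is produced roughly every 36 us, so packet-level interleaving between two DUs cuts across a single scheduling decision.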
Frequency Sharing → split the frequency spectrum available to the RU between two DUs
1. Problem: BWP (Bandwidth Parts) allows this, but it is an optional 5G feature that some UEs and vRAN software may not support
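As a rough illustration of what a frequency split means, the sketch below partitions a carrier's physical resource blocks (PRBs) into two contiguous ranges, one per DU. The helper `split_prbs` and the 50/50 share are hypothetical; real BWP configuration has additional constraints that this ignores. The 273-PRB figure corresponds to a 100 MHz carrier at 30 kHz subcarrier spacing.

```python
from typing import Tuple


def split_prbs(total_prbs: int, share_du1: float) -> Tuple[range, range]:
    """Split PRB indices [0, total_prbs) into two contiguous, non-overlapping parts.

    Hypothetical helper for illustration only; not an Atlas or 3GPP API.
    """
    if total_prbs < 2:
        raise ValueError("need at least two PRBs to share")
    if not 0.0 < share_du1 < 1.0:
        raise ValueError("share_du1 must be strictly between 0 and 1")
    cut = int(total_prbs * share_du1)
    cut = min(max(cut, 1), total_prbs - 1)  # keep both parts non-empty
    return range(0, cut), range(cut, total_prbs)


du1_prbs, du2_prbs = split_prbs(273, 0.5)
print(len(du1_prbs), len(du2_prbs))
```

Each DU would then schedule only within its own range, which is why the approach depends on UEs and vRAN software supporting BWP.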
Spatial Sharing (Antenna Port Sharing) for control channel data

Existing Solutions for RAN Resilience:

Offloading UEs to neighbor cells
1. In a traditional RAN, when a DU goes down:
  1. planned event (updates): handovers
  2. unplanned event (failures): UE reattaches to neighboring cells
2. Problems:
  1. Only feasible when neighboring cells provide enough overlapping coverage
  2. For unplanned failures, UEs fail to reconnect reliably due to the failure-agnostic nature of existing 5G protocols
  3. Ensuring good overlap of coverage requires maintenance windows (antenna management, cell planning, etc.)
  4. UEs experience reduced throughput due to worse signal quality from the neighbor cell
  5. vRAN datacenters are small → not always possible to place neighboring cells’ DUs on different servers within the datacenter
Fault-tolerant State Store
1. Store all states of DU for recovery
2. Problems:
  1. extensive modifications to DU source code needed → vendor specificity
  2. latency of fault-tolerant key-value stores consumes a significant portion of the DU network functions’ TTI-sized processing budget
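The latency concern above is simple arithmetic: every synchronous store access eats into a processing budget that is only one TTI long. The sketch below uses made-up numbers (a 500 us TTI, a 50 us store round trip, 4 accesses per TTI) purely to show the calculation; none of these values are measurements from the paper.

```python
def budget_fraction(tti_us: float, store_rtt_us: float, ops_per_tti: int) -> float:
    """Fraction of the TTI budget spent waiting on the state store.

    All inputs are hypothetical; the helper only illustrates the arithmetic.
    """
    if tti_us <= 0:
        raise ValueError("tti_us must be positive")
    return (store_rtt_us * ops_per_tti) / tti_us


# Hypothetical example values, not measured data.
fraction = budget_fraction(tti_us=500.0, store_rtt_us=50.0, ops_per_tti=4)
print(f"{fraction:.0%} of the TTI budget goes to state-store round trips")
```

Shorter TTIs (higher numerologies) shrink the budget further while the store round trip stays the same, so the fraction only grows.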
Creating a new DU instance
1. routes traffic to the new instance
2. Problem: not designed for real-time systems
Stateless migration to a hot-standby DU
1. PHY processing can be migrated to another PHY instance without transferring state because there are no long-lived states that affect the UEs
2. Problem: DU layers above the PHY (MAC/RLC) need to maintain long-lived UE states

ATLAS Overview
