Sequential precoding in cell-free MIMO: a multi-agent reinforcement learning approach

2025-11-26
Uludere, Dilay İrem
Cell-free massive MIMO is increasingly recognized as a foundational technology for upcoming 6G networks, as it enables uniform service coverage and improved spectral efficiency by coordinating a large number of distributed access points (APs). One of the most promising implementations of this concept is the radio stripes architecture, which supports cost-effective and scalable AP deployment along distributed cables that integrate both power and fronthaul connections. Nevertheless, fully centralized precoding schemes such as MMSE require heavy channel state information (CSI) exchange and centralized computations, which limit scalability under stringent fronthaul constraints. To address these limitations, we introduce a sequential precoding framework that exploits multi-agent reinforcement learning (MARL) to optimize the system sum rate while respecting transmit power constraints. In this approach, each AP is treated as an independent learning agent capable of designing its own precoder in a sequential manner. We adapt the unidirectional information-sharing mechanism from the team minimum mean-squared error (TMMSE) strategy to a MARL setting through the multi-agent deep deterministic policy gradient (MADDPG) algorithm, enabling agents to make decisions in a distributed yet coordinated fashion. In addition, we develop a simplified information-passing mechanism between APs that lowers communication overhead without compromising performance, and we allow the reinforcement learning agents to autonomously determine the most informative content to exchange, resulting in an adaptive and optimized inter-AP communication process. Extensive simulations have been conducted to evaluate the proposed framework against established benchmarks such as classical MMSE, zero-forcing (ZF), single-agent deep deterministic policy gradient (DDPG), and the original TMMSE method. 
The results demonstrate that our MARL-based approach achieves competitive or superior sum-rate performance while being inherently suited for practical radio stripes deployments.
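As an illustration of the objective described above, the following minimal sketch evaluates the sum rate for a given set of per-AP precoding coefficients under a per-AP transmit power limit. It is not the thesis implementation. The function names `scale_to_power` and `sum_rate`, the single-antenna-AP assumption, the per-AP form of the power constraint, and the signal convention (user k receives the sum over APs of `h[l][k]` times the AP's transmitted symbol) are all assumptions introduced here for illustration.

```python
import math


def scale_to_power(w, p_max):
    """Scale each AP's coefficient row so its transmit power is at most p_max.

    w[l][k] is the (assumed) coefficient AP l applies to user k's stream.
    Rows already within the limit are returned unchanged.
    """
    scaled = []
    for row in w:
        power = sum(abs(c) ** 2 for c in row)
        factor = math.sqrt(p_max / power) if power > p_max else 1.0
        scaled.append([c * factor for c in row])
    return scaled


def sum_rate(h, w, noise_power):
    """Return the downlink sum rate in bit/s/Hz.

    h[l][k] is the channel from AP l to user k (assumed convention),
    w[l][k] is AP l's precoding coefficient for user k's stream.
    """
    num_aps, num_users = len(h), len(h[0])
    total = 0.0
    for k in range(num_users):
        # Power of each stream j as received by user k, summed coherently over APs.
        gains = [
            abs(sum(h[l][k] * w[l][j] for l in range(num_aps))) ** 2
            for j in range(num_users)
        ]
        interference = sum(gains) - gains[k]
        sinr = gains[k] / (interference + noise_power)
        total += math.log2(1.0 + sinr)
    return total


# Toy example: two APs, two users, each AP sees only one user.
h = [[1 + 0j, 0j], [0j, 1 + 0j]]
w = scale_to_power([[3 + 0j, 0j], [0j, 4 + 0j]], p_max=1.0)
rate = sum_rate(h, w, noise_power=1.0)
```

In a learning-based setting such as the one described in the abstract, a quantity of this kind could serve as the reward signal, with the power scaling applied to each agent's output before evaluation; whether the thesis uses exactly this form is not stated here.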
Citation Formats
D. İ. Uludere, “Sequential precoding in cell-free MIMO: a multi-agent reinforcement learning approach,” M.S. - Master of Science, Middle East Technical University, 2025.