Online Reinforcement Learning for Beam Tracking and Rate Adaptation in Millimeter-wave Systems

Marwan Krunz, Irmak Aykin, Sopan Sarkar, Berk Akgun

Research output: Contribution to journal › Article › peer-review

1 Scopus citation


In this paper, we propose MAMBA, a restless multi-armed bandit framework for beam tracking in directional millimeter-wave (mmW) cellular systems. Instead of relying on explicit control messages, MAMBA utilizes the ACK/NACK packets transmitted by user equipments (UEs) to the base station (BS) as a part of the hybrid automatic repeat request (HARQ) procedure. These packets are used to measure the quality of the currently operating downlink beam, and select a new downlink beam along with an appropriate modulation and coding scheme (MCS) for future transmissions. At its core, MAMBA implements an online reinforcement learning technique called adaptive Thompson sampling (ATS), which determines a good beam and associated MCS to be used for the upcoming transmissions. To evaluate MAMBA's performance, we conduct extensive simulations and over-the-air (OTA) experiments over the 28 GHz band using phased-array antennas. We study fixed- as well as adaptive-rate variants of MAMBA, and contrast it with four other beam tracking strategies: a beam selection scheme similar to the one used in 5G NR (called ‘static oracle’), a theoretically optimal but practically infeasible beam tracking scheme (called ‘dynamic oracle’), an ε-greedy algorithm [1], and the Unimodal Beam Alignment (UBA) algorithm [2]. Our results show that MAMBA achieves 182% throughput gain over the ‘static oracle’ and is reasonably close to the throughput of the ‘dynamic oracle’. Compared to UBA, MAMBA achieves 25-35% gain in throughput, depending on UE mobility. Finally, when operated at a fixed MCS, MAMBA/ATS achieves 21% gain over the ε-greedy algorithm at the lowest applied MCS index, and 255% gain at the highest MCS index.
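
The idea of picking a beam from ACK/NACK feedback can be illustrated with a minimal sketch of Beta-Bernoulli Thompson sampling with discounted counts. This is not the paper's ATS algorithm, whose details the abstract does not give. The class name, the discount factor `gamma`, and the simulated per-beam ACK probabilities are all assumptions made for illustration, and MCS selection is left out entirely.

```python
import random


class DiscountedThompsonBeamSelector:
    """Hypothetical beam selector: Beta-Bernoulli Thompson sampling.

    Each beam keeps discounted ACK and NACK counts. Discounting (an
    assumption here, not taken from the paper) lets old feedback fade
    so the selector can follow a channel that changes over time.
    """

    def __init__(self, num_beams, gamma=0.95, seed=None):
        self.num_beams = num_beams
        self.gamma = gamma                   # forgetting factor in (0, 1]
        self.acks = [0.0] * num_beams        # discounted ACK counts
        self.nacks = [0.0] * num_beams       # discounted NACK counts
        self.rng = random.Random(seed)

    def select_beam(self):
        # Draw one sample per beam from its Beta posterior (uniform prior)
        # and transmit on the beam with the largest sampled ACK probability.
        samples = [
            self.rng.betavariate(1.0 + a, 1.0 + n)
            for a, n in zip(self.acks, self.nacks)
        ]
        return max(range(self.num_beams), key=samples.__getitem__)

    def update(self, beam, ack):
        # Fade all past observations, then record the new ACK or NACK.
        self.acks = [self.gamma * a for a in self.acks]
        self.nacks = [self.gamma * n for n in self.nacks]
        if ack:
            self.acks[beam] += 1.0
        else:
            self.nacks[beam] += 1.0


def simulate(ack_probs, steps=2000, seed=0):
    """Run the selector against fixed, made-up per-beam ACK probabilities."""
    rng = random.Random(seed)
    selector = DiscountedThompsonBeamSelector(
        len(ack_probs), gamma=0.99, seed=seed + 1
    )
    picks = [0] * len(ack_probs)
    for _ in range(steps):
        beam = selector.select_beam()
        ack = rng.random() < ack_probs[beam]  # simulated HARQ feedback
        selector.update(beam, ack)
        picks[beam] += 1
    return picks


if __name__ == "__main__":
    # Beam 1 has the highest assumed ACK probability.
    print(simulate([0.2, 0.9, 0.4]))
```

In this sketch a smaller `gamma` forgets old feedback faster, which trades steady-state accuracy for quicker reaction to UE movement.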

Original language: English (US)
Pages (from-to): 1-16
Number of pages: 16
Journal: IEEE Transactions on Mobile Computing
State: Accepted/In press - 2023


  • 5G mobile communication
  • Array signal processing
  • beam tracking
  • directional communications
  • Downlink
  • Indexes
  • Millimeter-wave
  • Mobile computing
  • multi-armed bandit
  • Optimization
  • reinforcement learning
  • Throughput

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications
  • Electrical and Electronic Engineering

