TY - JOUR
T1 - Reinforcement learning meets minority game: Toward optimal resource allocation
AU - Zhang, Si Ping
AU - Dong, Jia Qi
AU - Liu, Li
AU - Huang, Zi Gang
AU - Huang, Liang
AU - Lai, Ying-Cheng
N1 - Publisher Copyright: © 2019 American Physical Society.
PY - 2019/3/6
Y1 - 2019/3/6
AB - The main point of this paper is to provide an affirmative answer to the question of whether herding can be eliminated without any external control in complex resource allocation systems, by exploiting reinforcement learning (RL) from artificial intelligence (AI). In particular, we demonstrate that herding can be effectively eliminated when agents are empowered with RL (e.g., the popular Q-learning algorithm), so that they gradually become familiar with the unknown game environment and attempt to take the optimal actions that maximize their payoff. Furthermore, computations reveal the striking phenomenon that, regardless of the initial state, the system evolves persistently and relentlessly toward the optimal state in which all resources are used efficiently. The evolution is not free of interruptions, however: large fluctuations occur, but only intermittently in time. The statistical distribution of the time between two successive fluctuation events is found to depend on the parity of that interval, i.e., on whether the number of time steps between the events is odd or even. We develop a physical analysis and derive mean-field equations to understand these phenomena. Since AI is becoming increasingly widespread, we expect our RL-empowered minority game system to have broad applications.
UR - http://www.scopus.com/inward/record.url?scp=85062819965&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062819965&partnerID=8YFLogxK
U2 - 10.1103/PhysRevE.99.032302
DO - 10.1103/PhysRevE.99.032302
M3 - Article
C2 - 30999513
SN - 2470-0045
VL - 99
JO - Physical Review E
JF - Physical Review E
IS - 3
M1 - 032302
ER -