Abstract
In this article, we propose a model-based reinforcement learning (RL) algorithm for wireless channel access. Model-based RL is a relatively new RL paradigm that integrates a world model into the agent. The world model is built on a neural network and is capable of predicting future trajectories of actions, rewards, and observations. We focus on developing a sophisticated world model based on the partially observable Markov decision process (POMDP). The proposed world model can describe an environment in which only partial observations emitted from the hidden state are available. To formulate the wireless channel access problem, we introduce two separate environments: one describes the channel occupancy dynamics, and the other governs the data traffic arrival patterns. Both environments are modeled by the proposed POMDP-based world model. To design an agent capable of deciding on the next action, we propose a planning algorithm that, unlike existing model-free RL algorithms, makes use of the future trajectories generated by the trained world model. We have conducted extensive simulations to verify the performance of the proposed method in various wireless channel access scenarios.
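To make the idea of planning over a learned world model concrete, below is a minimal sketch of trajectory-based planning with a POMDP-style latent model for a discrete channel-access action space (e.g., wait vs. transmit). All class and function names (`WorldModel`, `plan`, `encode`, `transition`, `reward`) are hypothetical, and the placeholder linear dynamics stand in for the paper's trained neural networks, whose architecture is not specified in the abstract.

```python
# Sketch: random-shooting planning over a POMDP-style world model.
# Assumptions (not from the abstract): discrete actions {0: wait, 1: transmit},
# a latent "belief" state, and placeholder linear dynamics in place of the
# paper's trained neural world model.
import numpy as np

rng = np.random.default_rng(0)

class WorldModel:
    """Stand-in for a trained neural world model over a hidden state."""
    def __init__(self, latent_dim=8, n_actions=2):
        self.latent_dim = latent_dim
        self.n_actions = n_actions
        # Random linear maps as placeholders for learned networks.
        self.A = rng.normal(scale=0.3, size=(latent_dim, latent_dim))
        self.B = rng.normal(scale=0.3, size=(n_actions, latent_dim))
        self.w = rng.normal(size=latent_dim)

    def encode(self, observation, prev_latent):
        # In a POMDP the agent sees only a partial observation; the
        # encoder folds it into a belief-like latent state.
        return np.tanh(self.A @ prev_latent + observation)

    def transition(self, latent, action):
        # Predict the next latent state given the chosen action.
        return np.tanh(self.A @ latent + self.B[action])

    def reward(self, latent, action):
        # Predicted reward, e.g., positive for a successful transmission,
        # with a small cost for attempting to transmit.
        return float(self.w @ latent) - 0.1 * action

def plan(model, latent, horizon=5, n_rollouts=64, gamma=0.95):
    """Roll candidate action sequences through the world model and
    return the first action of the highest-return sequence."""
    best_return, best_action = -np.inf, 0
    for _ in range(n_rollouts):
        seq = rng.integers(model.n_actions, size=horizon)
        z, ret = latent.copy(), 0.0
        for t, a in enumerate(seq):
            ret += (gamma ** t) * model.reward(z, a)
            z = model.transition(z, a)
        if ret > best_return:
            best_return, best_action = ret, int(seq[0])
    return best_action

model = WorldModel()
z = np.zeros(model.latent_dim)
obs = rng.normal(size=model.latent_dim)   # partial observation of the channel
z = model.encode(obs, z)
print("next action:", plan(model, z))     # 0 = wait, 1 = transmit
```

The key design point this illustrates is the one the abstract emphasizes: decisions are made by simulating future trajectories inside the learned model rather than by querying a model-free value or policy network directly.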
| Original language | English |
|---|---|
| Pages (from-to) | 10150-10167 |
| Number of pages | 18 |
| Journal | IEEE Internet of Things Journal |
| Volume | 11 |
| Issue number | 6 |
| DOIs | |
| State | Published - 15 Mar 2024 |
| Externally published | Yes |
Keywords
- Actor-critic
- model-based reinforcement learning (RL)
- partially observable Markov decision process (POMDP)
- wireless channel access
- world model