Delay-optimal dynamic mode selection and resource allocation in device-to-device communications - part II: practical algorithm

Lei, Lei, Kuang, Yiru, Cheng, Nan, Shen, Xuemin, Zhong, Zhangdui, and Lin, Chuang (2016) Delay-optimal dynamic mode selection and resource allocation in device-to-device communications - part II: practical algorithm. IEEE Transactions on Vehicular Technology, 65 (5). pp. 3491-3505.

DOI: 10.1109/TVT.2015.2444791

View at Publisher Website: https://doi.org/10.1109/TVT.2015.2444791

Abstract

In Part I of this paper (“Delay-Optimal Dynamic Mode Selection and Resource Allocation in Device-to-Device Communications-Part I: Optimal Policy”), we investigated dynamic mode selection and subchannel allocation for an orthogonal frequency-division multiple access (OFDMA) cellular network with device-to-device (D2D) communications to minimize the average end-to-end delay under a dropping probability constraint. We formulated the optimal resource control problem as an infinite-horizon average-reward constrained Markov decision process (CMDP). However, the optimal control policy derived in Part I, which uses the brute-force offline value iteration algorithm based on the reduced-state equivalent Bellman equation, still faces the well-known curse-of-dimensionality problem, which limits its practical application in realistic scenarios with multiple D2D users and cellular users. In Part II of this paper, we use linear value approximation techniques to further reduce the state space. Moreover, an online stochastic learning algorithm with two timescales is applied to update the value functions and Lagrangian multipliers (LMs) based on real-time observations of channel state information (CSI) and queue state information (QSI). The combined online stochastic learning solution converges almost surely to a global optimal solution under some realistic conditions. Simulation results show that the proposed approach achieves nearly the same performance as the offline value iteration algorithm and outperforms the conventional CSI-only scheme and throughput-optimal scheme in a stability sense.
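
The two-timescale idea mentioned in the abstract can be illustrated with a minimal, generic sketch. This is not the authors' algorithm: it is a textbook-style stochastic approximation step in which a tabular value estimate is updated on a faster timescale with a relative (average-cost) temporal-difference rule, and a Lagrangian multiplier is updated on a slower timescale and projected onto the nonnegative reals. The function names, the step-size exponents, the reference state `ref`, and the sum-of-per-queue-values form in `approx_value` are all assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def step_sizes(t: int) -> Tuple[float, float]:
    """Illustrative step sizes (assumed exponents, not from the paper).

    The multiplier step decays faster than the value step, so the ratio
    eps_lm / eps_v tends to zero and the multiplier looks quasi-static
    to the value update.
    """
    eps_v = 1.0 / (t + 1) ** 0.6
    eps_lm = 1.0 / (t + 1)
    return eps_v, eps_lm


def approx_value(per_queue_values: Sequence[Sequence[float]],
                 queue_lengths: Sequence[int]) -> float:
    """Assumed linear approximation: global value as a sum of per-queue values."""
    return sum(v[q] for v, q in zip(per_queue_values, queue_lengths))


def two_timescale_step(values: List[float], lm: float, q: int, q_next: int,
                       delay_cost: float, dropped: int, drop_max: float,
                       t: int, ref: int = 0) -> float:
    """Apply one online update in place and return the new multiplier.

    values     : per-queue value table indexed by queue length (hypothetical)
    lm         : current Lagrangian multiplier for the dropping constraint
    q, q_next  : observed queue length before and after the transition
    delay_cost : observed per-stage delay cost
    dropped    : 1 if a packet was dropped in this stage, else 0
    drop_max   : dropping probability budget
    """
    eps_v, eps_lm = step_sizes(t)
    # Per-stage Lagrangian cost: delay plus weighted constraint term.
    lagrangian_cost = delay_cost + lm * dropped
    # Relative TD error for the average-cost setting (reference state `ref`).
    td_error = lagrangian_cost + values[q_next] - values[ref] - values[q]
    values[q] += eps_v * td_error
    # Slower timescale: projected subgradient step on the multiplier.
    return max(0.0, lm + eps_lm * (dropped - drop_max))
```

In a simulation loop, one would call `two_timescale_step` once per scheduling slot with the observed queue transition, and use `approx_value` to score candidate actions; how CSI enters the action choice is left out of this sketch.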


Item ID:	53200
Item Type:	Article (Research - C1)
ISSN:	1939-9359
Keywords:	device-to-device communication; mode selection; online stochastic learning; resource allocation
Date Deposited:	19 Jun 2018 02:25
FoR Codes:	40 ENGINEERING > 4006 Communications engineering > 400608 Wireless communication systems and technologies (incl. microwave and millimetrewave) @ 100%
SEO Codes:	89 INFORMATION AND COMMUNICATION SERVICES > 8901 Communication Networks and Services > 890103 Mobile Data Networks and Services @ 100%