Age of Information (AoI) is a recently proposed performance metric that measures the freshness of data at the receiving end of a flow. It is particularly well suited to status-update information flows, such as those arising in machine-type communication (MTC), remote monitoring, and similar applications. In this paper, we consider the problem of AoI-optimal scheduling of multiple flows served by a single server. Because the problem is analytically intractable, the performance of scheduling algorithms proposed in the literature has been characterized only under restrictive assumptions. The goal of this paper is to apply reinforcement learning methods to obtain scheduling decisions that are robust to network conditions and packet arrival processes. Specifically, we employ Policy Gradient and Deep Q-Learning methods, which can adapt to the network without a priori knowledge of its parameters. We evaluate the resulting performance against a benchmark, the MAF algorithm, which is known to be optimal under certain conditions.
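
To make the benchmark concrete, the following is a minimal sketch of age evolution under a maximum-age-first rule. It assumes that MAF stands for Maximum Age First, and it uses a simplified slotted model that is an illustration rather than the model of this paper: one flow is served per slot, a served flow's age resets to 1 on a successful delivery, and every other age grows by 1. The names `maf_schedule`, `step`, `simulate`, and the parameter `p_success` are hypothetical.

```python
import random


def maf_schedule(ages):
    """Pick the flow with the largest current age (ties go to the lowest index)."""
    return max(range(len(ages)), key=lambda i: ages[i])


def step(ages, chosen, success):
    """Advance one slot.

    Assumed model: the chosen flow's age resets to 1 if delivery succeeds;
    every other flow (and the chosen one on failure) ages by one slot.
    """
    return [1 if (i == chosen and success) else a + 1 for i, a in enumerate(ages)]


def simulate(num_flows=3, slots=1000, p_success=1.0, seed=0):
    """Return the time-average age per flow under the maximum-age-first rule."""
    rng = random.Random(seed)
    ages = [1] * num_flows
    total = 0
    for _ in range(slots):
        chosen = maf_schedule(ages)
        success = rng.random() < p_success  # assumed i.i.d. delivery success
        ages = step(ages, chosen, success)
        total += sum(ages)
    return total / (slots * num_flows)


if __name__ == "__main__":
    print(simulate(num_flows=3, slots=1000, p_success=1.0))
    print(simulate(num_flows=3, slots=1000, p_success=0.7))
```

In this simplified setting with error-free service, the rule settles into serving the flows in rotation. A learned policy would replace `maf_schedule` with a function of the observed ages, which is where methods such as Policy Gradient or Deep Q-Learning would enter.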


