Regularized Reinforcement Learning with Performance Guarantees

Milani Fard
Doctoral dissertation
McGill University
Reinforcement learning, PAC-Bayes learning

Reinforcement learning covers a broad category of control problems in which a learning agent interacts with its environment in order to learn to maximize the collected utility. Such exploratory interaction is often costly, which motivates the use of sample-efficient algorithms. This thesis explores two avenues for improving the sample complexity of such algorithms: one uses prior domain knowledge on the dynamics or utilities, and the other leverages sparsity structures in the collected observations.
We take advantage of domain knowledge in the form of a prior distribution to develop PAC-Bayesian regularized model-selection algorithms for the batch reinforcement learning problem, providing performance guarantees that hold regardless of the correctness of the prior distribution. We show how PAC-Bayesian policy evaluation can leverage prior distributions when they are informative and, unlike standard Bayesian approaches, ignore them when they are misleading.
In the absence of prior knowledge, we explore regularization of model selection through random compressed sensing when generating features for the policy evaluation problem. In commonly occurring sparse observation spaces, such compression can help control the estimation error by substantially reducing the dimensionality of the regression space, at the cost of a small induced bias.
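As a rough illustration of the compression idea, the sketch below projects sparse high-dimensional features into a much smaller space with a random matrix and then evaluates a policy on the compressed features. It is a minimal sketch under stated assumptions, not the method developed in the thesis: the Gaussian projection, the use of least-squares temporal difference (LSTD) as the policy evaluation step, the small ridge term, and all function and parameter names are illustrative choices.

```python
import numpy as np


def random_projection(D, d, rng):
    """Assumed choice: Gaussian matrix with entry variance 1/d, so that
    squared norms of projected vectors are preserved in expectation."""
    return rng.normal(0.0, 1.0 / np.sqrt(d), size=(D, d))


def lstd(phi, phi_next, rewards, gamma, ridge=1e-6):
    """Least-squares temporal difference estimate of linear value weights.

    phi, phi_next: (n, k) features of the visited and successor states.
    The ridge term is an assumption added only for numerical stability.
    """
    k = phi.shape[1]
    A = phi.T @ (phi - gamma * phi_next) + ridge * np.eye(k)
    b = phi.T @ rewards
    return np.linalg.solve(A, b)


def compressed_lstd(X, X_next, rewards, gamma, d, seed=0):
    """Project D-dimensional features down to d dimensions, then run LSTD."""
    rng = np.random.default_rng(seed)
    P = random_projection(X.shape[1], d, rng)
    w = lstd(X @ P, X_next @ P, rewards, gamma)
    return P, w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, D, d = 200, 1000, 20
    # Hypothetical sparse observations: about 1% of entries are nonzero.
    X = (rng.random((n, D)) < 0.01).astype(float)
    X_next = (rng.random((n, D)) < 0.01).astype(float)
    rewards = rng.random(n)
    P, w = compressed_lstd(X, X_next, rewards, gamma=0.9, d=d)
    values = X @ P @ w  # estimated values of the sampled states
    print(P.shape, w.shape, values.shape)
```

The regression is solved in d dimensions rather than D, which is where the reduction in estimation error and computation would come from; the random projection is the source of the induced bias mentioned above.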
Our proposed methods can provably outperform the alternatives in sample or time complexity, showcasing how informed or agnostic regularization can improve the effectiveness of reinforcement learning algorithms.