Controlling complexity by sharing parameters and minimizing variation

Philip

Bachman

Doctoral dissertation

2016

McGill University

Keywords:

supervised learning; deep learning; regularization

PBachman_PhD_Thesis_final.pdf

Summary:

In this thesis we investigate several variants of online learning problems with different feedback models and objectives. First we consider the pure exploration problem with multi-action probes. We design algorithms that can find the best one or several actions with high probability while using as few probes as possible. Then we study the side observation model in the regret minimization scenario. We derive a novel finite time distribution dependent lower bound and design asymptotically optimal and minimax optimal algorithms. Last we investigate the conservative bandit problem where the objective is to minimize the regret while maintaining the cumulative reward above a baseline. We design algorithms for several variants of the problem and derive a lower bound.