Controlling complexity by sharing parameters and minimizing variation

Doctoral dissertation
McGill University
Keywords: supervised learning; deep learning; regularization

In this thesis, we investigate several variants of online learning problems with different feedback models and objectives. First, we consider the pure-exploration problem with multi-action probes, and design algorithms that find the best action, or the best several actions, with high probability while using as few probes as possible. Next, we study the side-observation model in the regret-minimization setting; we derive a novel finite-time distribution-dependent lower bound and design algorithms that are both asymptotically optimal and minimax optimal. Finally, we investigate the conservative bandit problem, in which the objective is to minimize regret while keeping the cumulative reward above a baseline. We design algorithms for several variants of this problem and derive a lower bound.
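To make the conservative bandit objective concrete, the following is a minimal sketch of the standard safety constraint used in that literature: after every round t, the learner's cumulative reward must stay above a (1 − α) fraction of what the known baseline arm would have earned. This is a generic illustration of the constraint, not the thesis's own algorithms; the function names, the Bernoulli rewards, and the pessimistic fallback rule are all assumptions made for the example.

```python
import random

def satisfies_constraint(cumulative_reward, t, mu_baseline, alpha):
    """Conservative-bandit safety check after t rounds:
    cumulative reward must be at least (1 - alpha) * t * mu_baseline,
    where mu_baseline is the known mean reward of the baseline arm
    and alpha in (0, 1) is the allowed shortfall fraction."""
    return cumulative_reward >= (1.0 - alpha) * t * mu_baseline

def conservative_play(explore_arm_mean, mu_baseline, alpha, horizon, seed=0):
    """Toy policy (illustrative only, not the thesis's algorithm):
    pull an exploratory arm only when a pessimistic check (assume the
    pull returns zero reward) still satisfies the constraint; otherwise
    fall back to the baseline arm. Rewards are Bernoulli for simplicity."""
    rng = random.Random(seed)
    cumulative = 0.0
    baseline_pulls = 0
    for t in range(1, horizon + 1):
        # Pessimistic check: would the constraint still hold at round t
        # even if the exploratory pull contributed nothing?
        if satisfies_constraint(cumulative, t, mu_baseline, alpha):
            mean = explore_arm_mean           # safe to explore this round
        else:
            mean = mu_baseline                # play it safe: baseline arm
            baseline_pulls += 1
        cumulative += 1.0 if rng.random() < mean else 0.0
    return cumulative, baseline_pulls
```

The tension the abstract describes is visible here: a policy that explores too aggressively early on is forced back onto the baseline arm later, which is exactly the trade-off the regret analysis has to account for.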




2013 AI Doctoral and MSc Thesis Awards



