On Reinforcement Learning for Deep Neural Architectures: Conditional computation with stochastic computation policies

Master's thesis
McGill University
deep learning; reinforcement learning

When solving problems with machine learning, one must design models capable of representing good solutions, and search procedures to find them. A balance must be struck between the diversity of representable solutions and the amount of available data. With a flexible model and only a little data, incautious search may produce solutions which perform well on available data, but fall apart on new data. This thesis concerns methods for guiding search towards solutions which perform well with new data.

First, we estimate time-varying structure in sequential data. Though this structure may be complex, we assume it can be described in terms of simple components shared across time. We thus factor the problem into two parts: estimating the shared components, and using them to explain the data.

Second, we improve high-dimensional classification and regression by imposing the assumption that small changes in the input should produce small changes in the output. We estimate derivatives numerically, and search for solutions with small derivatives.
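The derivative-based regularizer above can be sketched concretely. The following is a minimal illustration, not the thesis implementation: it assumes a numpy model `f` and estimates directional derivatives with finite differences along random unit directions; the function name, step size, and toy model are all hypothetical.

```python
import numpy as np

def finite_difference_penalty(f, X, eps=1e-3):
    """Estimate input-output sensitivity of model f at points X by
    perturbing each input along a random unit direction, and return the
    mean squared change in output per unit of input change. Adding this
    term to a training loss favors solutions with small derivatives."""
    d = np.random.randn(*X.shape)
    d /= np.linalg.norm(d, axis=1, keepdims=True)   # unit directions
    diffs = (f(X + eps * d) - f(X)) / eps           # directional derivative estimate
    return np.mean(diffs ** 2)

# Toy model: a linear map, whose directional derivatives are known exactly,
# so the penalty is bounded by the squared singular values of W.
np.random.seed(0)
W = np.array([[2.0, 0.0], [0.0, 0.5]])
f = lambda X: X @ W
X = np.random.randn(8, 2)
penalty = finite_difference_penalty(f, X)
```

In practice the penalty would be added to the task loss with a weighting coefficient, trading data fit against smoothness.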

Third, we make models robust by encouraging their output to remain consistent under uncertain conditions. While our approach successfully leverages unstructured uncertainty, structured uncertainty tailored to each problem may be more effective.
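One simple way to realize such a consistency objective is to penalize disagreement between a model's predictions on clean and perturbed inputs. The sketch below is illustrative only (the function name and isotropic Gaussian noise stand in for "unstructured uncertainty"; a structured, problem-tailored perturbation would replace the noise term).

```python
import numpy as np

def consistency_penalty(f, X, sigma=0.1, n_samples=4):
    """Penalize disagreement between a model's output on clean inputs and
    its output under random input perturbations. Minimizing this term
    encourages predictions that stay consistent when inputs are uncertain."""
    clean = f(X)
    total = 0.0
    for _ in range(n_samples):
        noisy = f(X + sigma * np.random.randn(*X.shape))  # unstructured noise
        total += np.mean((noisy - clean) ** 2)
    return total / n_samples

np.random.seed(0)
f = lambda X: np.tanh(X @ np.array([[1.0], [-1.0]]))      # toy regression model
X = np.random.randn(16, 2)
penalty = consistency_penalty(f, X)
```

A perfectly noise-invariant model (e.g. a constant function) incurs zero penalty, so the term only pressures models whose outputs move under perturbation.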

Our two final contributions started from attempts to develop data-adapted models of uncertainty, which evolved into a broader study of methods for controlling the complexity of generative models, i.e. models which capture the underlying structure of data by learning how to generate it.

We develop a generative model based on a stochastic dynamical system. We shape the system’s behavior by training another model to provide feedback about how the system deviates from desired behavior.
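This feedback loop resembles adversarial training. The one-dimensional sketch below is a hedged illustration, not the thesis model: a stochastic generator with a single parameter `mu`, and a logistic critic whose gradient supplies the corrective feedback; all names, constants, and the target distribution are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Desired behavior": samples centered at 3.
def sample_data(n):
    return rng.normal(3.0, 1.0, n)

mu = 0.0          # stochastic generator: x = mu + noise
w, b = 0.0, 0.0   # feedback model (critic): score s(x) = w*x + b

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr_c, lr_g, batch = 0.05, 0.05, 64
for _ in range(500):
    real = sample_data(batch)
    fake = mu + rng.normal(0.0, 1.0, batch)
    # Critic step: logistic-loss gradient pushes s(real) up, s(fake) down.
    p_real = sigmoid(w * real + b)
    p_fake = sigmoid(w * fake + b)
    w += lr_c * (np.mean((1 - p_real) * real) - np.mean(p_fake * fake))
    b += lr_c * (np.mean(1 - p_real) - np.mean(p_fake))
    # Generator step: follow the critic's feedback; d s(fake)/d mu = w.
    mu += lr_g * w
```

The critic's score gradient tells the generator which direction reduces its deviation from the data, so `mu` drifts toward the data mean as training proceeds.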

Finally, we present a formulation of data generation as sequential decision making. This provides a unified view of existing methods for generative modelling with deep neural networks, and may inspire new algorithms based on insights from reinforcement learning. We illustrate our formulation by developing a new method for data imputation based on guided policy search.
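To illustrate the decision-making view of generation, the toy below treats imputation of a single missing value as an action taken by a stochastic policy, trained with the score-function (REINFORCE) gradient. This is a simplified stand-in, not guided policy search: the guiding distribution is omitted, the "sequence" is collapsed to one decision per example, and the task, parameterization, and constants are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy imputation task: each example has an observed first coordinate x and
# a missing second coordinate equal to 2*x. A Gaussian policy proposes a
# value for the missing entry; the reward is the negative squared error.
theta = np.array([0.0, 0.0])          # policy mean = theta[0]*x + theta[1]
sigma, lr, baseline = 0.3, 0.02, 0.0

for _ in range(5000):
    x = rng.uniform(-1.0, 1.0)        # observed part of the example
    target = 2.0 * x                  # ground-truth missing value
    mean = theta[0] * x + theta[1]
    action = mean + sigma * rng.normal()   # policy samples an imputation
    reward = -(action - target) ** 2
    adv = reward - baseline                # baseline reduces gradient variance
    baseline += 0.1 * adv
    # REINFORCE: grad of log pi w.r.t. the mean is (action - mean) / sigma^2.
    theta += lr * adv * (action - mean) / sigma ** 2 * np.array([x, 1.0])
```

Only reward signals, never derivatives of the data-generating process, reach the policy, which is what makes the reinforcement-learning framing applicable to generation.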

This thesis contributes new perspectives that unite existing ideas, and concrete algorithms for use with real data.