Analysis of an Alternate Policy Gradient Estimator for Softmax Policies

Shivam Garg

Master's thesis

2021

University of Alberta

Keywords:

Reinforcement learning, policy gradient, softmax policy estimator, policy saturation, non-stationary environments, incremental learning