Analysis of an Alternate Policy Gradient Estimator for Softmax Policies
Master's thesis
2021
University of Alberta
Keywords: Reinforcement learning, policy gradient, softmax policy estimator, policy saturation, non-stationary environments, incremental learning