Analysis of an Alternate Policy Gradient Estimator for Softmax Policies

Shivam Garg
Master's thesis
2021
University of Alberta
Keywords: Reinforcement learning, policy gradient, softmax policy estimator, policy saturation, non-stationary environments, incremental learning