Analysis of an Alternate Policy Gradient Estimator for Softmax Policies
Master's thesis
2021
University of Alberta
Keywords:
Reinforcement learning, policy gradient, softmax policy estimator, policy saturation, non-stationary environments, incremental learning