Analysis of an Alternate Policy Gradient Estimator for Softmax Policies

Shivam Garg
Master's thesis
2021
University of Alberta
Keywords:
Reinforcement learning, policy gradient, softmax policy estimator, policy saturation, non-stationary environments, incremental learning