Policy Gradient Methods

A class of reinforcement learning algorithms that directly optimize the policy function without estimating the value function.