A class of reinforcement learning algorithms that directly optimize the policy function without estimating the value function.
A class of reinforcement learning algorithms that directly optimize the policy function without estimating the value function.