It is a reinforcement learning method in which the algorithm learns the policy function directly: a mapping from each state to the action to take in that state, chosen to maximize the expected reward.
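To make the idea of learning a policy directly more concrete, here is a minimal sketch under stated assumptions, not a definitive implementation. It assumes a toy problem with two states and two actions, a tabular softmax policy, and a REINFORCE-style update. The names `train_policy` and `REWARDS`, the reward values, the learning rate, and the episode count are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn a list of action preferences into action probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(reward_table, episodes=5000, lr=0.1, seed=0):
    """Learn a softmax policy pi(a | s) with a REINFORCE-style update.

    reward_table[s][a] is the (assumed, deterministic) reward for taking
    action a in state s. Each episode is a single step.
    """
    rng = random.Random(seed)
    n_states = len(reward_table)
    n_actions = len(reward_table[0])
    # theta[s][a] is the policy's preference for action a in state s.
    theta = [[0.0] * n_actions for _ in range(n_states)]

    for _ in range(episodes):
        s = rng.randrange(n_states)                # observe a state
        probs = softmax(theta[s])                  # pi(. | s)
        a = rng.choices(range(n_actions), weights=probs)[0]  # sample action
        r = reward_table[s][a]                     # receive reward
        # Gradient of log pi(a | s) w.r.t. theta[s][b] is 1[b == a] - pi(b | s).
        for b in range(n_actions):
            grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] += lr * r * grad_log_pi

    # Return the learned policy as one probability row per state.
    return [softmax(row) for row in theta]


# Hypothetical toy task: action 0 pays off in state 0, action 1 in state 1.
REWARDS = [[1.0, 0.0], [0.0, 1.0]]
policy = train_policy(REWARDS)
```

No value function is estimated anywhere in this sketch. The parameters of the state-to-action mapping are adjusted directly, in the direction that makes rewarded actions more probable, so `policy[s]` ends up concentrated on the action with the higher reward in each state.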