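It is a class of methods that combines a value-based approach, as in Q-learning, with a policy-based approach, as in Monte Carlo policy gradients. The learned value estimate stands in for the high-variance Monte Carlo return in the policy update, which reduces variance and tends to speed up learning.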
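As a rough illustration of how the two parts fit together, here is a minimal sketch of a single tabular update. It assumes a softmax policy over action preferences (the policy-based part) and a state-value table updated by temporal-difference learning (the value-based part). The names `softmax` and `actor_critic_step`, the step sizes, and the choice of a state-value table rather than a Q-function are illustrative assumptions, not details given above.

```python
import math


def softmax(prefs):
    """Turn a list of action preferences into action probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(theta, V, s, a, r, s_next, done,
                      gamma=0.99, alpha_pi=0.1, alpha_v=0.1):
    """Apply one update for the transition (s, a, r, s_next), in place.

    theta[s][b] holds the policy's preference for action b in state s.
    V[s] holds the value estimate for state s.
    Returns the temporal-difference error used by both updates.
    """
    # Value-based part: bootstrap a target from the current value estimate
    # instead of waiting for the full Monte Carlo return.
    target = r if done else r + gamma * V[s_next]
    delta = target - V[s]

    # Action probabilities under the policy before it is changed.
    probs = softmax(theta[s])

    # Move the value estimate toward the bootstrapped target.
    V[s] += alpha_v * delta

    # Policy-based part: a policy-gradient step weighted by the TD error.
    # For a softmax policy, d/d theta[s][b] of log pi(a|s) is 1[b == a] - pi(b|s).
    for b in range(len(theta[s])):
        grad_log = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += alpha_pi * delta * grad_log

    return delta
```

For example, with `theta = [[0.0, 0.0]]` and `V = [0.0]`, calling `actor_critic_step(theta, V, s=0, a=1, r=1.0, s_next=0, done=True)` produces a positive error, so the preference for action 1 rises and the preference for action 0 falls. A pure Monte Carlo policy gradient would instead weight the same gradient by the full episode return.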