A family of methods that estimate the value function or policy by interpolating between the current estimate and the reward observed in the next state.
A family of methods that estimate the value function or policy by interpolating between the current estimate and the reward observed in the next state.