It is a model-free reinforcement learning algorithm for finding the optimal action-selection policy. The algorithm works by iteratively improving an estimate of the action-value function towards the optimal action-value function using the Bellman equation as a guiding principle.