SARSA is another model-free reinforcement learning algorithm; its name stands for State-Action-Reward-State-Action. It iteratively estimates the action-value function of the policy it is following, and thereby the corresponding policy itself: at each step the agent observes the current state, takes an action according to the current policy, observes the reward and the resulting next state, and then selects the next action from that same policy. This quintuple (S, A, R, S', A') gives the algorithm its name and supplies everything needed for its update.
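As a rough illustration of how those quantities fit together, below is a minimal sketch of tabular SARSA with an epsilon-greedy behavior policy. The chain environment, the `step` and `epsilon_greedy` helpers, and the hyperparameter values (`alpha`, `gamma`, `epsilon`, episode count) are all illustrative assumptions, not something specified in the text.

```python
import random
from collections import defaultdict

# Hypothetical toy environment for illustration: a chain of states 0..4,
# actions 0 = left, 1 = right; reaching state 4 gives reward +1 and ends the episode.
N_STATES = 5
ACTIONS = [0, 1]

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

def epsilon_greedy(Q, state, epsilon):
    # Act according to the current policy: mostly greedy w.r.t. Q, sometimes random.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def sarsa(episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = defaultdict(float)  # tabular action-value estimates Q[(state, action)]
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(Q, state, epsilon)                # A
        done = False
        while not done:
            next_state, reward, done = step(state, action)        # R, S'
            next_action = epsilon_greedy(Q, next_state, epsilon)  # A'
            # SARSA update built from the quintuple (S, A, R, S', A')
            target = reward + (0.0 if done else gamma * Q[(next_state, next_action)])
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state, action = next_state, next_action               # S <- S', A <- A'
    return Q

if __name__ == "__main__":
    Q = sarsa()
    greedy_policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
    print(greedy_policy)  # the learned policy should prefer action 1 (right)
```

Note that, unlike Q-learning, the update bootstraps from the action the current policy actually chooses next (A'), which is what makes SARSA an on-policy method.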