Contextual Bandits

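A contextual bandit is a form of online learning in which the agent observes a context (sometimes called a state), chooses an action based on it, and receives a reward for that action only. The problem is stateless in the sense that there are no state transitions: the chosen action does not influence the next context, so the learner never has to plan over a sequence of states as it would in full reinforcement learning.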
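
The interaction loop can be illustrated with a minimal sketch. It assumes a small set of discrete contexts and uses epsilon-greedy exploration, which is only one of several possible strategies. The class name, the `simulate` helper, and the reward probabilities (0.9 and 0.1) are hypothetical and chosen purely for illustration.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Tabular epsilon-greedy learner: one running mean reward per (context, action)."""

    def __init__(self, n_actions, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, action) -> number of pulls
        self.values = defaultdict(float)  # (context, action) -> mean observed reward

    def choose(self, context):
        # Explore with probability epsilon, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.values[(context, a)])

    def update(self, context, action, reward):
        # Incremental mean; only the chosen action's estimate changes (bandit feedback).
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


def simulate(n_rounds=5000, seed=0):
    # Hypothetical environment: in context c, action c pays 1 with
    # probability 0.9 and every other action pays 1 with probability 0.1.
    env_rng = random.Random(seed + 1)
    bandit = EpsilonGreedyContextualBandit(n_actions=2, epsilon=0.1, seed=seed)
    for _ in range(n_rounds):
        context = env_rng.randrange(2)  # drawn independently of past actions
        action = bandit.choose(context)
        p = 0.9 if action == context else 0.1
        reward = 1.0 if env_rng.random() < p else 0.0
        bandit.update(context, action, reward)
    return bandit
```

Note that the next context is sampled without reference to the previous action, which is what distinguishes this setting from full reinforcement learning. After enough rounds, the learned estimates should favor the action that matches each context.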