A type of machine learning algorithm where the robot learns by receiving rewards and punishments based on its actions.