A type of deep learning architecture introduced in 2017 that has quickly become a standard approach for language modeling tasks. Transformers use attention mechanisms to selectively focus on different parts of input data and achieve state-of-the-art performance on a wide range of NLP tasks.