The study of large collections of text or speech data, with the aim of uncovering patterns and regularities in language use.
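
As a rough illustration of what "uncovering patterns and regularities" can mean in practice, the sketch below counts word and adjacent-word-pair (bigram) frequencies. The three-sentence corpus, the `tokenize` helper, and the simple alphabetic tokenization rule are all assumptions made up for this example. Real work of this kind would use far larger collections and more careful tokenization.

```python
import re
from collections import Counter

# Toy corpus invented for illustration; a real corpus would be far larger.
corpus = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "A cat and a dog met on the mat.",
]


def tokenize(text):
    """Lowercase the text and return its runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


word_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = tokenize(sentence)
    # Count individual words.
    word_counts.update(tokens)
    # Count adjacent word pairs within each sentence.
    bigram_counts.update(zip(tokens, tokens[1:]))

print(word_counts.most_common(3))
print(bigram_counts.most_common(3))
```

Even on this tiny sample, the pair ("on", "the") recurs in every sentence. Frequency counts like these are one simple way such regularities can be surfaced.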