A collection of texts used for natural language processing (NLP) research or for training machine learning models.
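As a rough illustration of what such a collection looks like in practice, the sketch below holds a tiny corpus in memory as a list of document strings. It then counts word frequencies, a common first step before a corpus is used for analysis or model training. The two sample sentences, the `tokenize` helper, and its lowercase-and-strip-punctuation rule are hypothetical choices made for this example, not a standard method.

```python
from collections import Counter

# Hypothetical toy corpus: each string is one document.
corpus = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
]


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption for this sketch): lowercase the text,
    # replace anything that is not a letter, digit, or whitespace with a space,
    # then split on whitespace.
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return cleaned.split()


# Count how often each token appears across every document in the corpus.
counts = Counter(token for doc in corpus for token in tokenize(doc))

# The vocabulary is the set of distinct tokens, sorted for stable output.
vocabulary = sorted(counts)

print(counts.most_common(3))
print(vocabulary)
```

Real corpora are usually far larger and are read from files or datasets instead of being written inline. They also tend to use a proper tokenizer in place of this whitespace split.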