The process of compiling and categorizing a corpus (i.e. a collection of texts) for use in linguistic analysis.