Corpus Linguistics Software and Tools

A review of tools and software for managing, analyzing, and annotating corpora, including corpus search engines, concordancers, and text analysis programs.

Corpus compilation: The process of collecting, cleaning, and preparing a corpus for linguistic analysis.
Text annotation: The process of marking up a corpus with linguistic information, such as part-of-speech tags or syntactic structure.
Concordancing: The process of generating a list of occurrences of a particular word or phrase in a corpus, along with their context.
Collocation analysis: The examination of the co-occurrence patterns of words in a corpus to identify statistically significant relationships between them.
Semantic analysis: The examination of the meanings conveyed by words and phrases in a corpus, often using techniques such as word embeddings or topic modeling.
Discourse analysis: The study of patterns of language use at the level of whole texts or larger units, such as conversations or speeches.
Corpus-based grammar research: The use of corpus data to investigate structural patterns and grammatical rules in a language.
Error analysis: The examination of incorrect or non-standard language use in a corpus, often used in language teaching and learning research.
Corpus-based language learning: The use of corpora and corpus tools to support language learning and teaching.
Multilingual corpus analysis: The study of linguistic features across multiple languages, often used in language comparison research.
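
The concordancing entry above describes listing each occurrence of a word together with its context. A minimal keyword-in-context (KWIC) sketch in Python may make that concrete. The function names, the simple regular-expression tokenizer, and the sample sentence are all assumptions made for illustration; dedicated concordancers handle tokenization, sorting, and large corpora far more carefully.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            # Take up to `window` tokens on each side of the hit.
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

# Hypothetical two-sentence "corpus" used only for demonstration.
sample = "The cat sat on the mat. The dog chased the cat across the yard."
tokens = tokenize(sample)
for left, kw, right in concordance(tokens, "cat", window=2):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>20} [{kw}] {right}")
```

Aligning the keyword in a central column is the conventional KWIC layout, since it lets recurring patterns to the left and right of the search term stand out when the lines are scanned vertically.
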
AntConc: A freeware corpus analysis toolkit for concordancing, exploring and extracting text data from corpora.
Sketch Engine: An online corpus analysis tool that provides access to over 400 corpora along with a range of search and analysis tools.
WordSmith Tools: A commercial software suite offering functions such as concordancing, collocation analysis, and frequency lists.
E-MAUS: A freely available software that offers a range of functions such as text search, collocation analysis and tagging.
TreeTagger: A freely available tool that annotates texts with part-of-speech tags and lemma information.
GATE (General Architecture for Text Engineering): An open-source platform for natural language processing, with capabilities such as information extraction, sentiment analysis, and machine learning.
CLAN (Computerized Language Analysis): A software tool for analyzing transcripts of conversation recorded in naturalistic settings, designed for researchers in language acquisition, developmental psychology, and related fields.
TextSTAT: A freeware corpus analysis tool that supports multiple languages and lets users import texts, generate concordances and word frequency lists, and run keyword searches.
NooJ: A software platform for natural language processing that includes various modules to handle tagging, parsing, morphology, and syntax.
OmegaWiki: A multilingual online dictionary that uses corpus linguistics for expanding and improving the content of its entries.
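
Several of the tools listed above are described as offering collocation analysis. As a rough illustration of what such a feature computes, the sketch below scores adjacent word pairs with pointwise mutual information (PMI), one common association measure. It is not the formula used by any particular tool named here; the function name, the `min_count` threshold, and the sample text are assumptions chosen for the example.

```python
import math
from collections import Counter

def pmi_collocations(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # PMI is unreliable for pairs seen only once
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI > 0 means the pair co-occurs more often than chance predicts.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy text; a real analysis needs a much larger corpus.
text = ("strong tea and strong coffee but never powerful tea "
        "or powerful coffee with strong tea")
results = pmi_collocations(text.split())
for pair, score in results:
    print(pair, round(score, 2))
```

The frequency threshold matters because PMI inflates the scores of rare pairs. This is one reason corpus tools typically let users choose among several association measures and set a minimum frequency.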