Corpus Linguistics and Translation

Applications of corpus linguistics in translation studies, such as using corpora to identify translation equivalents, evaluate translation quality, and support translator training.

Corpus Linguistics: The study of language through the analysis of large, structured collections of texts or spoken language.
Corpora: Collections of texts, often computer-readable, used in linguistic research.
Language Corpora: A body of texts or spoken language for the study of a particular language.
Translation Corpora: A body of texts used for the study of translation between two languages.
Parallel Corpora: A type of translation corpus that contains texts in two languages side-by-side, allowing for direct comparison.
Corpus Annotation: The process of adding linguistically relevant information to a corpus, such as part-of-speech tags, named entity tags, or discourse markers.
Corpus Querying: The process of searching a corpus for specific linguistic features or patterns.
Corpus Analysis: The study of language data derived from a corpus, including patterns of word use, textual organization, and discourse structure.
Corpus-based Translation Studies: The use of language corpora to study translation phenomena, such as translation universals or stylistic features.
Translation Evaluation: The process of assessing the quality of a translation using a variety of measures.
Quantitative Analysis: A statistical approach to corpus-based research, often used to identify linguistic patterns or to compare different language varieties.
Qualitative Analysis: A non-statistical approach to corpus-based research, often used to describe linguistic phenomena in detail.
Machine Translation: The use of computer programs to automatically translate text from one language to another.
Translation Memory: A computer-aided translation tool that stores previously translated text segments for reuse.
Corpus-based Machine Translation: The use of corpus linguistics to inform the development of machine translation systems.
Language Resources: Databases, software, and other tools used for language analysis and natural language processing.
Natural Language Processing: A subfield of computer science that focuses on the development of computer programs that can understand and generate natural language.
Corpus Linguistics and Language Teaching: The use of corpus-based research in language teaching and learning.
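
The entries on parallel corpora and corpus querying can be illustrated with a minimal sketch of how a sentence-aligned parallel corpus might be searched for candidate translation equivalents. The toy English-French sentence pairs, the whitespace tokenizer, the function names, and the choice of the Dice coefficient as the association score are all assumptions made for illustration; they are not taken from any particular corpus tool.

```python
from collections import Counter

# Toy sentence-aligned English-French parallel corpus (invented for illustration).
PARALLEL = [
    ("the house is big", "la maison est grande"),
    ("the house is old", "la maison est vieille"),
    ("the car is big", "la voiture est grande"),
    ("a small house", "une petite maison"),
]


def tokenize(sentence):
    """Lowercase and split on whitespace; real corpora need a proper tokenizer."""
    return sentence.lower().split()


def candidate_equivalents(source_word, pairs, top_n=3):
    """Rank target-language words by Dice association with source_word.

    Counts are per aligned sentence pair: a word is counted once per
    sentence, however often it occurs in that sentence.
    """
    source_hits = 0            # pairs whose source side contains source_word
    target_counts = Counter()  # pairs whose target side contains each word
    joint_counts = Counter()   # pairs containing both source_word and the word
    for src, tgt in pairs:
        tgt_types = set(tokenize(tgt))
        target_counts.update(tgt_types)
        if source_word in tokenize(src):
            source_hits += 1
            joint_counts.update(tgt_types)
    scores = {
        word: 2 * joint / (source_hits + target_counts[word])
        for word, joint in joint_counts.items()
    }
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_n]


if __name__ == "__main__":
    for word, score in candidate_equivalents("house", PARALLEL):
        print(f"{word}\t{score:.2f}")
```

On this toy data the top candidate for "house" is "maison", because the two words always appear in the same aligned pairs. Frequent function words such as "la" also score fairly high, which is why a ranking like this should be read as a list of candidates for a translator or researcher to check rather than as a finished bilingual lexicon.
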
Parallel Corpora: A parallel corpus is a collection of texts or sentences that exist in two or more languages. These are commonly used in Machine Translation (MT) to create translation models.
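Comparable Corpora: Comparable corpora are collections of texts that share characteristics such as domain, genre, or time period (for example, legal documents or news articles) but are not translations of one another. They are used to compare language usage across languages or language varieties within the same domain, register, or genre.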
Monolingual Corpora: A monolingual corpus comprises texts in a single language. It is used to investigate language patterns, such as word frequencies and collocations, in a specific language.
Multi-modal Corpora: Multi-modal corpora combine different types of data, such as text, audio, and video, and are analysed using corpus linguistic methods. They are useful for understanding how language is used in different contexts, such as spoken versus written communication, or in different situations.
Diachronic Corpora: A diachronic corpus contains texts collected from different periods of time. It is used to analyse historical developments, such as changes in language use, semantic shifts, and language evolution.
Raw Corpora: A raw corpus includes texts or speech samples that have not undergone any processing (e.g. normalization or segmentation). This type of corpus is useful for building language models or understanding spoken language.
Syntactically Annotated Corpora: A syntactically annotated corpus contains information on the grammatical structure of the text. This is useful for analysing syntactic patterns, such as dependency or constituency relations.
Semantically Annotated Corpora: A semantically annotated corpus contains information on the meaning of the text, such as word senses or semantic roles. This is useful for understanding the use of words in different contexts.
Specialized Corpora: A specialized corpus focuses on a particular subject area, such as medicine or science. It is useful for investigating the use of language in specific domains.
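
The word frequencies and collocations mentioned under Monolingual Corpora can be computed with very little code. The sketch below is only an illustration under stated assumptions: the three sample sentences are invented, tokenization is a plain whitespace split, collocations are limited to adjacent word pairs, and pointwise mutual information (PMI) is just one of several association measures that could be used.

```python
import math
from collections import Counter

# Tiny invented monolingual corpus; a real study would load many texts from disk.
TEXTS = [
    "strong tea and strong coffee",
    "she drinks strong tea every morning",
    "a cup of tea",
]


def tokens_of(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def word_frequencies(texts):
    """Count how often each word form occurs across all texts."""
    return Counter(tok for text in texts for tok in tokens_of(text))


def bigram_counts(texts):
    """Count adjacent word pairs, never pairing words across text boundaries."""
    counts = Counter()
    for text in texts:
        toks = tokens_of(text)
        counts.update(zip(toks, toks[1:]))
    return counts


def pmi(bigram, texts):
    """Pointwise mutual information (log base 2) of an adjacent word pair."""
    freqs = word_frequencies(texts)
    bigrams = bigram_counts(texts)
    if bigrams[bigram] == 0:
        return float("-inf")  # the pair never occurs side by side
    p_pair = bigrams[bigram] / sum(bigrams.values())
    p_first = freqs[bigram[0]] / sum(freqs.values())
    p_second = freqs[bigram[1]] / sum(freqs.values())
    return math.log2(p_pair / (p_first * p_second))


if __name__ == "__main__":
    print(word_frequencies(TEXTS).most_common(3))
    print(bigram_counts(TEXTS).most_common(1))
    print(round(pmi(("strong", "tea"), TEXTS), 2))
```

A positive PMI means the two words occur together more often than their individual frequencies would predict. On a corpus this small the scores are unstable, so in practice a minimum frequency threshold is usually applied before ranking collocation candidates.
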
"Corpus linguistics is the study of a language as that language is expressed in its text corpus..."
"Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field..."
"The text-corpus method uses the body of texts written in any natural language to derive the set of abstract rules..."
"...collected in the field—the natural context ('realia') of that language..."
"Those results can be used to explore the relationships between that subject language and other languages..."
"The first such corpora were manually derived from source texts..."
"...but now that work is automated."
"Corpora have not only been used for linguistics research, they have also been used to compile dictionaries..."
"...starting with The American Heritage Dictionary of the English Language in 1969..."
"John McHardy Sinclair advocates minimal annotation so texts speak for themselves..."
"The Survey of English Usage team (University College, London) advocate annotation..."
"...as allowing greater linguistic understanding through rigorous recording."
"Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora..."
"The text-corpus method uses the body of texts written in any natural language to derive the set of abstract rules..."
"Those results can be used to explore the relationships between that subject language and other languages..."
"The first such corpora were manually derived from source texts..."
"...but now that work is automated."
"Corpora have not only been used for linguistics research, they have also been used to compile dictionaries..."
"...starting with The American Heritage Dictionary of the English Language in 1969..."
"Experts in the field have differing views about the annotation of a corpus..."