DMOZ Internet Directory
Corpus Analysis
19 Sites
The study of language through computerized corpora, or enormous samples of machine-readable text drawn from authentic language situations.
Categories
Tools
1 Site
WordNet
3 Sites
Sites
LDC - Linguistic Data Consortium
- The Linguistic Data Consortium (LDC) creates, collects and distributes speech and text databases, annotated corpora, treebanks, lexicons and other linguistic resources for research, education and development.
A Logical Approach to Computational Corpus Linguistics
- A 1996 thesis by Torbjörn Lager. Abstract available, as well as full text in PostScript and PDF formats.
Shallow Processing of Large Corpora Workshop 2003
- Held at Lancaster University. Presented papers are available in PDF format.
Centre for Corpus Research
- At the University of Birmingham, England. Information on programmes, research and available resources.
Centre for English Corpus Linguistics
- At the Catholic University of Louvain, this institute focuses on cross-linguistic corpora and learner corpora. Research, events, staff, publications.
Hungarian National Corpus
- More than 150 million Hungarian words, a model of Hungarian language of the 1990s. Free and extensive query system. [Hungarian, English]
MRC Psycholinguistic Database
- Web access to a large database of linguistic and psycholinguistic (but not semantic) data derived from a variety of sources.
ELRA catalog of language resources
- Various language resources and evaluation packages in the field of Human Language Technology (HLT) are available at ELRA (European Language Resources Association). Distribution is handled by ELDA, ELRA's operational body.
National Corpus of Polish
- The National Corpus of Polish is a publicly available, large, balanced and linguistically annotated corpus of Polish.
SIGANN: ACL Special Interest Group for Annotation
- A subgroup of the Association for Computational Linguistics (ACL), this group is concerned with all aspects of linguistic annotation of language resources (linguistic corpora), especially the advancement of interoperability. Sponsors the annual Linguistic Annotation Workshop (LAW).
SIGWAC: ACL Special Interest Group on Web as Corpus
- A subgroup of the Association for Computational Linguistics (ACL) which promotes interest in the use of the Internet as a source of linguistic data, and as an object of study in its own right. Organizes the WAC workshops.
Clitic climbing in electronic corpora
- Thesis study by Kertes Gábor that analyses the phenomenon of clitic climbing or clitic promotion. [Parallel Spanish and English]
Corpus Linguistics
- Online lessons intended to supplement the book by Tony McEnery and Andrew Wilson. Introductory information on the field.
British National Corpus
- The BNC is a balanced synchronic text corpus containing 100 million words annotated with parts of speech.
Le Monde Diplomatique-Die Tageszeitung (LMD-TAZ) Parallel Corpus
- A French-German parallel corpus consisting of articles from Le Monde Diplomatique and die Tageszeitung, manually aligned and part-of-speech tagged.
This directory is made available through a Creative Commons Attribution license from the DMOZ Organization.