Using Large Corpora

Author : Susan Armstrong
Publisher : MIT Press
Page : 364 pages
File Size : 53,9 Mb
Release : 1994
Category : Business & Economics
ISBN : 0262510820

Using Large Corpora identifies new data-oriented methods for organizing and analyzing large corpora and describes the potential results that the use of large corpora offers. Today, large corpora consisting of hundreds of millions or even billions of words, along with new empirical and statistical methods for organizing and analyzing these data, promise new insights into the use of language. Already, the data extracted from these large corpora reveal that language use is more flexible and complex than most rule-based systems have tried to account for, providing a basis for progress in the performance of Natural Language Processing systems. The research described shows that the new methods may offer solutions to key issues of acquisition (automatically identifying and coding information), coverage (accounting for all of the phenomena in a given domain), robustness (accommodating real data that may be corrupt or not accounted for in the model), and extensibility (applying the model and data to a new domain, text, or problem). There are chapters on lexical issues, issues in syntax, and translation topics, as well as discussions of the statistics-based vs. rule-based debate. ACL-MIT Series in Natural Language Processing.

Natural Language Processing Using Very Large Corpora

Author : S. Armstrong,Kenneth W. Church,Pierre Isabelle,Sandra Manzi,Evelyne Tzoukermann,David Yarowsky
Publisher : Springer Science & Business Media
Page : 314 pages
File Size : 44,5 Mb
Release : 2013-04-17
Category : Language Arts & Disciplines
ISBN : 9789401723909

ABOUT THIS BOOK This book is intended for researchers who want to keep abreast of current developments in corpus-based natural language processing. It is not meant as an introduction to this field; for readers who need one, several entry-level texts are available, including those of (Church and Mercer, 1993; Charniak, 1993; Jelinek, 1997). This book captures the essence of a series of highly successful workshops held in the last few years. The response in 1993 to the initial Workshop on Very Large Corpora (Columbus, Ohio) was so enthusiastic that we were encouraged to make it an annual event. The following year, we staged the Second Workshop on Very Large Corpora in Kyoto. As a way of managing these annual workshops, we then decided to register a special interest group called SIGDAT with the Association for Computational Linguistics. The demand for international forums on corpus-based NLP has been expanding so rapidly that in 1995 SIGDAT was led to organize not only the Third Workshop on Very Large Corpora (Cambridge, Mass.) but also a complementary workshop entitled From Texts to Tags (Dublin). Obviously, the success of these workshops was in some measure a reflection of the growing popularity of corpus-based methods in the NLP community. But first and foremost, it was due to the fact that the workshops attracted so many high-quality papers.

Using Corpora in Discourse Analysis

Author : Paul Baker
Publisher : Bloomsbury Publishing
Page : 218 pages
File Size : 45,6 Mb
Release : 2023-08-24
Category : Language Arts & Disciplines
ISBN : 9781350083769

How can you carry out discourse analysis using corpus linguistics? What research questions should you ask? Which methods should you use and when? What is a collocational network or a key cluster? Introducing the major techniques, methods and tools for corpus-assisted analysis of discourse, this book answers these questions and more, showing readers how to best use corpora in their analyses of discourse. Using carefully tailored case studies, each chapter is devoted to a central technique, including frequency, concordancing and keywords, going step by step through the process of applying different analytical procedures. Introducing a wide range of different corpora, from holiday brochures to political debates, the book considers the key debates and latest advances in the field. Fully revised and updated, this new edition includes:
- A new chapter on how to conduct research projects in corpus-based discourse analysis
- Completely rewritten chapters on collocation and advanced techniques, using a corpus of jihadist propaganda texts and covering topics such as social media and visual analysis
- Coverage of major tools, including CQPweb, AntConc, Sketch Engine and #LancsBox
- Discussion of newer techniques including the derivation of lockwords and the comparison of multiple data sets for diachronic analysis
With exercises, discussion questions and suggested further readings in each chapter, this book is an excellent guide to using corpus linguistics techniques to carry out discourse analysis.

Corpora and Language Learners

Author : Guy Aston,Silvia Bernardini,Dominic Stewart
Publisher : John Benjamins Publishing
Page : 326 pages
File Size : 55,6 Mb
Release : 2004-01-01
Category : Language Arts & Disciplines
ISBN : 9027222886

Corpus-aided language pedagogy is one of the central application areas of corpus methodologies, and a test bed for theories of language and learning. This volume provides an overview of current trends, offering methodological and theoretical position statements along with results from empirical studies. The relationship between corpora and learning is examined from complementary perspectives: the study of learner language, the didactic use of corpus findings, and the interaction between corpora and their users. Reflections on current theory and technology open and close the volume. With its focus on the learner and the learning setting, Corpora and Language Learners is addressed to corpus linguists with an interest in learner language, applied linguists wishing to expand their understanding of corpora and their pedagogic potential, and language teachers wishing to critically assess the relevance of work in this field. This volume grew out of selected presentations at the 5th Teaching and Language Corpora conference in Bertinoro, Italy.

Exploring English with Online Corpora

Author : Wendy Anderson,John Corbett
Publisher : Bloomsbury Publishing
Page : 242 pages
File Size : 47,5 Mb
Release : 2017-09-16
Category : Language Arts & Disciplines
ISBN : 9781137438102

This is an essential guide to using digital resources in the study of English language and linguistics. Assuming no prior experience, it introduces the fundamentals of online corpora and equips readers with the skills needed to search and interpret corpus data. Later chapters focus on specific elements of linguistic analysis, namely vocabulary, grammar, discourse and pronunciation. Examples from five major online corpora illustrate key issues to consider in corpus analysis, while case studies and activities help students get to grips with the wide range of resources that are available and select those that best suit their needs. Perfect for students of corpus linguistics and applied linguistics, this engaging and accessible guide opens the door to an ever-expanding world of online resources. It is also ideal for anyone who is curious about how the English language works and has a desire to explore its many written and spoken forms. New to this Edition:
- Fully revised and updated throughout, incorporating the latest developments in corpus linguistics
- Expanded material on corpora in teaching, contextualising corpus texts and critical discourse analysis

Corpus-based Language Studies

Author : Tony McEnery,Richard Xiao,Yukio Tono
Publisher : Taylor & Francis
Page : 412 pages
File Size : 43,7 Mb
Release : 2006
Category : Foreign Language Study
ISBN : 0415286220

Covering the major approaches to the use of corpus data, this work gathers together influential readings from leading names in the discipline, including Biber, Widdowson, Sinclair, Carter and McCarthy.

Learning with Corpora

Author : Guy Aston
Publisher : Athelstan
Page : 290 pages
File Size : 41,6 Mb
Release : 2001
Category : Education
ISBN : 0940753162

This book covers the use of corpora in language learning and translation. Chapters include: Learning with corpora: an overview; Corpora and their uses in language research; Corpus-based description in teaching and learning; The pedagogic use of spoken corpora; The learner as researcher; Integrating corpus work into an academic reading course; Swimming in words; Going to the Clochemerle; 'Spoilt for choice': a learner explores general language corpora.

Working with Portuguese Corpora

Author : Tony Berber Sardinha,Telma de Lurdes São Bento Ferreira
Publisher : A&C Black
Page : 347 pages
File Size : 40,9 Mb
Release : 2014-04-10
Category : Language Arts & Disciplines
ISBN : 9781472570017

Although Portuguese is one of the main world languages and researchers have been working on Portuguese electronic text collections for decades (e.g. Kelly, 1970; Biderman, 1978; Bacelar do Nascimento et al., 1984; see Berber Sardinha, 2005), this is the first volume in English that encapsulates the exciting and cutting-edge corpus linguistic work being done with Portuguese language corpora on different continents. The book includes chapters by leading corpus linguists dealing with Portuguese corpora across the world, and their contributions explore various methods and how they are applicable to a wide range of language issues. The book is divided into six sections, each covering a key issue in Corpus Linguistics: lexis and grammar, lexicography, language teaching and terminology, translation, corpus building and sharing, and parsing and annotation. Together these sections present the reader with a broad picture of the field.

Advances in Empirical Translation Studies

Author : Meng Ji,Michael Oakes
Publisher : Cambridge University Press
Page : 285 pages
File Size : 47,8 Mb
Release : 2019-06-13
Category : Computers
ISBN : 9781108423274

Introduces the integration of theoretical and applied translation studies for socially-oriented and data-driven empirical translation research.

Proceedings of the Fourth Workshop on Very Large Corpora

Author : Eva Ejerhed,Ido Dagan
Publisher : Unknown
Page : 188 pages
File Size : 52,5 Mb
Release : 1996
Category : Computational linguistics
ISBN : CORNELL:31924091021869

Building and Using Comparable Corpora

Author : Serge Sharoff,Reinhard Rapp,Pierre Zweigenbaum,Pascale Fung
Publisher : Springer Science & Business Media
Page : 335 pages
File Size : 51,6 Mb
Release : 2013-12-13
Category : Computers
ISBN : 9783642201288

The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In the field of multilingual NLP (such as machine translation and terminology mining) this implied the use of parallel corpora. However, parallel resources are relatively scarce: many more texts are produced daily by native speakers of any given language than translated. This situation resulted in a natural drive towards the use of comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction has not produced a single authoritative source suitable for researchers and students coming to the field. The proposed volume provides a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.

Building and Using Comparable Corpora for Multilingual Natural Language Processing

Author : Serge Sharoff,Reinhard Rapp,Pierre Zweigenbaum
Publisher : Springer Nature
Page : 138 pages
File Size : 41,8 Mb
Release : 2023-08-23
Category : Computers
ISBN : 9783031313844

This book provides a comprehensive overview of methods to build comparable corpora and of their applications, including machine translation, cross-lingual transfer, and various kinds of multilingual natural language processing. The authors begin with a brief history of the topic, followed by a comparison to parallel resources and an explanation of why comparable corpora have become more widely used. In particular, comparable corpora provide the basis for the multilingual capabilities of pre-trained models, such as BERT or GPT. The book then focuses on building comparable corpora, aligning their sentences to create a database of suitable translations, and using these sentence translations to produce dictionaries and term banks. Finally, it explains how comparable corpora can be used to build machine translation engines and to develop a wide variety of multilingual applications.

Corpora: Pragmatics and Discourse

Author : Anonymous
Publisher : BRILL
Page : 522 pages
File Size : 41,8 Mb
Release : 2015-06-29
Category : Language Arts & Disciplines
ISBN : 9789042029101

This volume presents current state-of-the-art discussions in corpus-based linguistic research of the English language. The papers deal with Present-day English, worldwide varieties of English and the history of the English language. A special focus of the volume is on studies in the broad field of corpus pragmatics and corpus-based discourse analysis. It includes corpus-based studies of speech acts, conversational routines, referential expressions and thought styles, as well as studies on the lexis, grammar and semantics of English. It also includes several studies on technical aspects of corpus compilation, fieldwork and parsing.

Proceedings of the Sixth Workshop on Very Large Corpora

Author : Eugene Charniak
Publisher : Unknown
Page : 250 pages
File Size : 40,8 Mb
Release : 1998
Category : Computational linguistics
ISBN : CORNELL:31924091021877
