Building A National Corpus

Building a National Corpus

Author : Dawn Knight, Steve Morris, Laura Arman, Jennifer Needs, Mair Rees
Publisher : Springer Nature
Page : 192 pages
File Size : 53,5 Mb
Release : 2021-10-08
Category : Language Arts & Disciplines
ISBN : 9783030818586

This book aims to provide a micro-level, working model of a methodological approach and practical guidelines for building a corpus, informed by the work on the CorCenCC project (Corpws Cenedlaethol Cymraeg Cyfoes - the National Corpus of Contemporary Welsh). It focuses specifically on the development of detailed design frames for corpora across communicative modes (spoken, written and e-language), and the practical processes involved in the planning, collection, transcription, collation and (re)presentation of language data. The book is designed to be of significant value and relevance to those interested in critically engaging with corpus methodology. Although Welsh is the language under discussion, the processes and approaches discussed in the building of CorCenCC can be applied to a greater or lesser extent to other language contexts. This book provides a working model and an account of how to build a corpus dataset, from which step-by-step guidelines for creating other linguistic corpora in any language can be easily extrapolated. It will be of value to students and scholars of minority languages and corpus linguistics.

Overcoming Challenges in Corpus Construction

Author : Robbie Love
Publisher : Routledge
Page : 176 pages
File Size : 46,8 Mb
Release : 2020-01-06
Category : Language Arts & Disciplines
ISBN : 9780429771095

This volume offers a critical examination of the construction of the Spoken British National Corpus 2014 (Spoken BNC2014) and points the way forward toward a more informed understanding of corpus linguistic methodology more broadly. The book begins by situating the creation of this second corpus, a compilation of new, publicly accessible Spoken British English from the 2010s, within the context of the first, created in 1994, talking through the need to balance backward compatibility and optimal practice for today’s users. Chapters subsequently use the Spoken BNC2014 as a focal point around which to discuss the various considerations taken into account in corpus construction, including design, data collection, transcription, and annotation. The volume concludes by reflecting on the successes and limitations of the project, as well as the broader utility of the corpus in linguistic research, both in current examples and future possibilities. This exciting new contribution to the literature on linguistic methodology is a valuable resource for students and researchers in corpus linguistics, applied linguistics, and English language teaching.

English Corpus Linguistics

Author : Charles F. Meyer
Publisher : Cambridge University Press
Page : 188 pages
File Size : 51,6 Mb
Release : 2002-06-13
Category : Computers
ISBN : 9780521808798

English Corpus Linguistics is a step-by-step guide to creating and analyzing linguistic corpora. It begins with a discussion of the role that corpus linguistics plays in linguistic theory, demonstrating that corpora have proven to be very useful resources for linguists who believe that their theories and descriptions of English should be based on real rather than contrived data. Charles F. Meyer goes on to describe how to plan the creation of a corpus, how to collect and computerize data for inclusion in a corpus, how to annotate the data that are collected, and how to conduct a corpus analysis of a completed corpus. The book concludes with an overview of the challenges that corpus linguists face to make both the creation and analysis of corpora much easier undertakings than they currently are. Clearly organized and accessibly written, this book will appeal to students of linguistics and English language.

Developing Linguistic Corpora

Author : Martin Wynne
Publisher : Oxbow Books Limited
Page : 100 pages
File Size : 41,5 Mb
Release : 2005
Category : Language Arts & Disciplines
ISBN : UVA:X004991162

A linguistic corpus is a collection of texts which have been selected and brought together so that language can be studied on the computer. Today, corpus linguistics offers some of the most powerful new procedures for the analysis of language, and the impact of this dynamic and expanding sub-discipline is making itself felt in many areas of language study. In this volume, a selection of leading experts in various key areas of corpus construction offer advice in a readable and largely non-technical style to help the reader to ensure that their corpus is well designed and fit for the intended purpose. This guide is aimed at those who are at some stage of building a linguistic corpus. Little or no knowledge of corpus linguistics or computational procedures is assumed, although it is hoped that more advanced users will find the guidelines here useful. It is also aimed at those who are not building a corpus, but who need to know something about the issues involved in the design of corpora in order to choose between available resources and to help draw conclusions from their studies.

Corpus Design and Construction in Minoritised Language Contexts - Cynllunio a Chreu Corpws mewn Cyd-destunau Ieithoedd Lleiafrifoledig

Author : Dawn Knight, Steve Morris, Tess Fitzpatrick
Publisher : Springer Nature
Page : 178 pages
File Size : 41,7 Mb
Release : 2021-07-05
Category : Language Arts & Disciplines
ISBN : 9783030724849

This bilingual book provides a detailed overview of the project to construct a National Corpus of Contemporary Welsh (CorCenCC), addressing the conceptual and methodological challenges faced when developing language corpora for minoritised languages. A conceptual framework is presented for the user-driven design that underpinned the CorCenCC project, along with a detailed blueprint that can function as a scaffold for other researchers embarking on projects of this nature. This book will be of value to those working in language teaching, learning and assessment, language policy and planning, translation, corpus linguistics and language technology, and to anyone with an interest in Welsh and other minoritised languages. Mae'r llyfr dwyieithog hwn yn rhoi trosolwg manwl o'r prosiect i greu Corpws Cenedlaethol Cymraeg Cyfoes (CorCenCC), ac yn mynd i'r afael â'r heriau cysyniadol a methodolegol a wynebir wrth ddatblygu corpora iaith ar gyfer ieithoedd lleiafrifoledig. Cyflwynir fframwaith cysyniadol ar gyfer y cynllun wedi'i yrru gan ddefnyddwyr sy'n greiddiol i brosiect CorCenCC, ynghyd â glasbrint manwl a all weithredu fel sgaffald i ymchwilwyr eraill sy'n dechrau ar brosiectau o'r fath. Bydd y llyfr hwn o werth i'r rhai sy'n gweithio ym meysydd addysgu, dysgu ac asesu ieithoedd, polisi iaith a chynllunio ieithyddol, cyfieithu, ieithyddiaeth gorpws a thechnoleg iaith, ac unrhyw un â diddordeb yn y Gymraeg ac ieithoedd lleiafrifoledig eraill.

History, Features, and Typology of Language Corpora

Author : Niladri Sekhar Dash, S. Arulmozi
Publisher : Springer
Page : 293 pages
File Size : 48,6 Mb
Release : 2018-02-01
Category : Language Arts & Disciplines
ISBN : 9789811074585

This book discusses key issues of corpus linguistics like the definition of the corpus, primary features of a corpus, and utilization and limitations of corpora. It presents a unique classification scheme of language corpora to show how they can be studied from the perspective of genre, nature, text type, purpose, and application. A reference to parallel translation corpus is mandatory in the discussion of corpus generation, which the authors thoroughly address here, with a focus on Indian language corpora and English. Web-text corpus, a new development in corpus linguistics, is also discussed with elaborate reference to Indian web text corpora. The book also presents a short history of corpus generation and provides scenarios before and after the advent of computer-generated digital corpora. This book has several important features: it discusses many technical issues of the field in a lucid manner; contains extensive new diagrams and charts for easy comprehension; and presents discussions in simplified English to cater to the needs of non-native English readers. This is an important resource authored by academics who have many years of experience teaching and researching corpus linguistics. Its focus on Indian languages and on English corpora makes it applicable to students of graduate and postgraduate courses in applied linguistics, computational linguistics and language processing in South Asia and across countries where English is spoken as a first or second language.

Building and Exploring Web Corpora (WAC3 - 2007)

Author : Cédrick Fairon
Publisher : Presses univ. de Louvain
Page : 186 pages
File Size : 41,9 Mb
Release : 2007
Category : Language Arts & Disciplines
ISBN : 2874630829

WAC: More and more people are using Web data for linguistic and NLP research. The Web as Corpus workshop (WAC) provides a venue for exploring how we can use it effectively and the advancements to which this could lead. This book is a collection of the talks presented at the 3rd WAC in Louvain-la-Neuve (Belgium). The focus is on the description of Web corpus collection projects, the exploration of Web data characteristics from a linguistics/NLP perspective, and on the use of crawled Web data for NLP purposes.

CLEANEVAL: Any use of Web data requires that it be cleaned in order to get rid of unwanted material including, for example, HTML markup, navigation bars, and advertisements. To date there has been no sharing of resources or expertise in this particular domain and the cleaning has often been done minimally. Cleaneval was an exercise aimed at promoting collaboration and improving our understanding of the issues. Results and perspectives are presented in this book.

The Babel Lexicon of Language

Author : Dan McIntyre, Lesley Jeffries, Matt Evans, Hazel Price, Erica Gold
Publisher : Cambridge University Press
Page : 313 pages
File Size : 45,9 Mb
Release : 2022-06-09
Category : Language Arts & Disciplines
ISBN : 9781108840453

An engaging and accessible A-Z of linguistics from the expert team behind Babel: The Language Magazine.

The Routledge Handbook of Corpus Linguistics

Author : Anne O'Keeffe, Michael McCarthy
Publisher : Routledge
Page : 711 pages
File Size : 41,6 Mb
Release : 2010-04-05
Category : Education
ISBN : 9781135153632

Provides an overview of a dynamic and rapidly growing area with a widely applied methodology. This handbook covers the historical development of the field and its growing influence and application in other areas. It is suitable for advanced undergraduates and postgraduates.

Australian English Reimagined

Author : Louisa Willoughby, Howard Manns
Publisher : Routledge
Page : 243 pages
File Size : 53,7 Mb
Release : 2019-11-01
Category : Language Arts & Disciplines
ISBN : 9780429671111

Australian English is perhaps best known for its colourful slang, but the variety is much richer than slang alone. This collection provides a detailed account of Australian English by bringing together leading scholars of this English variety. These scholars provide a comprehensive overview of Australian English’s distinctive features and outline cutting-edge research into the variation and change of English in Australia. Organised thematically, this volume explores the ways in which Australian English differs from other varieties of English, as well as examining regional, social and stylistic variation within the variety. The volume first explores particular structural features where Australian English differentiates itself from other English varieties. There are chapters on phonetics and phonology, socio-phonetics, lexicon and discourse-pragmatics as these elements are core to understanding any variety of English, especially within the World Englishes paradigm. It then considers what are arguably the most salient aspects of variation within Australian English and finally focuses on historical, attitudinal and planning aspects of Australian English. This volume provides a thorough account of Australian English and its users as complex, diverse and worthy of study. Perhaps more importantly, this volume’s scholars provide a reimagining of Australian English and the paradigm through which future scholars may proceed.

Introducing Maltese Linguistics

Author : Bernard Comrie
Publisher : John Benjamins Publishing
Page : 439 pages
File Size : 44,5 Mb
Release : 2009
Category : Language Arts & Disciplines
ISBN : 9789027205803

Maltese Linguistics offers the general linguist a wide range of still largely unexplored areas of study. This collection of articles highlights a selection of ongoing research projects in phonological, morphological and syntactic issues.

Building and Using Comparable Corpora for Multilingual Natural Language Processing

Author : Serge Sharoff, Reinhard Rapp, Pierre Zweigenbaum
Publisher : Springer Nature
Page : 138 pages
File Size : 53,6 Mb
Release : 2023-08-23
Category : Computers
ISBN : 9783031313844

This book provides a comprehensive overview of methods to build comparable corpora and of their applications, including machine translation, cross-lingual transfer, and various kinds of multilingual natural language processing. The authors begin with a brief history on the topic followed by a comparison to parallel resources and an explanation of why comparable corpora have become more widely used. In particular, they provide the basis for the multilingual capabilities of pre-trained models, such as BERT or GPT. The book then focuses on building comparable corpora, aligning their sentences to create a database of suitable translations, and using these sentence translations to produce dictionaries and term banks. Then, it is explained how comparable corpora can be used to build machine translation engines and to develop a wide variety of multilingual applications.

Building and Evaluating Domain Ontologies

Author : Gintarė Grigonytė
Publisher : Logos Verlag Berlin GmbH
Page : 213 pages
File Size : 54,8 Mb
Release : 2010
Category : Computers
ISBN : 9783832526573

An ontology is a knowledge representation structure made up of concepts and their interrelations. It represents shared understanding delineated by some domain. The building of an ontology can be addressed from the perspective of natural language processing. This thesis discusses the validity and theoretical background of knowledge acquisition from natural language. It also presents the theoretical and experimental framework for NLP-driven ontology building and evaluation tasks.

The Slovak Language in the Digital Age

Author : Georg Rehm, Hans Uszkoreit
Publisher : Springer Science & Business Media
Page : 89 pages
File Size : 41,7 Mb
Release : 2012-06-09
Category : Computers
ISBN : 9783642303708

This white paper is part of a series that promotes knowledge about language technology and its potential. It addresses educators, journalists, politicians, language communities and others. The availability and use of language technology in Europe varies between languages. Consequently, the actions that are required to further support research and development of language technologies also differ for each language. The required actions depend on many factors, such as the complexity of a given language and the size of its community. META-NET, a Network of Excellence funded by the European Commission, has conducted an analysis of current language resources and technologies. This analysis focused on the 23 official European languages as well as other important national and regional languages in Europe. The results of this analysis suggest that there are many significant research gaps for each language. A more detailed expert analysis and assessment of the current situation will help maximise the impact of additional research and minimise any risks. META-NET consists of 54 research centres from 33 countries that are working with stakeholders from commercial businesses, government agencies, industry, research organisations, software companies, technology providers and European universities. Together, they are creating a common technology vision while developing a strategic research agenda that shows how language technology applications can address any research gaps by 2020.

Constructing Professional Discourse

Author : Concepción Orna-Montesinos
Publisher : Cambridge Scholars Publishing
Page : 250 pages
File Size : 53,5 Mb
Release : 2012-01-17
Category : Language Arts & Disciplines
ISBN : 9781443836999

This book explores the fascinating role that language plays in the construction of non-verbal objects by mapping out the ontological meaning of the specialised concepts and the domain-specific knowledge embedded in them. In doing so, it provides a comprehensive linguistic insight into the discourse of professional domain-specific communities and hence, into the communication practices and procedures of those communities. In this respect, the book offers a response to the claims made by many of the most influential applied linguists today, such as Vijay Bhatia (1993, 2004), John Swales (1990, 2004) or Ken Hyland (2002), among others, who have consistently defended the need for applied linguistic research into the textual, generic and social perspectives on the under-researched interrelatedness of the discoursal and professional practices of a discipline. Specifically, this book provides readers with an integrative multi-perspective approach to the study of professional, domain-specific discourses. While it mainly draws on the tenets of genre theory and discourse semantics, it also builds on the theoretical and empirical foundations of applied linguistics, cognitive linguistics, corpus linguistics and ontological engineering. The book starts from the analysis of domain-specific texts as final written products with specific lexico-grammatical, semantic and rhetorical features, and then enquires into the written products as textual artefacts closely linked to the social context of production and interpretation of the text. This integrative approach provides fresh new insights into the way the processes of writing are affected by the community-specific, institutional and socio-historical circumstances in which domain-specific texts are produced.