The Unicode Cookbook For Linguists

The Unicode cookbook for linguists

Author : Steven Moran,Michael Cysouw
Publisher : Language Science Press
Page : 145 pages
File Size : 53,7 Mb
Release : 2018-06-29
Category : Language Arts & Disciplines
ISBN : 9783961100903

The Unicode cookbook for linguists by Steven Moran,Michael Cysouw Pdf

This text is a practical guide for linguists and programmers who work with data in multilingual computational environments. We introduce the basic concepts needed to understand how writing systems and character encodings function, and how they work together at the intersection between the Unicode Standard and the International Phonetic Alphabet. Although these standards are often met with frustration by users, they nevertheless provide language researchers and programmers with a consistent computational architecture needed to process, publish and analyze lexical data from the world's languages. Thus we bring to light common, but not always transparent, pitfalls which researchers face when working with Unicode and IPA. Having identified and overcome these pitfalls involved in making writing systems and character encodings syntactically and semantically interoperable (to the extent that they can be), we created a suite of open-source Python and R tools to work with languages using orthography profiles that describe author- or document-specific orthographic conventions. In this cookbook we describe a formal specification of orthography profiles and provide recipes using open-source tools to show how users can segment text, analyze it, identify errors, and transform it into different written forms for comparative linguistics research. This book is a prime example of open publishing as envisioned by Language Science Press. It is open access, has accompanying open-source software, has open peer review, versioning and so on.
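
As a rough illustration of the orthography-profile idea described above, the sketch below segments a string by greedy longest match against a small profile and maps each grapheme to an IPA value. The profile contents, the names PROFILE, UNKNOWN, segment and transform, and the use of U+FFFD for unmatched characters are all assumptions made for this example; it is not the authors' software or its API. Both the text and the profile are normalized to NFD first, because visually identical strings can differ in their underlying code points.

```python
import unicodedata

# Hypothetical, simplified profile: orthographic grapheme -> IPA value.
PROFILE = {"tsh": "tʃ", "sh": "ʃ", "t": "t", "s": "s", "h": "h", "a": "a", "á": "a˦"}

UNKNOWN = "\ufffd"  # placeholder for characters the profile does not cover


def _normalized(profile):
    """Return the profile with every grapheme key in NFD form."""
    return {unicodedata.normalize("NFD", g): ipa for g, ipa in profile.items()}


def segment(text, profile=PROFILE):
    """Split text into profile graphemes by greedy longest match."""
    text = unicodedata.normalize("NFD", text)
    # Try longer graphemes first so "tsh" wins over "t" followed by "sh".
    graphemes = sorted(_normalized(profile), key=len, reverse=True)
    segments, i = [], 0
    while i < len(text):
        for g in graphemes:
            if text.startswith(g, i):
                segments.append(g)
                i += len(g)
                break
        else:
            # No grapheme matches here: flag the character instead of guessing.
            segments.append(UNKNOWN)
            i += 1
    return segments


def transform(text, profile=PROFILE):
    """Segment text, then map each grapheme to its IPA value."""
    mapping = _normalized(profile)
    return [mapping.get(g, UNKNOWN) for g in segment(text, profile)]
```

With this toy profile, transform("tshasha") segments the input as tsh, a, sh, a before mapping each grapheme, and the precomposed and decomposed spellings of á are segmented identically. Characters missing from the profile surface as U+FFFD, which is one simple way to spot errors in the data.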

The Unicode Cookbook for Linguists

Author : Steven Moran,Michael Cysouw
Publisher : Saint Philip Street Press
Page : 144 pages
File Size : 43,8 Mb
Release : 2020-10-09
Category : Electronic
ISBN : 1013291824

The Unicode Cookbook for Linguists by Steven Moran,Michael Cysouw Pdf

This text is a practical guide for linguists and programmers who work with data in multilingual computational environments. We introduce the basic concepts needed to understand how writing systems and character encodings function, and how they work together at the intersection between the Unicode Standard and the International Phonetic Alphabet. Although these standards are often met with frustration by users, they nevertheless provide language researchers and programmers with a consistent computational architecture needed to process, publish and analyze lexical data from the world's languages. Thus we bring to light common, but not always transparent, pitfalls which researchers face when working with Unicode and IPA. Having identified and overcome these pitfalls involved in making writing systems and character encodings syntactically and semantically interoperable (to the extent that they can be), we created a suite of open-source Python and R tools to work with languages using orthography profiles that describe author- or document-specific orthographic conventions. In this cookbook we describe a formal specification of orthography profiles and provide recipes using open-source tools to show how users can segment text, analyze it, identify errors, and transform it into different written forms for comparative linguistics research. This work was published by Saint Philip Street Press pursuant to a Creative Commons license permitting commercial use. All rights not granted by the work's license are retained by the author or authors.

The Open Handbook of Linguistic Data Management

Author : Andrea L. Berez-Kroeker,Bradley McDonnell,Eve Koller,Lauren B. Collister
Publisher : MIT Press
Page : 687 pages
File Size : 42,7 Mb
Release : 2022-01-18
Category : Language Arts & Disciplines
ISBN : 9780262362177

The Open Handbook of Linguistic Data Management by Andrea L. Berez-Kroeker,Bradley McDonnell,Eve Koller,Lauren B. Collister Pdf

A guide to principles and methods for the management, archiving, sharing, and citing of linguistic research data, especially digital data. "Doing language science" depends on collecting, transcribing, annotating, analyzing, storing, and sharing linguistic research data. This volume offers a guide to linguistic data management, engaging with current trends toward the transformation of linguistics into a more data-driven and reproducible scientific endeavor. It offers both principles and methods, presenting the conceptual foundations of linguistic data management and a series of case studies, each of which demonstrates a concrete application of abstract principles in a current practice. In part 1, contributors bring together knowledge from information science, archiving, and data stewardship relevant to linguistic data management. Topics covered include implementation principles, archiving data, finding and using datasets, and the valuation of time and effort involved in data management. Part 2 presents snapshots of practices across various subfields, with each chapter presenting a unique data management project with generalizable guidance for researchers. The Open Handbook of Linguistic Data Management is an essential addition to the toolkit of every linguist, guiding researchers toward making their data FAIR: Findable, Accessible, Interoperable, and Reusable.

Development of Linguistic Linked Open Data Resources for Collaborative Data-Intensive Research in the Language Sciences

Author : Antonio Pareja-Lora,Maria Blume,Barbara C. Lust,Christian Chiarcos
Publisher : MIT Press
Page : 273 pages
File Size : 51,8 Mb
Release : 2020-01-07
Category : Language Arts & Disciplines
ISBN : 9780262536257

Development of Linguistic Linked Open Data Resources for Collaborative Data-Intensive Research in the Language Sciences by Antonio Pareja-Lora,Maria Blume,Barbara C. Lust,Christian Chiarcos Pdf

Making diverse data in linguistics and the language sciences open, distributed, and accessible: perspectives from language and language acquisition researchers and technical LOD (linked open data) researchers. This volume examines the challenges inherent in making diverse data in linguistics and the language sciences open, distributed, integrated, and accessible, thus fostering wide data sharing and collaboration. It is unique in integrating the perspectives of language researchers and technical LOD (linked open data) researchers. Reporting on both active research needs in the field of language acquisition and technical advances in the development of data interoperability, the book demonstrates the advantages of an international infrastructure for scholarship in the field of language sciences. With contributions by researchers who produce complex data content and scholars involved in both the technology and the conceptual foundations of LLOD (linguistics linked open data), the book focuses on the area of language acquisition because it involves complex and diverse data sets, cross-linguistic analyses, and urgent collaborative research. The contributors discuss a variety of research methods, resources, and infrastructures. Contributors: Isabelle Barrière, Nan Bernstein Ratner, Steven Bird, Maria Blume, Ted Caldwell, Christian Chiarcos, Cristina Dye, Suzanne Flynn, Claire Foley, Nancy Ide, Carissa Kang, D. Terence Langendoen, Barbara Lust, Brian MacWhinney, Jonathan Masci, Steven Moran, Antonio Pareja-Lora, Jim Reidy, Oya Y. Rieger, Gary F. Simons, Thorsten Trippel, Kara Warburton, Sue Ellen Wright, Claus Zinn

A short guide to post-editing

Author : Jean Nitzke,Silvia Hansen-Schirra
Publisher : Language Science Press
Page : 104 pages
File Size : 41,6 Mb
Release : 2024-06-03
Category : Language Arts & Disciplines
ISBN : 9783961103331

A short guide to post-editing by Jean Nitzke,Silvia Hansen-Schirra Pdf

Artificial intelligence is changing and will continue to change the world we live in. These changes are also influencing the translation market. Machine translation (MT) systems automatically transfer one language to another within seconds. However, MT systems are very often still not capable of producing perfect translations. To achieve high-quality translations, the MT output first has to be corrected by a professional translator. This procedure is called post-editing (PE). PE has become an established task on the professional translation market. The aim of this textbook is to provide basic knowledge about the most relevant topics in professional PE. The textbook comprises ten chapters on both theoretical and practical aspects including topics like MT approaches and development, guidelines, integration into CAT tools, risks in PE, data security, practical decisions in the PE process, competences for PE, and new job profiles.

Finite-State Text Processing

Author : Kyle Gorman,Richard Sproat
Publisher : Springer Nature
Page : 140 pages
File Size : 40,7 Mb
Release : 2022-06-01
Category : Computers
ISBN : 9783031021794

Finite-State Text Processing by Kyle Gorman,Richard Sproat Pdf

Weighted finite-state transducers (WFSTs) are commonly used by engineers and computational linguists for processing and generating speech and text. This book first provides a detailed introduction to this formalism. It then introduces Pynini, a Python library for compiling finite-state grammars and for combining, optimizing, applying, and searching finite-state transducers. This book illustrates this library's conventions and use with a series of case studies. These include the compilation and application of context-dependent rewrite rules, the construction of morphological analyzers and generators, and text generation and processing applications.

Language technologies for a multilingual Europe

Author : Georg Rehm,Daniel Stein,Felix Sasaki,Andreas Witt
Publisher : Language Science Press
Page : 217 pages
File Size : 44,8 Mb
Release : 2018-06-19
Category : Language Arts & Disciplines
ISBN : 9783946234739

Language technologies for a multilingual Europe by Georg Rehm,Daniel Stein,Felix Sasaki,Andreas Witt Pdf

This volume of the series “Translation and Multilingual Natural Language Processing” includes most of the papers presented at the Workshop “Language Technology for a Multilingual Europe”, held at the University of Hamburg on September 27, 2011 in the framework of the conference GSCL 2011 with the topic “Multilingual Resources and Multilingual Applications”, along with several additional contributions. In addition to an overview article on Machine Translation and two contributions on the European initiatives META-NET and Multilingual Web, the volume includes six full research articles. Our intention with this workshop was to bring together various groups concerned with the umbrella topics of multilingualism and language technology, especially multilingual technologies. This encompassed, on the one hand, representatives from research and development in the field of language technologies, and, on the other hand, users from diverse areas such as, among others, industry, administration and funding agencies. The Workshop “Language Technology for a Multilingual Europe” was co-organised by the two GSCL working groups “Text Technology” and “Machine Translation” (http://gscl.info) as well as by META-NET (http://www.meta-net.eu).

Semantic differences in translation

Author : Lore Vandevoorde
Publisher : Language Science Press
Page : 274 pages
File Size : 42,9 Mb
Release : 2024-06-03
Category : Language Arts & Disciplines
ISBN : 9783961100729

Semantic differences in translation by Lore Vandevoorde Pdf

Although the notion of meaning has always been at the core of translation, the invariance of meaning has, partly due to practical constraints, rarely been challenged in Corpus-based Translation Studies. In answer to this, the aim of this book is to question the invariance of meaning in translated texts: if translation scholars agree on the fact that translated language is different from non-translated language with respect to a number of grammatical and lexical aspects, would it be possible to identify differences between translated and non-translated language on the semantic level too? More specifically, this book tries to formulate an answer to the following three questions: (i) how can semantic differences in translated vs non-translated language be investigated in a corpus-based study?, (ii) are there any differences on the semantic level between translated and non-translated language? and (iii) if there are differences on the semantic level, can we ascribe them to any of the (universal) tendencies of translation? In this book, I establish a way to visually explore semantic similarity on the basis of representations of translated and non-translated semantic fields. A technique for the comparison of semantic fields of translated and non-translated language called SMM++ (based on Helge Dyvik’s Semantic Mirrors method) is developed, yielding statistics-based visualizations of semantic fields. The SMM++ is presented via the case of inchoativity in Dutch (beginnen [to begin]). By comparing the visualizations of the semantic fields on different levels (translated Dutch with French as a source language, with English as a source language and non-translated Dutch) I further explore whether the differences between translated and non-translated fields of inchoativity in Dutch can be linked to any of the well-known universals of translation.
The main results of this study are explained on the basis of two cognitively inspired frameworks: Halverson’s Gravitational Pull Hypothesis and Paradis’ neurolinguistic theory of bilingualism.

Problem solving activities in post-editing and translation from scratch

Author : Jean Nitzke
Publisher : Language Science Press
Page : 325 pages
File Size : 41,9 Mb
Release : 2024-06-03
Category : Electronic
ISBN : 9783961101313

Problem solving activities in post-editing and translation from scratch by Jean Nitzke Pdf

Companies and organisations are increasingly using machine translation to improve efficiency and cost-effectiveness, and then edit the machine translated output to create a fluent text that adheres to given text conventions. This procedure is known as post-editing. Translation and post-editing can often be categorised as problem-solving activities. When the translation of a source text unit is not immediately obvious to the translator, or in other words, if there is a hurdle between the source item and the target item, the translation process can be considered problematic. Conversely, if there is no hurdle between the source and target texts, the translation process can be considered a task-solving activity and not a problem-solving activity. This study investigates whether machine translated output influences problem-solving effort in internet research, syntax, and other problem indicators and whether the effort can be linked to expertise. A total of 24 translators (twelve professionals and twelve semi-professionals) produced translations from scratch from English into German, and (monolingually) post-edited machine translation output for this study. The study is part of the CRITT TPR-DB database. The translation and (monolingual) post-editing sessions were recorded with an eye-tracker and a keylogging program. The participants were all given the same six texts (two texts per task). Different approaches were used to identify problematic translation units. First, internet research behaviour was considered as research is a distinct indicator of problematic translation units. Then, the focus was placed on syntactical structures in the MT output that do not adhere to the rules of the target language, as I assumed that they would cause problems in the (monolingual) post-editing tasks that would not occur in the translation from scratch task. 
Finally, problem indicators were identified via different parameters like Munit, which indicates how often the participants created and modified one translation unit, or the inefficiency (InEff) value of translation units, i.e. the number of produced and deleted tokens divided by the final length of the translation. The study also highlights how these parameters can be used to identify problems in translation process data using keylogging data alone.
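
The InEff definition quoted above can be written out as a small calculation. This is only a sketch of the formula as the blurb words it: reading "produced and deleted tokens" as a sum, measuring the final length in the same token units, and the function and parameter names are all assumptions, not the study's own code.

```python
def inefficiency(produced_tokens, deleted_tokens, final_length):
    """InEff as described in the blurb: (produced + deleted) / final length.

    All three arguments are assumed to be counts in the same unit (tokens).
    """
    if final_length <= 0:
        raise ValueError("final_length must be positive")
    return (produced_tokens + deleted_tokens) / final_length
```

Under this reading, a unit typed once with no deletions scores 1.0, and any revision pushes the value higher: producing 15 tokens and deleting 5 to arrive at a 10-token unit gives 2.0.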

Empirical studies in translation and discourse

Author : Mario Bisiada
Publisher : Language Science Press
Page : 260 pages
File Size : 53,7 Mb
Release : 2021
Category : Language Arts & Disciplines
ISBN : 9783961103003

Empirical studies in translation and discourse by Mario Bisiada Pdf

The present volume seeks to contribute some studies to the subfield of Empirical Translation Studies and thereby help extend its reach within the field of translation studies, make our discipline more rigorous, and foster a reproducible research culture. The Translation in Transition conference series, across its editions in Copenhagen (2013), Germersheim (2015) and Ghent (2017), has been a major meeting point for scholars working with these aims in mind, and the conference in Barcelona (2019) has continued this tradition of expanding the sub-field of empirical translation studies to other paradigms within translation studies. This book is a collection of selected papers presented at that fourth Translation in Transition conference, held at the Universitat Pompeu Fabra in Barcelona on 19–20 September 2019.

Interpreting and technology

Author : Claudio Fantinuoli
Publisher : Language Science Press
Page : 159 pages
File Size : 52,7 Mb
Release : 2018-12-15
Category : Electronic
ISBN : 9783961101610

Interpreting and technology by Claudio Fantinuoli Pdf

Unlike in other professions, the impact of information and communication technology on interpreting has been moderate so far. However, recent advances in the areas of remote, computer-assisted, and, most recently, machine interpreting, are gaining the interest of both researchers and practitioners. This volume aims at exploring key issues, approaches and challenges to the interplay of interpreting and technology, an area that is still underrepresented in the field of Interpreting Studies. The contributions to this volume cover topics in the area of computer-assisted and remote interpreting, in both the conference and the court setting, and report on experimental studies.

Translation, interpreting, cognition

Author : Tra&Co Group
Publisher : Language Science Press
Page : 204 pages
File Size : 40,9 Mb
Release : 2021
Category : Language Arts & Disciplines
ISBN : 9783961103041

Translation, interpreting, cognition by Tra&Co Group Pdf

Cognitive aspects of the translation process have become central in Translation and Interpreting Studies in recent years, further establishing the field of Cognitive Translatology. Empirical and interdisciplinary studies investigating translation and interpreting processes promise a hitherto unprecedented predictive and explanatory power. This collection contains such studies which observe behaviour during translation and interpreting. The contributions cover a vast area and investigate behaviour during translation and interpreting – with a focus on training of future professionals, on language processing more generally, on the role of technology in the practice of translation and interpreting, on translation of multimodal media texts, on aspects of ergonomics and usability, on emotions, self-concept and psychological factors, and finally also on revision and post-editing. For the present publication, we selected a number of contributions presented at the Second International Congress on Translation, Interpreting and Cognition hosted by the Tra&Co Lab at the Johannes Gutenberg University of Mainz.

The Cambridge Handbook of Historical Orthography

Author : Marco Condorelli,Hanna Rutkowska
Publisher : Cambridge University Press
Page : 837 pages
File Size : 50,7 Mb
Release : 2023-10-12
Category : Language Arts & Disciplines
ISBN : 9781108487313

The Cambridge Handbook of Historical Orthography by Marco Condorelli,Hanna Rutkowska Pdf

Written by a team of global scholars, this is the first Handbook covering the rapidly growing field of historical orthography. Comprehensive yet accessible, it is essential reading for academic researchers and students in the field, and in related areas such as morphology, syntax, historical linguistics, linguistic typology and sociolinguistics.

Mediated discourse at the European Parliament: Empirical investigations

Author : Marta Kajzer-Wietrzny,Adriano Ferraresi,Ilmari Ivaska,Silvia Bernardini
Publisher : Language Science Press
Page : 276 pages
File Size : 40,9 Mb
Release : 2022-10-20
Category : Language Arts & Disciplines
ISBN : 9783961103935

Mediated discourse at the European Parliament: Empirical investigations by Marta Kajzer-Wietrzny,Adriano Ferraresi,Ilmari Ivaska,Silvia Bernardini Pdf

The purpose of this book is to showcase a diverse set of directions in empirical research on mediated discourse, reflecting on the state-of-the-art and the increasing intersection between Corpus-based Interpreting Studies (CBIS) and Corpus-based Translation Studies (CBTS). Undeniably, data from the European Parliament (EP) offer a great opportunity for such research. Not only does the institution provide a sizeable sample of oral debates held at the EP together with their simultaneous interpretations into all languages of the European Union; it also makes available written verbatim reports of the original speeches, which used to be translated. From a methodological perspective, EP materials thus guarantee a great degree of homogeneity, which is particularly valuable in corpus studies, where data comparability is frequently a challenge. In this volume, progress is visible in both CBIS and CBTS. In interpreting, it manifests itself notably in the availability of comprehensive transcription, annotation and alignment systems. In translation, datasets are becoming substantially richer in metadata, which allow for increasingly refined multi-factorial analysis. At the crossroads between the two fields, intermodal investigations bring to the fore what these mediation modes have in common and how they differ. The volume is thus aimed in particular at Interpreting and Translation scholars looking for new descriptive insights and methodological approaches in the investigation of mediated discourse, but it may also be of interest to (corpus) linguists analysing parliamentary discourse in general.

Machine translation for everyone: Empowering users in the age of artificial intelligence

Author : Dorothy Kenny
Publisher : Language Science Press
Page : 210 pages
File Size : 53,7 Mb
Release : 2022-07-06
Category : Computers
ISBN : 9783961103485

Machine translation for everyone: Empowering users in the age of artificial intelligence by Dorothy Kenny Pdf

Language learning and translation have always been complementary pillars of multilingualism in the European Union. Both have been affected by the increasing availability of machine translation (MT): language learners now make use of free online MT to help them both understand and produce texts in a second language, but there are fears that uninformed use of the technology could undermine effective language learning. At the same time, MT is promoted as a technology that will change the face of professional translation, but the technical opacity of contemporary approaches, and the legal and ethical issues they raise, can make the participation of human translators in contemporary MT workflows particularly complicated. Against this background, this book attempts to promote teaching and learning about MT among a broad range of readers, including language learners, language teachers, trainee translators, translation teachers, and professional translators. It presents a rationale for learning about MT, and provides both a basic introduction to contemporary machine-learning based MT, and a more advanced discussion of neural MT. It explores the ethical issues that increased use of MT raises, and provides advice on its application in language learning. It also shows how users can make the most of MT through pre-editing, post-editing and customization of the technology.