Baley, Julien (2023) 'Evaluating Rhyme Annotations for Large Corpora: Metrics and Data.' Cahiers de Linguistique Asie Orientale, 52 (2). pp. 137-162.
Text - Published Version
Available under License Creative Commons Attribution 4.0 (CC-BY 4.0).
Abstract
Recent methods have been proposed to produce automatic rhyme annotators for large rhymed corpora. These methods, such as Baley (2022b), greatly reduce the cost of annotating rhymed material, allowing historical linguists to focus on the analysis of the rhyme patterns. However, evidence for the quality of those annotations has been anecdotal, consisting of a handful of individual poem case studies. This paper proposes to address the issue: first, we discuss previously proposed metrics that evaluate the quality of an annotator’s output against a ground-truth annotation (List, Hill, and Foster 2019) and we propose an alternative metric that is better suited to the task. Then, sampling from Baley’s published annotated corpus and re-annotating it by hand, we use the sample to demonstrate the lacunae in the original approach and show how to fix them. Finally, the hand-annotated sample and source code are published as additional data, so that other researchers can compare the performance of their own annotators.
Item Type: | Journal Article |
---|---|
Keywords: | data annotation; evaluation metric; Chinese rhymes; Middle Chinese phonology; annotation de données; métrique d’évaluation; rimes du chinois; phonologie du chinois moyen |
SOAS Departments & Centres: | Departments and Subunits > Department of East Asian Languages & Cultures |
ISSN: | 01533320 |
DOI (Digital Object Identifier): | https://doi.org/10.1163/19606028-bja10032 |
Date Deposited: | 30 May 2023 14:58 |
URI: | https://eprints.soas.ac.uk/id/eprint/39551 |
Funders: | Arts and Humanities Research Council |