SOAS Research Online

A Free Database of the Latest Research by SOAS Academics and PhD Students

Baley, Julien (2022) 'Leveraging graph algorithms to speed up the annotation of large rhymed corpora.' Cahiers de Linguistique Asie Orientale, 51 (1). pp. 46-80.

Text - Published Version
Available under License Creative Commons Attribution 4.0 (CC-BY 4.0).

Abstract: Rhyming patterns play a crucial role in the phonological reconstruction of earlier stages of Chinese. The past few years have seen the emergence of the use of graphs to model rhyming patterns, notably with List’s (2016) proposal to use graph community detection as a way to go beyond the limits of the link-and-bind method and test new hypotheses regarding phonological reconstruction. List’s approach requires the existence of a rhyme-annotated corpus; such corpora are rare and prohibitively expensive to produce. The present paper solves this problem by introducing several strategies to automate annotation. Among others, the main contribution is the use of graph community detection itself to build an automatic annotator. This annotator requires no previous annotation, no knowledge of phonology, and automatically adapts to corpora of different periods by learning their rhyme categories. Through a series of case studies, we demonstrate the viability of the approach in quickly annotating hundreds of thousands of poems with high accuracy.
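The general idea of modelling rhymes as a graph and grouping them by community detection can be illustrated with a minimal sketch. It is not the annotator described in the paper: the function names, the toy tokens (A1, B1, ...) and the choice of a simple deterministic label propagation are all assumptions made for illustration. Words are nodes, and an edge's weight counts how often two words were observed rhyming together.

```python
from collections import defaultdict
from itertools import combinations


def build_rhyme_graph(rhyme_groups):
    """Build a weighted graph: nodes are rhyme words, edge weights count
    how many groups contain both words. (Hypothetical helper.)"""
    graph = defaultdict(lambda: defaultdict(int))
    for group in rhyme_groups:
        for a, b in combinations(sorted(set(group)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def label_propagation(graph, max_rounds=20):
    """Simple deterministic label propagation, used here as a stand-in
    for a community detection algorithm (an assumption, not the paper's
    choice). Each node adopts the label carrying the most edge weight
    among its neighbours; ties go to the alphabetically smaller label."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            weights = defaultdict(int)
            for neighbour, weight in graph[node].items():
                weights[labels[neighbour]] += weight
            if not weights:
                continue
            best = min(weights, key=lambda lab: (-weights[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return sorted(sorted(members) for members in communities.values())


# Toy data (invented): two dense rhyme groups joined by one stray link.
toy_groups = [
    ["A1", "A2", "A3"], ["A1", "A3"], ["A2", "A3"],
    ["B1", "B2", "B3"], ["B1", "B3"], ["B2", "B3"],
    ["A3", "B1"],  # a single irregular cross-category rhyme
]

communities = label_propagation(build_rhyme_graph(toy_groups))
print(communities)
```

On this toy input the two groups stay separate even though the stray A3/B1 link connects them, whereas a plain connected-components pass would merge all six tokens into one category.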

Item Type: Journal Article
Keywords: Linguistics and Language, Language and Linguistics
SOAS Departments & Centres: Departments and Subunits > Department of East Asian Languages & Cultures
ISSN: 1960-6028
DOI (Digital Object Identifier):
SWORD Depositor: JISC Publications Router
Date Deposited: 30 Mar 2022 13:24
