SOAS Research Online

Wu, Mei-Shin, Schweikhard, Nathanael E., Bodt, Timotheus A., List, Johann-Mattis and Hill, Nathan (2020) 'Computer-Assisted Language Comparison: State of the Art.' Journal of Open Humanities Data, 6 (2). p. 2.

Text - Published Version
Available under License Creative Commons Attribution 4.0 (CC-BY 4.0).

Text - Accepted Version
Restricted to Repository staff only

Historical language comparison opens windows onto a human past, long before the availability of written records. Since traditional language comparison within the framework of the comparative method is largely based on manual data comparison, requiring the meticulous sifting through dictionaries, word lists, and grammars, the framework is difficult to apply, especially in times where more and more data have become available in digital form. Unfortunately, it is not possible to simply automate the process of historical language comparison, not only because computational solutions lag behind human judgments in historical linguistics, but also because they lack the flexibility that would allow them to integrate various types of information from various kinds of sources. A more promising approach is to integrate computational and classical approaches within a computer-assisted framework, “neither completely computer-driven nor ignorant of the assistance computers afford” [1, p. 4]. In this paper, we will illustrate what we consider the current state of the art of computer-assisted language comparison by presenting a workflow that starts with raw data and leads up to a stage where sound correspondence patterns across multiple languages have been identified and can be readily presented, inspected, and discussed. We illustrate this workflow with the help of a newly prepared dataset on Hmong-Mien languages. Our illustration is accompanied by Python code and instructions on how to use additional web-based tools we developed so that users can apply our workflow for their own purposes.

Item Type: Journal Article
SOAS Departments & Centres: Departments and Subunits > Department of East Asian Languages & Cultures
ISSN: 2059-481X
Date Deposited: 22 Dec 2020 15:36
Funders: European Union, European Union, Other
