DFG Research Fellow, Centre des Recherches Linguistiques sur l'Asie Orientale, EHESS and Team "Adaptation, Integration, Reticulation, Evolution", UPMC, Paris
"Automatic Identification of Historically Related Words in Multiple Languages"
All languages change constantly. Words are lost when speakers cease to use them, new words are gained when new concepts emerge, and even the pronunciation of words changes slightly over time. Slight modifications that can rarely be noticed during a person's lifetime add up to great changes in the system of a language over centuries. When the speakers of a language separate, their speech keeps on changing independently in the two communities, and at a certain point the independent changes are so great that the two communities can no longer communicate with each other: what was one language has become two. Proving that two languages were once one is one of the major tasks of historical linguistics, a subdiscipline of linguistics that deals with the history (also called the evolution) of languages. Historical linguists employ evidence found in attested languages to reconstruct their unattested history. Among the most crucial types of evidence is the identification of words which are historically related, that is, words which go back to common ancestral forms.
In order to identify those words, linguists apply a set of procedures which are usually summarized under the term "comparative method". These procedures are traditionally carried out manually. Linguists compare word lists from different languages, identify probably related words, and set up lists of corresponding sound segments. This is a very tedious task, since the number of word pairs which could be compared grows quadratically with the number of languages being investigated. Recently, automatic methods have been proposed to facilitate this task. Most of them are inspired by methods for automatic sequence comparison which were originally developed for applications in evolutionary biology and the information sciences. In the talk, I will give a detailed introduction to these methods and discuss their strong and weak points in accounting for the fundamental historical processes of lexical change.
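To give a rough idea of what automatic sequence comparison on word lists can look like, the following is a minimal sketch and not the method presented in the talk. It assumes a plain normalized edit distance as a stand-in for the more refined similarity measures used in practice. The function names, the threshold of 0.5, and the small orthographic word lists are illustrative assumptions; real comparative work would operate on phonetic transcriptions.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer word (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return edit_distance(a, b) / max(len(a), len(b))


def candidate_cognates(wordlists, threshold=0.5):
    """Return (concept, language_a, language_b, distance) for every word pair
    with the same meaning whose distance does not exceed the threshold.

    `wordlists` maps a language name to a dict of concept -> word.
    The threshold is an arbitrary illustrative choice, not an established value.
    """
    pairs = []
    for (lang_a, words_a), (lang_b, words_b) in combinations(
            sorted(wordlists.items()), 2):
        for concept in sorted(set(words_a) & set(words_b)):
            d = normalized_distance(words_a[concept], words_b[concept])
            if d <= threshold:
                pairs.append((concept, lang_a, lang_b, d))
    return pairs


if __name__ == "__main__":
    # Illustrative orthographic forms only; assumed data for the sketch.
    wordlists = {
        "English": {"hand": "hand", "father": "father", "dog": "dog"},
        "German": {"hand": "hand", "father": "vater", "dog": "hund"},
        "Dutch": {"hand": "hand", "father": "vader", "dog": "hond"},
    }
    for concept, lang_a, lang_b, d in candidate_cognates(wordlists):
        print(f"{concept}: {lang_a} ~ {lang_b} (distance {d:.2f})")
```

The loop over language pairs also makes the scaling problem concrete: n languages yield n(n-1)/2 pairs to compare for every concept. A surface measure like this one cannot tell inherited words from borrowings or chance resemblances, and it does not capture regular sound correspondences between segments that look dissimilar, which is one reason why more elaborate methods are needed.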