10 November 2022: Marieke Meelen (I): “On getting more out of historical data: transforming state-of-the-art NLP techniques to effective historical corpus-annotation tools” – Discovering Linguistics

In her lecture series, Marieke Meelen will dive into the challenges of diachronic linguistic research and how to address and overcome them. The first two lectures will be held online via Zoom and will focus on methodology: how to annotate historical data and how to obtain more data to fill the gaps in transmission. In the third session, which will be held in person, we’ll look at how the presented methods can help us answer research questions about language variation and change.

Abstract

In this session we will focus on written historical data. We will go through the process of building well-annotated diachronic corpora from scratch, zooming in on specific challenges that historical linguists face. These challenges range from philological issues (anonymous manuscript witnesses with unknown dates) to orthographical ones (lack of systematic spelling and/or of knowledge about letter-sound correspondences) and, last but not least, linguistic ones. After presenting different approaches to annotation, we’ll discuss the advantages and disadvantages of each in light of research questions that may focus on any subfield of linguistics (e.g. phonology, morphology, syntax, semantics, pragmatics).

The lecture

View the slides for the lecture.

Mini-Bio

Marieke Meelen’s research interests include information structure, comparative syntax and historical linguistics. She is currently part of two AHRC-funded projects: the Emergence of Egophoricity (with Prof Hill at SOAS, University of London) and The History of Subject Pronouns (with Prof Willis at Oxford University and Prof Meier in Berlin). She is also the PI of an ELDP-funded research project documenting endangered languages in Nepal.
She is interested in NLP and corpus creation for low-resource languages.

As part of her British Academy postdoctoral fellowship, she worked on the history of V2 word orders across Indo-European languages and on developing a historical treebank of Welsh. Her doctoral thesis combined methods from computational and historical linguistics to reconstruct verb-initial and verb-second word order patterns and information structure in Welsh in their Celtic historical context. She is also a computational linguistics consultant for a project on the annotation of Middle Welsh texts at the Philipps-Universität in Marburg.

Marieke was awarded her PhD by Leiden University in 2016, under the supervision of Prof Lisa Cheng and Prof Alexander Lubotsky.

Lecture II by Marieke Meelen

Lecture III by Marieke Meelen