About

Jackson L. Lee

I'm a computational linguist. My research asks how sub-word structure (morphological paradigms, inflection classes, sound patterns, sound–symbol mappings) can be induced from the surface forms that learners actually encounter — whether those learners are children or algorithms.

I hold a PhD in Linguistics from the University of Chicago, where I worked with John Goldsmith on computational morphology and phonology. My current work extends that research toward large-scale, cross-linguistic data infrastructure.

These days I lead a team of software and data engineers. The surface form is engineering; the underlying form, linguistics.

Research

My work is organized into three connected threads:

Morphological paradigms. How does a learner, child or algorithm, fill in a morphological paradigm from the fragmentary evidence they actually encounter? This thread runs from my dissertation on the computational structure of paradigms through continuing work on unsupervised learning from acquisition data.

Segmentation. Modeling sub-word structure raises a deceptively fundamental question: what is a word? My work on segmentation spans stem extraction in morphological paradigms, truncation in Brazilian Portuguese, and morpheme and word segmentation in Cantonese. Across these cases, wordhood turns out to be nebulous, both theoretically and cross-linguistically.

Morphology–phonology interfaces. My earlier theoretical work on Cantonese tone and reduplication examined how morphology and phonology interact. My more recent computational work on paradigms and segmentation keeps surfacing interface questions, now with orthography and grapheme-to-phoneme correspondence joining morphology and phonology as a third layer.

Open-source infrastructure

I treat open-source software and data infrastructure as a first-class research output.

Publications

2026

Chinese language corpora

Forthcoming

Charles Lam, Chaak-ming Lau, and Jackson L. Lee

International Encyclopedia of Language and Linguistics, 3rd edition. Reference Collection in Social Sciences.

2024

Multi-Tiered Cantonese Word Segmentation

Charles Lam, Chaak-ming Lau, and Jackson L. Lee

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

2023

The SIGMORPHON 2022 Shared Task on Cross-lingual and Low-Resource Grapheme-to-Phoneme Conversion

Arya D. McCarthy, Jackson L. Lee, Alexandra DeLucia, Travis Bartley, Milind Agarwal, Lucas F. E. Ashby, Luca Del Signore, Cameron Gibson, Reuben Raff, and Winston Wu

Proceedings of the 20th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

2022

PyCantonese: Cantonese Linguistics and NLP in Python

Jackson L. Lee, Litong Chen, Charles Lam, Chaak Ming Lau, and Tsz-Him Tsui

Proceedings of the 13th Language Resources and Evaluation Conference

2020

Massively Multilingual Pronunciation Modeling with WikiPron

Jackson L. Lee, Lucas F. E. Ashby, M. Elizabeth Garza, Yeonju Lee-Sikka, Sean Miller, Alan Wong, Arya D. McCarthy, and Kyle Gorman

Proceedings of the 12th Language Resources and Evaluation Conference

2018

Shaping Phonology

Edited by Diane Brentari and Jackson L. Lee. University of Chicago Press. (This volume is in honor of John Goldsmith.)

On the discovery procedure

Jackson L. Lee

Shaping Phonology. Edited by Diane Brentari and Jackson L. Lee.

Mincing words: balancing recovery and deletion in word truncation

Mike Pham and Jackson L. Lee

Glossa

2017

Computational learning of morphology

John A. Goldsmith, Jackson L. Lee, and Aris Xanthos

Annual Review of Linguistics 3, 85–106

2016

Linguistica 5: Unsupervised Learning of Linguistic Structure

Jackson L. Lee and John A. Goldsmith

Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics, San Diego, California, June 2016. Association for Computational Linguistics.

Working with CHAT transcripts in Python

Jackson L. Lee, Ross Burkholder, Gallagher B. Flinn, and Emily R. Coppess

Technical Report TR-2016-02, Department of Computer Science, University of Chicago, January 2016.

2015

Morphological Paradigms: Computational Structure and Unsupervised Learning

Jackson L. Lee

Proceedings of NAACL-HLT 2015 Student Research Workshop (SRW), pages 161–167, Denver, Colorado, June 2015. Association for Computational Linguistics.

Great pizzas, ghost negations: The emergence and persistence of mixed expressives

Andrea Beltrama and Jackson L. Lee

Proceedings of Sinn und Bedeutung 19. 2015.

When French becomes tonal: Prosodic transfer from L1 Cantonese and L2 English

Jackson L. Lee and Stephen Matthews

The 6th Annual Proceedings of the Pronunciation in Second Language Learning and Teaching Conference, 2015.

2014

Combining successor and predecessor frequencies to model truncation in Brazilian Portuguese

Mike Pham and Jackson L. Lee

Technical Report TR-2014-15, Department of Computer Science, University of Chicago, October 2014.

Automatic morphological alignment and clustering

Jackson L. Lee

Technical Report TR-2014-07, Department of Computer Science, University of Chicago, May 2014.

Variability in perceived duration: pitch dynamics and vowel quality

Alan C. L. Yu, Hyunjung Lee, and Jackson L. Lee

Proceedings of the 4th International Symposium on Tonal Aspects of Languages, May 2014.

The representation of contour tones in Cantonese

Jackson L. Lee

Proceedings of the 38th Annual Meeting of the Berkeley Linguistics Society. 2014.

Proceedings of the Forty-eighth Annual Meeting of the Chicago Linguistic Society

Andrea Beltrama, Tasos Chatzikonstantinou, Jackson L. Lee, Mike Pham, and Diane Rak, editors. Chicago Linguistic Society, 2014.

2012

Fixed-tone reduplication in Cantonese

Jackson L. Lee

McGill Working Papers in Linguistics 22(1). Proceedings from the Montreal-Ottawa-Toronto (MOT) Phonology Workshop 2011: Phonology in the 21st Century: In Honour of Glyne Piggott. 2012.