About

Jackson L. Lee

I'm a computational linguist. My research asks how children and algorithms discover the structure of language from the data they encounter. Chomsky (1957) argued that the structuralist project of mechanical discovery procedures was too ambitious. My work continues the empiricist tradition in computational linguistics that has reopened the question of whether such procedures are attainable.

I hold a PhD in Linguistics from the University of Chicago, where I worked with John Goldsmith on computational morphology and phonology.

These days I lead a team of software and data engineers. The surface form is engineering; the underlying form, linguistics.

For more, please see my CV (PDF).

Research

My work is organized into three connected threads:

Morphological paradigms. How does a learner, child or algorithm, discover morphological paradigms from the fragmentary evidence they actually encounter? This thread runs from my dissertation on the computational structure of paradigms (paradigm alignment, inflection class clustering) through continuing work on unsupervised learning from acquisition data.

Segmentation. Modeling sub-word structure raises a deceptively fundamental question: what is a word? My work on segmentation, across stem extraction in morphological paradigms, truncation in Brazilian Portuguese, and morpheme and word segmentation in Cantonese, reveals wordhood as nebulous both theoretically and cross-linguistically.

Morphology–phonology interfaces. My earlier theoretical work on Cantonese tone and reduplication examined how morphology and phonology interact. My more recent computational work on paradigms and segmentation keeps surfacing interface questions (allomorphy, locality). Orthography and grapheme-to-phoneme correspondence now join morphology and phonology as a third layer.

Open-source infrastructure

I treat open-source software and data infrastructure as a first-class research output.

Publications

Charles Lam, Chaak-ming Lau, and Jackson L. Lee. To appear. Chinese language corpora. International Encyclopedia of Language and Linguistics. Edited by Hilary Nesi and Petar Milin. Elsevier. [ abstract ]

Charles Lam, Chaak-ming Lau, and Jackson L. Lee. 2024. Multi-Tiered Cantonese Word Segmentation. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 11993–12002. [ link | data | bib | abstract ]

Arya D. McCarthy, Jackson L. Lee, Alexandra DeLucia, Travis Bartley, Milind Agarwal, Lucas F. E. Ashby, Luca Del Signore, Cameron Gibson, Reuben Raff, and Winston Wu. 2023. The SIGMORPHON 2022 Shared Task on Cross-lingual and Low-Resource Grapheme-to-Phoneme Conversion. Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology, pp. 230–238. [ link | bib | abstract ]

Jackson L. Lee, Litong Chen, Charles Lam, Chaak Ming Lau, and Tsz-Him Tsui. 2022. PyCantonese: Cantonese Linguistics and NLP in Python. Proceedings of the 13th Language Resources and Evaluation Conference, pp. 6607–6611. [ link | software documentation | bib | abstract ]

Jackson L. Lee, Lucas F. E. Ashby, M. Elizabeth Garza, Yeonju Lee-Sikka, Sean Miller, Alan Wong, Arya D. McCarthy, and Kyle Gorman. 2020. Massively Multilingual Pronunciation Modeling with WikiPron. Proceedings of the 12th Language Resources and Evaluation Conference, pp. 4223–4228. [ link | data and code | bib | abstract ]

Diane Brentari and Jackson L. Lee, editors. 2018. Shaping Phonology. University of Chicago Press. (Volume in honor of John Goldsmith). [ link | abstract ]

Mike Pham and Jackson L. Lee. 2018. Mincing words: balancing recovery and deletion in word truncation. Glossa: a journal of general linguistics 3(1): 36. [ link | data and code | bib | abstract ]

John A. Goldsmith, Jackson L. Lee, and Aris Xanthos. 2017. Computational learning of morphology. Annual Review of Linguistics 3, pp. 85–106. [ bib | abstract ]

Jackson L. Lee and John A. Goldsmith. 2016. Linguistica 5: Unsupervised Learning of Linguistic Structure. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, pp. 22–26. [ link | bib | abstract ]

Jackson L. Lee, Ross Burkholder, Gallagher B. Flinn, and Emily R. Coppess. 2016. Working with CHAT transcripts in Python. Technical Report TR-2016-02, Department of Computer Science, University of Chicago, January 2016. [ link | bib | abstract ]

Jackson L. Lee. 2015. Morphological Paradigms: Computational Structure and Unsupervised Learning. Proceedings of NAACL-HLT 2015 Student Research Workshop (SRW), Denver, Colorado, June 2015. Association for Computational Linguistics, pp. 161–167. [ link | bib | abstract ]

Andrea Beltrama and Jackson L. Lee. 2015. Great pizzas, ghost negations: The emergence and persistence of mixed expressives. Proceedings of Sinn und Bedeutung 19, pp. 143–160. [ bib | abstract ]

Jackson L. Lee and Stephen Matthews. 2015. When French becomes tonal: Prosodic transfer from L1 Cantonese and L2 English. The 6th Annual Proceedings of the Pronunciation in Second Language Learning and Teaching Conference, pp. 63–72. [ bib | abstract ]

Mike Pham and Jackson L. Lee. 2014. Combining successor and predecessor frequencies to model truncation in Brazilian Portuguese. Technical Report TR-2014-15, Department of Computer Science, University of Chicago, October 2014. [ link | bib | abstract ]

Jackson L. Lee. 2014. Automatic morphological alignment and clustering. Technical Report TR-2014-07, Department of Computer Science, University of Chicago, May 2014. [ link | bib | abstract ]

Alan C. L. Yu, Hyunjung Lee, and Jackson L. Lee. 2014. Variability in perceived duration: pitch dynamics and vowel quality. Proceedings of the 4th International Symposium on Tonal Aspects of Languages, May 2014, pp. 41–44. [ bib | abstract ]

Jackson L. Lee. 2014. The representation of contour tones in Cantonese. Proceedings of the 38th Annual Meeting of the Berkeley Linguistics Society, pp. 272–287. [ link | bib | abstract ]

Andrea Beltrama, Tasos Chatzikonstantinou, Jackson L. Lee, Mike Pham, and Diane Rak, editors. 2014. Proceedings of the Forty-eighth Annual Meeting of the Chicago Linguistic Society. Chicago Linguistic Society. [ table of contents | bib ]

Jackson L. Lee. 2012. Fixed-tone reduplication in Cantonese. McGill Working Papers in Linguistics 22(1). Proceedings from the Montreal-Ottawa-Toronto (MOT) Phonology Workshop 2011: Phonology in the 21st Century: In Honour of Glyne Piggott. [ bib | abstract ]