Yi Xu [i 'ʃu] is Professor of Speech Sciences at University College London, United Kingdom. He received a master's degree in experimental phonetics from the Institute of Linguistics, Chinese Academy of Social Sciences, in 1984, and a PhD in Linguistics from the University of Connecticut in 1993. He was a postdoctoral fellow at the Massachusetts Institute of Technology from 1994 to 1995. He later served as an Assistant Professor at Northwestern University and as a researcher at the University of Chicago and Haskins Laboratories. He has been teaching at University College London since 2004. Yi Xu has published widely since 1986 on the production, perception and theoretical modeling of tone, intonation, segments and the syllable. He has also worked on auditory feedback in speech production, emotional prosody, short-term memory in reading and computational simulation of phonetic acquisition. His early work focused on how lexical tones in Mandarin are produced and perceived in connected speech. From this work he developed the Target Approximation (TA) model (Xu & Wang, 2001), which was then extended to intonation in the form of the Parallel Encoding and Target Approximation (PENTA) model (Xu, 2005). His most recent work extends TA to speech production in general, using it to explain coarticulation, the syllable (Xu, 2020) and vocal learning of articulation.
Duration in speech from an articulatory-functional perspective
Speech happens in time, and duration is a critical aspect of its timing. Much research has examined duration phenomena at different levels, yet a coherent account that links them together is still lacking. In this tutorial I explore duration from an articulatory-functional perspective, which views speech as an information system that encodes multiple layers of communicative functions through articulation (Xu, 2009). From this perspective, duration is both intrinsic (Fowler, 1980), i.e., bound by articulatory constraints, and extrinsic (Turk & Shattuck-Hufnagel, 2020), i.e., functionally controlled to encode information.
Intrinsically, the demand to transmit information as efficiently as possible often pushes articulation to its speed limit, which reduces intelligibility because phonetic targets are undershot (Xu & Prom-on, 2019). The need for articulatory clarity against this intrinsic constraint gives rise to a positive correlation between duration and the importance of information (Xu, 2019): more important information tends to be given more time. This correlation also seems to form a phonetic basis (Fowler & Housum, 1987; Wang, Xu & Ding, 2018) for the negative relation between word frequency and word length (Piantadosi et al., 2011). Extrinsically, duration is actively used as a dimension for encoding communicative functions, in particular lexical contrast, grouping/boundary marking and emotional/functional expressions (Xu, 2019).
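To make the undershoot argument concrete, the following sketch models a single articulatory movement as a third-order critically damped approach to a static target, a simplified form in the spirit of the Target Approximation model (Xu & Wang, 2001). It is an illustration under stated assumptions, not the exact formulation used in the cited studies: the function names, the rate parameter `lam`, the target value and the durations are all hypothetical. Starting at rest, the articulator is simply given less time to reach the target as the interval shrinks.

```python
import math

def ta_position(x0, v0, a0, target, lam, t):
    """Position at time t of a third-order critically damped approach toward a
    static target, starting from position x0, velocity v0 and acceleration a0.
    lam (1/s) sets how quickly the target is approached (hypothetical value)."""
    c1 = x0 - target
    c2 = v0 + lam * c1
    c3 = (a0 + 2 * lam * c2 - lam ** 2 * c1) / 2
    return target + (c1 + c2 * t + c3 * t ** 2) * math.exp(-lam * t)

def undershoot(duration, lam=40.0, x0=0.0, target=1.0):
    """Distance still remaining to the target when the interval ends,
    assuming the movement starts at rest (zero velocity and acceleration)."""
    return abs(target - ta_position(x0, 0.0, 0.0, target, lam, duration))

# Hypothetical interval durations, from relaxed to very fast articulation.
for dur_ms in (250, 150, 100, 50):
    print(f"{dur_ms} ms: undershoot = {undershoot(dur_ms / 1000):.3f}")
```

With these values the printed undershoot grows from about 0.003 at 250 ms to about 0.677 at 50 ms (in units of the start-to-target distance), which is the sense in which speeding up articulation costs clarity.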
In addition to the well-defined durational functions described above, there is also a weak tendency toward isochrony of syllables and prosodic phrases. However, data from English and Mandarin (Wang, Zhang & Xu, 2018; Xu & Wang, 2009) suggest that this tendency is language-dependent, and that it runs in the direction opposite to that predicted by the widely known rhythm class hypothesis (Abercrombie, 1967).
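Isochrony is commonly quantified with duration variability metrics. As one illustration, and not necessarily the measure used in the studies cited above, the sketch computes the normalized Pairwise Variability Index (nPVI), a metric often used in rhythm-class research, together with the coefficient of variation for a sequence of interval durations. A value of 0 means perfectly equal durations; larger values mean less isochrony. The function names and the durations in the example are hypothetical.

```python
from statistics import mean, pstdev

def npvi(durations):
    """Normalized Pairwise Variability Index over successive intervals:
    the mean of |d_k - d_k+1| / ((d_k + d_k+1) / 2), scaled by 100."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)

def coefficient_of_variation(durations):
    """Population standard deviation divided by the mean duration."""
    return pstdev(durations) / mean(durations)

# Hypothetical syllable durations in milliseconds.
even = [200, 200, 200, 200]
uneven = [120, 280, 150, 250]
for name, seq in (("even", even), ("uneven", uneven)):
    print(f"{name}: nPVI = {npvi(seq):.1f}, CV = {coefficient_of_variation(seq):.2f}")
```

The nPVI compares only neighbouring intervals, so it is insensitive to slow tempo drift across an utterance, whereas the coefficient of variation pools all intervals; which one better captures a "weak tendency" toward isochrony is itself a methodological choice.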
This articulatory-functional view of duration may offer a coherent account that links many reported duration phenomena, although many questions remain to be resolved by future research.
References
Abercrombie, D. (1967). Elements of general phonetics. Edinburgh: Edinburgh University Press.
Fowler, C. A. (1980). Coarticulation and theories of extrinsic timing. Journal of Phonetics 8: 113-133. (http://www.haskins.yale.edu/Reprints/HL0288.pdf).
Fowler, C. A. and Housum, J. (1987). Talkers' signaling of "new" and "old" words in speech and listeners' perception and use of the distinction. Journal of Memory and Language 26: 489-504. (https://apps.dtic.mil/dtic/tr/fulltext/u2/a192081.pdf#page=132).
Piantadosi, S. T., Tily, H. and Gibson, E. (2011). Word lengths are optimized for efficient communication. Proceedings of the National Academy of Sciences 108(9): 3526-3529. (http://tedlab.mit.edu/tedlab_website/researchpapers/Piantadosi_et_al_2011_PNAS.pdf).
Turk, A. and Shattuck-Hufnagel, S. (2020). Timing Evidence for Symbolic Phonological Representations and Phonology-Extrinsic Timing in Speech Production. Frontiers in Psychology 10: 2952. (https://www.frontiersin.org/articles/10.3389/fpsyg.2019.02952/full).
Wang, B., Xu, Y. and Ding, Q. (2018). Interactive prosodic marking of focus, boundary and newness in Mandarin. Phonetica 75: 24-56. (http://www.homepages.ucl.ac.uk/~uclyyix/yispapers/Wang_Xu_Ding_Phonetica2018.pdf).
Wang, C., Zhang, J. and Xu, Y. (2018). Compressibility of segment duration in English and Chinese. In Proceedings of Speech Prosody 2018, Poznań, Poland: 651-655. (http://www.homepages.ucl.ac.uk/~uclyyix/yispapers/Wang_Zhang_Xu_SP2018.pdf).
Xu, Y. (2009). Timing and coordination in tone and intonation: An articulatory-functional perspective. Lingua 119(6): 906-927. (http://www.homepages.ucl.ac.uk/~uclyyix/yispapers/Xu_Lingua2009.pdf).
Xu, Y. (2019). Prosody, tone and intonation. In W. F. Katz and P. F. Assmann (eds.), The Routledge Handbook of Phonetics. New York: Routledge, pp. 314-356. (http://www.homepages.ucl.ac.uk/~uclyyix/yispapers/Xu_Routledge_Handbook2019.pdf).
Xu, Y. (2020). Syllable is a synchronization mechanism that makes human speech possible. PsyArXiv doi:10.31234/osf.io/9v4hr (https://psyarxiv.com/9v4hr/).
Xu, Y. and Prom-on, S. (2019). Economy of Effort or Maximum Rate of Information? Exploring Basic Principles of Articulatory Dynamics. Frontiers in Psychology 10: 2469. (https://www.frontiersin.org/articles/10.3389/fpsyg.2019.02469/full).
Xu, Y. and Wang, Q. E. (2001). Pitch targets and their realization: Evidence from Mandarin Chinese. Speech Communication 33: 319-337. (http://www.homepages.ucl.ac.uk/~uclyyix/yispapers/Xu_Wang.pdf).