What is VOT? And a brief summary of Kwon (2014)
When I announced the 2013 ALS proceedings a couple of months ago, I asked for any suggestions on a follow-up post. Laxsnail got in touch and asked about Kwon’s article Acoustic observation for English speakers’ perception of a three-way laryngeal contrast of Korean stops. You need to understand Voice Onset Time and f0 before you can understand the paper itself. So here we go with some definitions!
Voice Onset Time (VOT)
Voicing is a feature of some of the sounds we make. Hold your fingers lightly against the front of your throat and make the sound ssssssss, and then go zzzzzzzz - feel that buzz for the second one? That’s your vocal folds vibrating really quickly. There are lots of minimal pairs like this in English - s/z are fricatives, but there are also stops (AKA plosives), like t/d and p/b. For stops, the voice onset time (VOT) is the relationship between when you open your articulators (AKA mouth bits) and when those vocal folds (AKA voice box bits) start buzzing. In some stops the voicing starts before the release of the closure, which is known as a negative VOT. In aspirated consonants there is a bit of air after the release and the voicing only kicks in after that, which gives a positive VOT. Those situations where the voicing and the opening occur at the same time are known as tenuis VOT, just to sound fancy. The VOT of sounds varies across languages, which is a crucial feature of Kwon’s article.
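If it helps to see that as arithmetic, here is a minimal Python sketch of the idea: VOT is just the time the voicing starts minus the time the closure is released. The function names and the 10 ms tolerance for "at the same time" are my own assumptions for illustration, not values from Kwon's paper or from any standard.

```python
def vot_ms(release_ms, voicing_onset_ms):
    """VOT in milliseconds: when the voicing starts minus when the stop is released."""
    return voicing_onset_ms - release_ms


def vot_category(vot, tolerance_ms=10.0):
    """Label a VOT value as negative, tenuis or positive.

    tolerance_ms is a hypothetical cut-off for "roughly simultaneous",
    chosen only for this example.
    """
    if vot < -tolerance_ms:
        return "negative"   # voicing starts before the release
    if vot > tolerance_ms:
        return "positive"   # voicing starts after the release (e.g. aspiration)
    return "tenuis"         # voicing and release happen at about the same time


# Made-up timings: closure released at 100 ms, voicing starts at 160 ms.
print(vot_category(vot_ms(release_ms=100, voicing_onset_ms=160)))
```

The timings in the last line are invented; with real recordings you would read both time points off the waveform or spectrogram.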
Fundamental Frequency (f0)
Our speech apparatus is basically a very fancy way to create and manipulate acoustic signals. There are lots of different features of speech you can measure - perceptual phoneticians look at how we process these sounds, instrumental phoneticians look at how we produce these sounds, and acoustic phoneticians look at the nature of these sounds. Acoustic phonetics really involves a lot of physics, maths and statistics, and everything I can explain about it I learnt from people much smarter than I am.
Speech signals are waveforms of sound, which is why they make those pretty patterns when you look at a visual representation of them. The fundamental frequency of speech is the lowest frequency of a periodic waveform, called the f0 because they count from zero. The lower the fundamental frequency, the lower a person’s voice will sound to us. f0 is therefore a way of acoustically measuring what we might perceptually call pitch. Men generally have lower pitch than women, which is due to a number of factors including larynx size and the length of the vocal folds.
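For the curious, here is a rough Python sketch of how you might pull an f0 out of a periodic waveform. It builds a toy wave (a fundamental plus two weaker harmonics) and then uses simple autocorrelation: slide the wave against itself and find the shift at which it lines up best, which is one period. This is my own illustration, not the method used in Kwon's paper; the function names, the 8000 Hz sample rate and the 75-400 Hz search range are all assumptions, and real pitch trackers are a good deal more careful than this.

```python
import math


def synth_wave(f0_hz, sample_rate=8000, duration_s=0.1):
    """Build a toy periodic wave: a fundamental plus two weaker harmonics."""
    n = int(sample_rate * duration_s)
    return [
        math.sin(2 * math.pi * f0_hz * t / sample_rate)
        + 0.5 * math.sin(2 * math.pi * 2 * f0_hz * t / sample_rate)
        + 0.25 * math.sin(2 * math.pi * 3 * f0_hz * t / sample_rate)
        for t in range(n)
    ]


def estimate_f0(samples, sample_rate=8000, min_hz=75, max_hz=400):
    """Estimate f0 as sample_rate / (the lag where the wave best matches itself)."""
    min_lag = sample_rate // max_hz  # shortest period we are willing to consider
    max_lag = sample_rate // min_hz  # longest period we are willing to consider
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag: multiply the wave by a shifted copy of itself.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# A 100 Hz toy "voice" should come out at roughly 100 Hz.
print(round(estimate_f0(synth_wave(100))))
```

The wave here is perfectly clean and perfectly periodic, which real speech never is, so treat this as a picture of the idea rather than a working pitch tracker.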
OK! As English speakers, you may be familiar with the sounds represented by /p/ and /b/ - /p/ has a positive VOT, as it’s voiceless and aspirated, while /b/ has a tenuis VOT (this is a gross oversimplification, and English VOT is actually more complicated, but that’s not what today is about). The Korean equivalents to p/b, t/d and k/g are not the same. Where we have two different sounds for each of those sets, Korean has three (this applies to all of them, but I’m going to stick with talking about the p/b bilabial set for simplicity).
Korean has an aspirated /pʰ/, which is a lot like English /p/ with that puff of air on release of the stop. It has two other bilabial plosives without any voicing either. These other two are distinguished by what is sometimes called a ‘tense/lax’ and other times a ‘fortis/lenis’ distinction. This distinction is partly that the VOT for the lax (AKA lenis) is much longer than for the tense (AKA fortis), and partly that the fortis gives a higher f0 on the following vowel than the lenis does. So, there’s a lot going on there!
Kwon hypothesised that an English speaker learning Korean would likely be able to separate out the /pʰ/ from the other two, but lump the fortis and lenis together and hear them more like an English /b/ - because we aren’t used to using this combo of VOT and f0 to distinguish stops. Kwon then created and ran an experiment with English speakers listening to these tricky Korean sounds. Her work supports the hypothesis, as English speakers were rubbish at telling the fortis/lenis pair apart, even though the VOT is quite distinct. Therefore, the f0 appears to be an important cue for telling them apart.
If you’re learning Korean and have been struggling to master certain words, this may be why!