Prof. Susan Hunston - Corpus Linguistics in 2017: a personal view

Plenary talk on “the development of Corpus Linguistics over the last couple of decades”. Hunston discusses the changes in the field and its five “turns”.

Why linguists need physics

In designing my lectures for the beginning of the semester, I realized it might not be clear to incoming students why they need to learn about wavelength and frequency and the addition of simple waves to form complex waves. Physics isn’t strictly necessary for studying phonology, but without a basic background in the physics of sound, you end up limited in your understanding of how the sounds are created (and which ones are even possible). Sure, there’s quite a lot of anatomy knowledge that contributes as well, but even with a fantastic understanding of the vocal tract’s shape and configuration, you’re still missing a major component.

So let’s back up a moment. The vocal tract.

Sound is generated by the vocal folds (marked with a black oval) when air pressure builds up underneath them and pushes against them so hard that they burst open. But since you’re pushing them together really hard, as soon as the pressure drops again, they snap back shut. This snapping action creates a noise, and when it happens very rapidly, it creates a buzzing sound.

Actually, this buzzing sound is very similar to the sound you make when you blow a raspberry. It’s the same mechanism, but with your lips pressed together instead of your vocal folds.

So where’s the physics? Well, we need to understand the shape of sound waves to understand why they sound the way they do. A simple wave (a sinusoidal wave) will sound like a simple, boring tone.

If you look at the wave created by blowing a raspberry, it looks really different.

The reason is that this buzz wave is composed of many, many sine waves, all overlapping and influencing each other. And it is physics that allows us to decompose the complex wave into something more interpretable.

I’m not going to go into what a Fast Fourier Transform (FFT) is here, but it is basically the technique for working out which simple waves were added together to make a complex wave.
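To make the idea concrete, here is a minimal sketch in Python. It builds a complex wave by adding three sine waves together and then recovers their frequencies and amplitudes. The frequencies (100, 200, 300 Hz), the amplitudes, and the sample rate are made-up illustration values, not measurements of a real voice, and the code uses a plain (slow) discrete Fourier transform rather than the fast algorithm, since the result is the same.

```python
import cmath
import math

SAMPLE_RATE = 800  # samples per second (illustrative value)
N = 800            # one second of signal, so bin k corresponds to k Hz

# Hypothetical "buzz": three sine waves with falling amplitudes.
COMPONENTS = {100: 1.0, 200: 0.5, 300: 0.25}  # frequency in Hz -> amplitude


def complex_wave(n):
    """Value of the summed wave at sample number n."""
    t = n / SAMPLE_RATE
    return sum(a * math.sin(2 * math.pi * f * t) for f, a in COMPONENTS.items())


def dft_amplitude(signal, k):
    """Amplitude of the sine component at bin k (plain DFT, not the fast version)."""
    total = sum(
        x * cmath.exp(-2j * math.pi * k * n / len(signal))
        for n, x in enumerate(signal)
    )
    return 2 * abs(total) / len(signal)


samples = [complex_wave(n) for n in range(N)]

# Check every frequency below half the sample rate and keep the ones that stand out.
amplitudes = {k: round(dft_amplitude(samples, k), 3) for k in range(1, N // 2)}
peaks = {k: a for k, a in amplitudes.items() if a > 0.01}

print(peaks)  # {100: 1.0, 200: 0.5, 300: 0.25}
```

The decomposition hands back exactly the three simple waves that went in, even though the summed wave itself looks nothing like a sine wave.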

The next place physics plays a big role is in the transformation of this buzzy sound from the vocal folds into a speech sound. This transformation happens when the buzzy wave passes through the long, complex tube of your throat, mouth, and nose.

Some components of the complex wave (certain frequencies) are amplified when they pass through an area of your vocal tract that is just the right length. You can think about this like two people swinging a jump rope. When they’re rotating their arms at the same pace, the jump rope goes around in big circles, like it’s supposed to. But if they’re out of sync, the rope collapses and turns into a slithery worm and makes it a very boring game. When a component of the wave is “in sync” with an area of the vocal tract, it gets louder, like the jump rope making full, big circles. But when a component is “out of sync”, it gets quieter, and doesn’t do much. This is how the vocal tract shapes our voice into speech.

But the question here is which components are amplified and which are damped (made quieter)? You can answer that by working out the wavelengths of the components of the source noise and comparing them to the physical sizes of the spaces in the vocal tract. When a component wave fits snugly in a space (roughly speaking, when its wavelength lines up with the distance from one end of the space to the other), that is when you get amplification. Several of the regions of amplification (resonances) are useful in determining what vowel is being produced, for instance. And that is one way in which physics is useful when you’re studying speech sounds and linguistics.
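As a rough worked example, a common textbook idealization (not something from this post) treats the vocal tract as a uniform tube that is closed at the vocal folds and open at the lips. In that model, the components that "fit" are the ones with an odd number of quarter wavelengths along the tube. The sketch below assumes a tract length of 17.5 cm and a speed of sound of 35,000 cm/s; both are round illustration values, and a real vocal tract is not a uniform tube.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed approximate value for warm, moist air
TRACT_LENGTH_CM = 17.5             # assumed length, for illustration only


def tube_resonances(length_cm, count=3):
    """First `count` resonances (Hz) of an idealized tube closed at one end.

    Resonance n has (2n - 1) quarter wavelengths along the tube,
    so its frequency is (2n - 1) * c / (4 * L).
    """
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4 * length_cm)
        for n in range(1, count + 1)
    ]


def wavelength_cm(frequency_hz):
    """Wavelength in cm of a component with the given frequency."""
    return SPEED_OF_SOUND_CM_PER_S / frequency_hz


print(tube_resonances(TRACT_LENGTH_CM))  # [500.0, 1500.0, 2500.0]
print(wavelength_cm(500.0))              # 70.0, i.e. four times the tube length

# A shorter tube pushes every resonance higher.
print(tube_resonances(14.0))             # [625.0, 1875.0, 3125.0]
```

Changing the shape of the tube (by moving the tongue, jaw, and lips) shifts these resonances, which is why they are informative about which vowel is being produced.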

Magic linguistics

If you have a native language with a very complex phonology, and you take polyjuice potion and switch bodies with someone with a native language with a very simple phonology, what would then happen with the speech?

If most of the information lies in an abstract system in the brain, there would be no problem, right? The more marked sounds would flow smoothly.

But if the biomechanics of the mouth’s anatomy (yeah, tongue muscles) play a bigger role, it would be harder to pronounce the more marked sounds and you would get some kind of accent.


Where did the vowel space get its shape?

Quick answer: The jaw.

Slightly longer answer: The jaw is attached like a hinge, so it doesn’t drop straight down when you open your mouth. Moreover, your jaw is the part that’s moving, not the rest of your head. So, when your mouth opens, it’s like a door swinging open: it moves along a circular path relative to the immobile parts of your head.

But the vowel space isn’t drawn as a rounded shape; it’s drawn as a trapezoid. That’s because a complex shape has been simplified into something easier to draw and conceptualize. Here’s how the tongue moves to create different mouth shapes that correspond to different vowels:

  • Close/High + Back = [u]
  • Mid + Back = [o] or [ɔ]
  • Open/Low + Back = [ɑ]
  • Open/Low + Front = [æ]
  • Mid + Front = [e] or [ɛ]
  • Close/High + Front = [i]

(Note: This isn’t showing all of the vowels of English, which is why I group some of the -High, -Low vowels together)

How to find a topic for your linguistics paper

I get occasional questions asking me for good topics to write an end-of-semester linguistics paper on. The bad news is, I don’t have a list of ideas that I can just pass out to people, because that’s not really how finding a topic works. The good news is, here’s a series of questions that will help you find something to write about for any linguistics course. 

1. What parts of the course did you find most interesting? 

Even if your prof gives you free rein as to your topic, it’s generally supposed to have something to do with the course. So go back to the syllabus, flip through your notes and readings, and think about what parts of the course you enjoyed more than others. Or if your course didn’t get to cover all the chapters in your textbook, have a leaf through the remaining ones and see if any of them look interesting. Another option is to go interdisciplinary: is there another area that you’re interested in (e.g. psychology, music, gender, bilingualism, or children, especially if you already have them in your life) that you could combine with some aspect of the course? 

Make a shortlist for yourself of a couple of options and have a quick google to see whether any of them have lots more information available or turn into dead ends. Some courses may be fine with you replicating something that’s already out there, just for the experience of figuring it out yourself, while some may be more keen on you working on something brand new. 

2. Where are you going to get data? 

Linguistics papers generally analyze some sort of data, so where is that data going to come from? Certain types of courses tend to involve certain types of data sources, so you could follow the trend of whatever types of data sources you’ve been discussing in class, or be more unconventional and figure out how to cross-apply a different one. Common sources of linguistic data include: 


Structural Ambiguity: sometimes a single sentence has more than one meaning.

The signs as linguists
  • Aries: semiotician
  • Taurus: computational linguist
  • Gemini: phonetician
  • Cancer: morphologist
  • Leo: historical linguist
  • Virgo: acquisitionist
  • Libra: syntactician
  • Scorpio: field linguist
  • Sagittarius: semanticist
  • Capricorn: psycholinguist
  • Aquarius: sociolinguist
  • Pisces: phonologist

An informal language experiment

I heard a random language observation somewhere on the Internet: Regional differences are emerging across the anglosphere in the term for “a small data storage device you plug into a computer’s USB port,” even though these devices have only been in the mainstream for about seven or eight years.

Here’s an informal, unscientific experiment: Write a comment on this post stating which term you usually use, and specify roughly where you live. Choose from the following:

  • Flash drive
  • Memory stick
  • Thumb drive
  • USB drive
  • Another term (specify)
  • I know what this is, but I don’t have a specific term for it

I’ll get the ball rolling: I say “flash drive,” and I’m from Southern California. What term do you use?