The Statistics of Deduction, or, Sherlock Is a Computational Linguist
This isn’t a fandom blog by any means, but I like books and movies and TV as much as the next guy, and I’ve been kind of quietly excited about Season 3 of Sherlock that premiered today.
Anyway, while watching the live-stream, this frame caught my eye.
*perks up* Mary Morstan is a linguist?
Wait, how could Sherlock possibly know that just from looking at her?
He could, because Sherlock is a computational linguist—or rather, Sherlock is something a computational linguist would create.
Much of the heavy lifting of computational linguistics these days revolves around classifiers: methods of identifying which of a set of categories a new observation belongs to by comparing the new observation to a set of observations whose categories are already known.
Given a category C with a set of features X, and an object O with a set of features X’, what is the probability that O belongs in the category C?
This is basically how any computer does deductive reasoning, and Sherlock Holmes just takes this trope and turns it up to eleven.
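To make that concrete, here's a minimal sketch of the kind of classifier described above (a naive Bayes classifier, which is what we'll be doing by hand below). All the names and numbers here are my own illustration, not anything from the show: given a prior probability for each category and a per-feature likelihood table, it scores each category and picks the most probable one. Feature probabilities are multiplied together as if the features were independent, which is the "naive" part.

```python
def classify(observation, priors, likelihoods):
    """Pick the most probable category for a new observation.

    observation: set of feature names present on the object
    priors: {category: P(category)}
    likelihoods: {category: {feature: P(feature | category)}}
    """
    scores = {}
    for category, prior in priors.items():
        # Start from the prior, then multiply in each feature's
        # likelihood under this category (naive independence assumption).
        score = prior
        for feature in observation:
            score *= likelihoods[category].get(feature, 0.0)
        scores[category] = score
    # The unnormalized scores are enough to rank categories; dividing
    # by the evidence would not change which one wins.
    return max(scores, key=scores.get)
```

Note that we never need to compute the evidence term just to pick a winner, since it's the same for every category.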
So how could he tell that Mary is a linguist just by looking at her?
To answer this, we must first ask the question: what does a linguist look like?
Contrary to popular opinion, most of us don’t walk around with “Kiss me, I’m a linguist” cards pasted to our foreheads, so that’s not it. We assume instead that there must be a set of subtle visual cues that someone like Sherlock can pick up on to determine someone’s occupation. So let’s throw out some dummy data. Let’s assume:
80% of female linguists wear dangly earrings
60% of female linguists wear faux-fur-trimmed coats
75% of female linguists have hair shorter than shoulder-length
(I don’t actually remember what she was wearing in the scene, but just pretend I’m right). Also:
1% of all women are linguists (they’re not, but these are really bad features, so the numbers aren’t realistic)
A category must always be defined in opposition to another category, so let’s see some more data:
10% of female butchers wear dangly earrings
50% of female butchers wear faux-fur-trimmed coats
95% of female butchers have hair shorter than shoulder-length
and 5% of all women are butchers (again, unrealistic numbers)
P(C | X) = P(C) × P(X | C) / P(X)

This is Bayes’ Theorem. In plain English it can be read as posterior = (prior × likelihood) / evidence. For our example here, it can be read as (put on your Sherlock voice):
The probability that Mary, who is wearing dangly earrings and a faux-fur-trimmed coat and has shorter-than-shoulder-length hair, is a linguist (as opposed to a butcher), is equal to the probability that any given woman is a linguist (instead of a butcher), times the probability that a linguist has all three of those same traits, divided by the probability that a woman has all three of those traits, regardless of occupation.
The evidence, or probability that a woman has all three relevant traits, has to be normalized over all categories. Since we only have two categories here, it works out to:

P(X) = P(X | linguist) × P(linguist) + P(X | butcher) × P(butcher)
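As a sanity check, we can plug the made-up numbers from above into the formula and see what Sherlock's posterior actually comes out to (assuming Mary shows all three traits):

```python
# Likelihoods: P(all three traits | occupation), multiplying the
# per-feature probabilities as if they were independent.
p_traits_given_linguist = 0.80 * 0.60 * 0.75   # = 0.36
p_traits_given_butcher = 0.10 * 0.50 * 0.95    # = 0.0475

# Priors: P(occupation) for any given woman.
p_linguist = 0.01
p_butcher = 0.05

# Evidence: P(traits), normalized over our two categories.
evidence = (p_traits_given_linguist * p_linguist
            + p_traits_given_butcher * p_butcher)  # = 0.005975

# Posterior: P(linguist | traits), via Bayes' Theorem.
posterior = p_traits_given_linguist * p_linguist / evidence
print(round(posterior, 3))  # prints 0.603
```

So even with a tiny 1% prior, the traits are distinctive enough that Mary is about 60% likely to be a linguist rather than a butcher. Deduced.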
The catch is that you have to train these algorithms on HUGE amounts of data, on the order of billions of data points, to build a decent generalist classifier with genuinely good discriminative features. But it’s been canonically established that Sherlock has a huge fact bank in his brain, so that’s not really a problem.
Basically, Sherlock apparently has some sweet classifier algorithms downloaded into his head, and that plus an enormous bank of facts allows for his “magical” deductive capability.